Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis For Speech Recognition Bing...
Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis For Speech Recognition
Bing Zhang and Spyros Matsoukas,
BBN Technologies, 50 Moulton St. Cambridge
Reporter: Chang Chih Hao
Introduction
• LDA and HLDA
  – Better classification accuracy
  – Some common limitations
    • None of them assumes any prior knowledge of confusable hypotheses
    • Their objective functions do not directly relate to the word error rate (WER)
• Minimum Phoneme Error
  – Minimizes phoneme errors in lattice-based training frameworks
  – Since this criterion is closely related to WER, MPE-HLDA tends to be more robust than other projection methods, which makes it potentially better suited for a wider variety of features.
MPE Objective Function
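• MPE-HLDA model
$$\mu_m = A\,\bar{\mu}_m, \qquad C_m = \operatorname{diag}\!\left(A\,\bar{C}_m A^{T}\right), \qquad o_t = A\,\bar{o}_t$$
• MPE-HLDA aims at minimizing the expected number of phoneme errors introduced by the MPE-HLDA model in a given hypothesis lattice, or, equivalently, at maximizing the function
$$F_{\mathrm{MPE}}(A) = \sum_{r=1}^{R} \sum_{w_r} P(w_r \mid O_r, A)\,\mathcal{A}(w_r) \qquad (4)$$
where
  – $R$ is the total number of training utterances,
  – $O_r$ is the sequence of $p$-dimensional observation vectors in utterance $r$,
  – $\mathcal{A}(w_r)$ is the "raw accuracy" score of word hypothesis $w_r$,
  – $\bar{\mu}_m$ and $\bar{C}_m$ are the mean and full covariance matrix of Gaussian $m$ in the original $n$-dimensional feature space.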
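A minimal NumPy sketch of the model and of Eq. (4), assuming the hypothesis posteriors and raw accuracies are already available as arrays (in practice they come from the lattices). The function names `project_gaussian`, `diag_gauss_loglik` and `mpe_objective` are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def project_gaussian(A, mu_bar, C_bar):
    """Map an n-dim full-covariance Gaussian to the p-dim model:
    mu_m = A mu_bar_m and C_m = diag(A C_bar_m A^T), returned as a p-vector."""
    mu = A @ mu_bar
    c_diag = np.diag(A @ C_bar @ A.T).copy()  # only the diagonal is kept
    return mu, c_diag

def diag_gauss_loglik(o, mu, c_diag):
    """log N(o; mu, diag(c_diag)) for one projected observation o_t = A o_bar_t."""
    diff = o - mu
    return -0.5 * np.sum(np.log(2.0 * np.pi * c_diag) + diff**2 / c_diag)

def mpe_objective(posteriors, raw_acc):
    """Eq. (4): sum over utterances r of sum_w P(w_r | O_r, A) * raw_accuracy(w_r).
    posteriors[r] and raw_acc[r] are arrays over the hypotheses of utterance r."""
    return sum(float(np.dot(p, a)) for p, a in zip(posteriors, raw_acc))
```

Because the objective is an expectation of raw accuracy under the hypothesis posteriors, $A$ affects $F_{\mathrm{MPE}}$ only through those posteriors.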
MPE Objective Function
• The posterior probability of hypothesis $w_r$ in the lattice is
$$P(w_r \mid O_r, A) = \frac{P(O_r \mid w_r, A)^{\kappa}\,P(w_r)}{\sum_{w'_r} P(O_r \mid w'_r, A)^{\kappa}\,P(w'_r)}$$
where
  – $P(w_r)$ is the language model probability of hypothesis $w_r$,
  – $\kappa$ is a scaling factor applied to the acoustic scores in order to reduce their dynamic range, thereby avoiding the concentration of all posterior mass in the top-1 hypothesis of the lattice.
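The effect of $\kappa$ can be illustrated with a small log-domain computation of the posterior formula above. The scores and the value $\kappa = 0.1$ are made-up illustrative numbers, and `lattice_posteriors` is a hypothetical helper that normalizes over an explicit list of hypotheses rather than over a lattice.

```python
import numpy as np

def lattice_posteriors(ac_loglik, lm_logprob, kappa):
    """P(w | O) proportional to P(O | w)^kappa * P(w), computed in the log domain."""
    scores = kappa * np.asarray(ac_loglik, dtype=float) + np.asarray(lm_logprob, dtype=float)
    scores -= scores.max()      # log-sum-exp shift to avoid underflow
    weights = np.exp(scores)
    return weights / weights.sum()

ac = [-1000.0, -1010.0, -1030.0]        # acoustic log-likelihoods (illustrative)
lm = np.log([0.5, 0.3, 0.2])            # language model probabilities (illustrative)
sharp = lattice_posteriors(ac, lm, 1.0)
flat = lattice_posteriors(ac, lm, 0.1)
```

With acoustic scores 10 and 30 nats apart, $\kappa = 1$ puts essentially all mass on the best hypothesis, while $\kappa = 0.1$ leaves roughly 0.81 on it and spreads the rest over the competitors.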
MPE Objective Function
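• It can be shown that the derivative of (4) with respect to $A$ is
$$\frac{\partial F_{\mathrm{MPE}}(A)}{\partial A} = \kappa \sum_{r=1}^{R} \sum_{q_r} D(q_r)\,\frac{\partial \log P(O_r \mid q_r, A)}{\partial A} \qquad (6)$$
where $q_r$ ranges over the arcs of the lattice for utterance $r$ and
$$D(q_r) = P(q_r \mid O_r, A)\left(c(q_r) - c_r^{\mathrm{MPE}}\right)$$
  – $P(q_r \mid O_r, A)$ is the posterior probability of arc $q_r$,
  – $c_r^{\mathrm{MPE}}$ is the MPE score of utterance $r$ (average accuracy over all hypotheses),
  – $c(q_r)$ is the average accuracy over all hypotheses that contain arc $q_r$.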
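To make the arc weight $D(q_r)$ concrete, the sketch below computes it by brute-force enumeration of a short, hypothetical list of hypotheses (each given as posterior, raw accuracy and arc labels) instead of the lattice forward-backward pass used in practice. `arc_mpe_weights` and the toy list are illustrative assumptions.

```python
from collections import defaultdict

def arc_mpe_weights(hyps):
    """hyps: list of (posterior, raw_accuracy, arcs), posteriors summing to 1.
    Returns (c_mpe, D) with D[q] = P(q | O) * (c(q) - c_mpe)."""
    c_mpe = sum(p * acc for p, acc, _ in hyps)   # average accuracy over all hypotheses
    arc_post = defaultdict(float)                # P(q | O): mass of hypotheses through q
    arc_acc_mass = defaultdict(float)            # sum of P(w) * acc(w) over w containing q
    for p, acc, arcs in hyps:
        for q in set(arcs):
            arc_post[q] += p
            arc_acc_mass[q] += p * acc
    D = {}
    for q, pq in arc_post.items():
        c_q = arc_acc_mass[q] / pq               # average accuracy of hypotheses through q
        D[q] = pq * (c_q - c_mpe)
    return c_mpe, D

# Toy list: arcs "a"/"b" cover the first word, "c"/"d" the second.
hyps = [(0.6, 3.0, ["a", "c"]), (0.3, 2.0, ["a", "d"]), (0.1, 1.0, ["b", "d"])]
c_mpe, D = arc_mpe_weights(hyps)
```

Arcs lying on better-than-average hypotheses get positive weight and the others negative: here $c_r^{\mathrm{MPE}}$ is about 2.5, arcs a and c get about 0.15 and 0.3, and b and d about -0.15 and -0.3, so the weights over competing arcs cancel.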
MPE Objective Function
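•
$$\frac{\partial \log P(O_r \mid q_r, A)}{\partial A} = \sum_{t=S_q}^{E_q} \sum_{m} \gamma_m^{q_r}(t)\,\frac{\partial \log P(o_t \mid m, A)}{\partial A}$$
  – $S_q$ and $E_q$ are the begin and end times of arc $q_r$,
  – $\gamma_m^{q_r}(t)$ denotes the posterior probability of Gaussian $m$ in arc $q_r$ at time $t$.
•
$$\begin{aligned}\frac{\partial \log P(o_t \mid m, A)}{\partial A} &= C_m^{-1}\left[\left(C_m^{-1} P_{m,t} - I_p\right) A \bar{C}_m - R_{m,t}\right] \\ &= C_m^{-1}\left[C_m^{-1}\operatorname{diag}\!\left((o_t-\mu_m)(o_t-\mu_m)^{T}\right) A \bar{C}_m - I_p A \bar{C}_m - (o_t-\mu_m)(\bar{o}_t-\bar{\mu}_m)^{T}\right]\end{aligned}$$
where
$$P_{m,t} = \operatorname{diag}\!\left((o_t-\mu_m)(o_t-\mu_m)^{T}\right), \qquad R_{m,t} = (o_t-\mu_m)(\bar{o}_t-\bar{\mu}_m)^{T}$$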
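The per-Gaussian derivative can be checked numerically. The sketch below implements $\log P(o_t \mid m, A)$ for the projected diagonal Gaussian and the closed form above, and compares the latter with a central finite difference; the helper names `loglik`, `grad_loglik` and `numeric_grad` are illustrative assumptions.

```python
import numpy as np

def loglik(A, o_bar, mu_bar, C_bar):
    """log P(o_t | m, A) with o_t = A o_bar, mu_m = A mu_bar, C_m = diag(A C_bar A^T)."""
    diff = A @ (o_bar - mu_bar)
    c = np.diag(A @ C_bar @ A.T)
    return -0.5 * np.sum(np.log(2.0 * np.pi * c) + diff**2 / c)

def grad_loglik(A, o_bar, mu_bar, C_bar):
    """Closed form C_m^{-1} [ (C_m^{-1} P_{m,t} - I_p) A C_bar_m - R_{m,t} ]."""
    p = A.shape[0]
    x = o_bar - mu_bar                             # o_bar_t - mu_bar_m (n-dim)
    diff = A @ x                                   # o_t - mu_m (p-dim)
    C_inv = np.diag(1.0 / np.diag(A @ C_bar @ A.T))
    P = np.diag(diff**2)                           # P_{m,t}, p x p diagonal
    R = np.outer(diff, x)                          # R_{m,t}, p x n
    return C_inv @ ((C_inv @ P - np.eye(p)) @ A @ C_bar - R)

def numeric_grad(f, A, eps=1e-6):
    """Central finite-difference gradient of a scalar function of a matrix."""
    G = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        E = np.zeros_like(A)
        E[idx] = eps
        G[idx] = (f(A + E) - f(A - E)) / (2 * eps)
    return G
```

Note that no determinant term appears: $A$ is $p \times n$, and the model only scores the $p$ retained dimensions.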
MPE Objective Function
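• Therefore, Eq. (6) can be rewritten as
$$\frac{\partial F_{\mathrm{MPE}}(A)}{\partial A} = \kappa \sum_{m} C_m^{-1}\left(C_m^{-1} g_m - \gamma_m I_p\right) A \bar{C}_m - \kappa J \qquad (12)$$
where
$$\gamma_m = \sum_{r=1}^{R} \sum_{q_r} D(q_r) \sum_{t=S_q}^{E_q} \gamma_m^{q_r}(t)$$
$$g_m = \sum_{r=1}^{R} \sum_{q_r} D(q_r) \sum_{t=S_q}^{E_q} \gamma_m^{q_r}(t)\,P_{m,t}$$
$$J = \sum_{m} C_m^{-1} \sum_{r=1}^{R} \sum_{q_r} D(q_r) \sum_{t=S_q}^{E_q} \gamma_m^{q_r}(t)\,R_{m,t}$$
Dimensions annotated on the slide: 39×39 for $g_m$ ($p \times p$) and 39×162 for $J$ ($p \times n$).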
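The factorization in Eq. (12) can be verified on random data. In this sketch the sums over $r$, $q_r$ and $t$ are flattened into a list of `(m, weight, o_bar)` entries with `weight` $= D(q_r)\,\gamma_m^{q_r}(t)$, which is our simplification, and the diagonal $C_m$ are assumed to be precomputed. `accumulate_stats` touches only $\gamma_m$, $g_m$ and $J$; `synthesize_gradient` then visits one full covariance $\bar{C}_m$ at a time. All names are hypothetical.

```python
import numpy as np

def grad_loglik(A, o_bar, mu_bar, C_bar):
    """Per-frame derivative C_m^{-1}[(C_m^{-1} P_{m,t} - I_p) A C_bar_m - R_{m,t}]."""
    x = o_bar - mu_bar
    diff = A @ x
    C_inv = np.diag(1.0 / np.diag(A @ C_bar @ A.T))
    return C_inv @ ((C_inv @ np.diag(diff**2) - np.eye(A.shape[0])) @ A @ C_bar
                    - np.outer(diff, x))

def accumulate_stats(A, frames, mu_bars, c_diags):
    """Pass 1: gamma_m, g_m (p x p) and J (p x n); no full covariance is needed."""
    p, n = A.shape
    gamma = np.zeros(len(mu_bars))
    g = np.zeros((len(mu_bars), p, p))
    J = np.zeros((p, n))
    for m, w, o_bar in frames:
        x = o_bar - mu_bars[m]
        diff = A @ x
        gamma[m] += w
        g[m] += w * np.diag(diff**2)                # weighted P_{m,t}
        J += w * np.outer(diff / c_diags[m], x)     # weighted C_m^{-1} R_{m,t}, summed over m
    return gamma, g, J

def synthesize_gradient(A, gamma, g, J, C_bars, c_diags, kappa):
    """Pass 2: Eq. (12), consuming one full covariance C_bar_m at a time."""
    p = A.shape[0]
    G = np.zeros_like(A)
    for m, C_bar in enumerate(C_bars):              # C_bars may be a lazy iterable
        C_inv = np.diag(1.0 / c_diags[m])
        G += C_inv @ (C_inv @ g[m] - gamma[m] * np.eye(p)) @ A @ C_bar
    return kappa * (G - J)
```

Because $C_m$, $\bar{C}_m$ and $A$ do not depend on $t$, they factor out of the frame sums, which is exactly what lets the full covariances be deferred to the second pass.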
MPE-HLDA Implementation
• In theory, the derivative of the MPE-HLDA objective function can be computed from Eq. (12) via a single forward-backward pass over the training lattices. In practice, however, it is not possible to fit all the full covariance matrices in memory.
• Two steps
  – First, run a forward-backward pass over the training lattices to accumulate the statistics $\gamma_m$, $g_m$ and $J$ of Eq. (12).
  – Second, use these statistics together with the full covariance matrices to synthesize the derivative.
• The paper uses gradient descent to update the projection matrix.
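The slide states only that gradient descent is used. Since $F_{\mathrm{MPE}}$ is maximized, the sketch below steps along $+\nabla F_{\mathrm{MPE}}$, which is descent on $-F_{\mathrm{MPE}}$. The initial step size, the step-halving rule and the name `gradient_update` are our assumptions, and the quadratic objective used to exercise it is a stand-in, not the MPE criterion.

```python
import numpy as np

def gradient_update(A, objective, gradient, step=0.1, iters=20, min_step=1e-8):
    """Maximize objective(A) by steps along gradient(A) (descent on -objective).
    The step is halved whenever a trial step fails to increase the objective."""
    f = objective(A)
    for _ in range(iters):
        G = gradient(A)
        while step > min_step:
            A_new = A + step * G
            f_new = objective(A_new)
            if f_new > f:          # accept the step and keep the current step size
                A, f = A_new, f_new
                break
            step *= 0.5            # reject and retry with a smaller step
        else:
            break                  # step became negligible: stop iterating
    return A, f

# Stand-in objective with a known maximizer T (illustration only).
T = np.array([[1.0, 2.0], [3.0, 4.0]])
A_opt, f_opt = gradient_update(np.zeros((2, 2)),
                               lambda A: -np.sum((A - T) ** 2),
                               lambda A: -2.0 * (A - T),
                               step=0.1, iters=50)
```

In an MPE-HLDA setting, `gradient` would be the two-pass computation of Eq. (12) and `objective` Eq. (4); since each objective evaluation would need another lattice pass, a real system may prefer a fixed step size.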
MPE-HLDA Implementation
Experimental Framework
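Global feature projection (figure): an $l \times 1$ feature vector is multiplied by $L$ ($n \times l$) and then by $A$ ($p \times n$), yielding a $p \times 1$ vector
  – There is more useful information in longer contexts
  – Reduces the computational cost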
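A shape-level sketch of the global projection, using the CTS values $l = 225$, $n = 130$, $p = 60$. Assuming the full covariances $\bar{C}_m$ needed by Eq. (12) live in the $n$-dimensional space after $L$, the first-stage projection shrinks each of them from $l(l+1)/2$ to $n(n+1)/2$ free parameters; this is one plausible reading of the cost reduction, not a statement from the slides. `global_projection` and `full_cov_params` are hypothetical names.

```python
import numpy as np

def global_projection(x, L, A):
    """Two-stage projection: (p x n) @ (n x l) @ (l,) -> (p,)."""
    if L.shape[1] != x.shape[0] or A.shape[1] != L.shape[0]:
        raise ValueError("shape mismatch between x, L and A")
    return A @ (L @ x)

def full_cov_params(d):
    """Free parameters of one symmetric d x d full covariance matrix."""
    return d * (d + 1) // 2

l, n, p = 225, 130, 60   # CTS configuration reported on the feature slide
```

For these values that is 25,425 versus 8,515 parameters per Gaussian, roughly a threefold saving for every full covariance that has to be stored or streamed.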
Experimentation
• Conversational Telephone Speech (CTS)
  – 2300 hours of training data
    • 800 hours: training the initial ML model
    • 1500 hours: held-out training data
      – Lattice generation
      – Discriminative training
      – MPE-HLDA: only 370 hours
  – Testing sets
    • Eval03
    • Dev04
Experimentation
• Conversational Telephone Speech (CTS)
  – Feature
    • Frame-concatenated PLP cepstra: 15 frames, $l = 225$, $n = 130$, $p = 60$
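With 15 frames and $l = 225$, each frame contributes $225 / 15 = 15$ cepstral coefficients. The sketch below stacks a symmetric window of 7 frames on each side of the current frame; the window symmetry and the edge-frame replication at utterance boundaries are assumptions, and `concatenate_frames` is a hypothetical name.

```python
import numpy as np

def concatenate_frames(feats, context=7):
    """Stack each frame with `context` neighbours on each side (2*context+1 frames).
    feats: (T, d) array; returns (T, (2*context+1)*d). Boundary frames are repeated."""
    T, d = feats.shape
    padded = np.concatenate([np.repeat(feats[:1], context, axis=0),
                             feats,
                             np.repeat(feats[-1:], context, axis=0)], axis=0)
    window = 2 * context + 1
    return np.stack([padded[t:t + window].reshape(-1) for t in range(T)])

# 20 frames of 15 PLP-like coefficients -> 20 vectors of dimension l = 225.
feats = np.arange(20 * 15, dtype=float).reshape(20, 15)
X = concatenate_frames(feats)
```

Each row of `X` is one $l$-dimensional input to the global projection, with the current frame in the middle slot of the window.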
Experimentation
Experimentation
• Broadcast News (BN)
  – 600 hours: training the initial model (Hub4 and TDT)
  – 330 hours: held-out data
Thanks