Philip Jackson and Martin Russell Electronic Electrical and Computer Engineering Models of speech...
-
date post
21-Dec-2015 -
Category
Documents
-
view
218 -
download
0
Transcript of Philip Jackson and Martin Russell Electronic Electrical and Computer Engineering Models of speech...
Philip Jackson and Martin RussellPhilip Jackson and Martin Russell
Electronic Electrical and Computer Engineering
Models of speech Models of speech dynamics in a segmental-dynamics in a segmental-
HMM recognizer using HMM recognizer using intermediate linear intermediate linear
representationsrepresentations
http://web.bham.ac.uk/p.jackson/balthasar/
Conventional model
INTRODUCTIONINTRODUCTION
1
acoustic observations
HMM
acoustic PDF
1 11 1 2 3 42 2222 3 33 4 4 42
Linear-trajectory model
INTRODUCTIONINTRODUCTION
2 3 41
W
acoustic observations
articulatory-to-
intermediate layer
segmental HMM
acoustic PDF
acoustic mapping
Multi-level Segmental HMM
• segmental finite-state process
• intermediate “articulatory” layer– linear trajectories
• mapping required– linear transformation– radial basis function network
INTRODUCTIONINTRODUCTION
Training the model parameters
For optimal least-squares estimates (acoustic domain):
,s
11
)(1
ˆi
i
t
tti t
Tyc
1 2
1
1
1 )(ˆ
i
i
i
i
t
tt
t
tti
tt
tttym
THEORYTHEORY
midpoint
slope
11
)(1
ˆi
i
t
ttki tW
Tyc
THEORYTHEORY
midpoint
slope
For optimal least-squares estimates (articulatory domain):
,s
1 2
1
1
1 )(ˆ
i
i
i
i
t
tt
t
tt k
itt
tttW ym
Training the model parameters
11
)(1
ˆi
i
t
ttikii tDWD
Tyc
1 2
1
1
1 )(ˆ
i
i
i
i
t
tt
t
tt iki
itt
tttDWD ym
THEORYTHEORY
midpoint
slope
For optimal maximum-likelihood estimates (articulatory domain):
,s
Training the model parameters
Tests on MOCHA
• S. British English, at 16kHz (Wrench, 2000)
– MFCC13 acoustic features, incl. zero’th
– articulatory x- & y-coords from 7 EMA coils– PCA9+Lx: first nine articulatory modes plus
the laryngograph log energy
METHODMETHOD
MOCHA baseline performance
53
54
55
56
ID_0 ID_1
Mappings
Acc
ura
cy (
%)
RESULTSRESULTS
• Constant-trajectory SHMM (ID_0)• Linear-trajectory SHMM (ID_1)
Performance across mappings
53
54
55
56
ID_0 A (1) B (2) C (6) D (10) E (10) F (49) ID_1
Mappings
Acc
ura
cy (
%)
RESULTSRESULTS
Phone categorisation
No. Description
A 1 all data
B 2 silence; speech
C 6 linguistic categories: silence/stop; vowel; liquid; nasal; fricative; affricate
D 10 as (Deng and Ma, 2000):silence; vowel; liquid; nasal; UV fric; /s,ch/; V fric; /z,jh/; UV stop; V stop
E 10 discrete articulatory regions
F 49 silence; individual phonesMETHODMETHOD
Tests on TIMIT
• N. American English, at 8kHz
– MFCC13 acoustic features, incl. zero’th
a) F1-3: formants F1, F2 and F3, estimated by Holmes formant tracker
b) F1-3+BE5: five band energies added
c) PFS12: synthesiser control parameters
METHODMETHOD
60
61
62
63
64
65
66
Acc
ura
cy (
%)
ID_0 ID_1
Features
TIMIT baseline performance
• Constant-trajectory SHMM (ID_0)• Linear-trajectory SHMM (ID_1)
RESULTSRESULTS
Performance across feature sets
RESULTSRESULTS
60
61
62
63
64
65
66
Acc
ura
cy (
%)
ID_0 (a) F3 (b) F3+BE5 (c) PFS12 ID_1
Features
Performance across groupings
RESULTSRESULTS
60
61
62
63
64
65
66
Acc
ura
cy (
%)
ID_0 A (1) B (2) C (6) D (10) E (10) F (49) ID_1
Mappings
Results across groupings
RESULTSRESULTS
ID_0
A(1
)
B(2
)
C(6
)
D(1
0)
E(1
0)
F(4
9)
ID_1
60
61
62
63
64
65
66
Acc
ura
cy (
%)
Mappings
(a) F3
(b) F3+BE5
(c) PFS12
Model visualisation
Originalacousticdata
Constant-trajectorymodel
Linear-trajectorymodel (c,F)
DISCUSSIONDISCUSSION
Conclusions• Developed framework for speech
dynamics in an intermediate space• Linear traj. + piecewise linear mapping
bounded by performance of linear traj. in acoustic space
• Near optimal performance achieved– For more than 3 formant parameters– For 6 or more linear mappings
• Formants and articulatory parameters gave qualitatively similar results
• What next?
SUMMARYSUMMARY
• Complete experiments with lang. model• Include segment duration models• Derive pseudo-articulatory
representations by unsupervised (embedded) training
• Implement non-linear mapping (i.e., RBF)
• Further information:– here and now– [email protected]– web.bham.ac.uk/p.jackson/balthasar
SUMMARYSUMMARY
Further work