Factor Analysis of Acoustic Features for Streamed Hidden Markov Modeling
Chuan-Wei Ting
Department of Computer Science and Information Engineering,
National Cheng Kung University, Tainan, Taiwan
2
Outline
• Introduction
• Cepstral Factor Analysis
• FA Streamed Hidden Markov Model
• Experiments
• Conclusions & Future Works
3
Outline
• Introduction
  • Stochastic modeling
• Cepstral Factor Analysis
• FA Streamed Hidden Markov Model
• Experiments
• Conclusions & Future Works
4
Introduction
• The objective of constructing an acoustic model is to capture the characteristics of the speech signal.
• Stochastic modeling
• Hidden Markov model (HMM)
• Multi-Stream HMM
• Factorial HMM
5
Hidden Markov Model
• Topology of HMM
• Constraints
• All features are “tied” together
• Topology
• Transition moment
• Independence assumption
[Figure: HMM topology, a single chain of states $s_{t-1}$, $s_t$, $s_{t+1}$]
6
Multi-Stream HMM
• Topology of Multi-stream HMM
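[Equation: $p(Y \mid \mathcal{M})$ factored over segments $j = 1, \dots, J$ and streams $m = 1, \dots, M$]
[Figure: multi-stream HMM topology, with stream models $1, \dots, m, \dots, M$ for segment $j$ followed by those for segment $j+1$]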
7
Simplification of Multi-Stream HMM
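• Streams are assumed to be statistically independent
$\log p(Y \mid \mathcal{M}) = \sum_{j=1}^{J} \sum_{m=1}^{M} \log p(Y_j^{m} \mid \mathcal{M}_j^{m})$
• Weighted log-likelihood approach
$\log p(Y \mid \mathcal{M}) = \sum_{j=1}^{J} \sum_{m=1}^{M} w_j^{m} \log p(Y_j^{m} \mid \mathcal{M}_j^{m})$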
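The two combination rules on this slide can be sketched in a few lines. This is only an illustration: the function name `combine_streams`, the array layout (segments by streams), and the numeric values are assumptions, not taken from the slides.

```python
import numpy as np

def combine_streams(log_liks, weights=None):
    """Sum per-segment, per-stream log-likelihoods.

    log_liks: array of shape (J, M), entry [j, m] = log p(Y_j^m | M_j^m).
    weights:  None (plain sum under stream independence) or an array
              broadcastable to (J, M) holding the stream weights.
    """
    log_liks = np.asarray(log_liks, dtype=float)
    if weights is None:
        weights = np.ones_like(log_liks)
    return float(np.sum(weights * log_liks))

# Illustrative values: J = 2 segments, M = 2 streams
log_liks = np.array([[-10.0, -12.0],
                     [-8.0, -9.5]])
unweighted = combine_streams(log_liks)
weighted = combine_streams(log_liks, weights=np.array([0.7, 0.3]))
```

With all weights equal to one, the weighted rule reduces to the independent-stream sum.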
8
Factorial HMM
• Topology of FHMM
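[Figure: FHMM topology, with $M$ parallel state chains $s_t^{(1)}, s_t^{(2)}, \dots, s_t^{(M)}$ running from time $t-1$ to $t+1$ and jointly emitting the observations $y_{t-1}$, $y_t$, $y_{t+1}$]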
9
Outline
• Introduction
• Cepstral Factor Analysis
  • Feature analysis
• Factor analysis
• FA Streamed Hidden Markov Model
• Experiments
• Conclusions & Future Works
10
Cepstral Factor Analysis
• Feature analysis
• Dynamics of different features
[Figure: trajectories of the 1st, 4th, and 13th MFCCs; x-axis: Time (sec), 0 to 0.45; y-axis: MFCC, -15 to 15]
• Correlations
11
Factor Analysis
• Discover the correlations inherent in observation data.
• Applications
• Data compression
• Signal processing
• Acoustic modeling
12
Mathematical Definition of FA
• FA conducts data analysis of the multivariate observations using the common factors and the specific factors.
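• For a $D$-dimensional feature vector $y = [y_1, \dots, y_D]^T$, the general form of the FA model is given by
$y = W_f f + \varepsilon$
• $f$: $M \times 1$ common factor, $f \sim N(0, I)$
• $W_f$: $D \times M$ factor loading matrix
• $\varepsilon$: $D \times 1$ specific factor, $\varepsilon \sim N(0, \psi)$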
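As a rough numerical illustration of this definition, the sketch below draws samples from the FA model and compares their covariance with $W_f W_f^T + \psi$. The dimensions, the random loading matrix, and the variable names are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, M, N = 5, 2, 200_000            # feature dim, number of factors, samples (illustrative)

W = rng.normal(size=(D, M))        # factor loading matrix W_f (D x M)
psi = rng.uniform(0.1, 0.5, D)     # diagonal of the specific-factor covariance

f = rng.normal(size=(N, M))                    # common factors, f ~ N(0, I)
eps = rng.normal(size=(N, D)) * np.sqrt(psi)   # specific factors, eps ~ N(0, psi)
Y = f @ W.T + eps                              # y = W_f f + eps, one sample per row

R_model = W @ W.T + np.diag(psi)       # covariance implied by the model
R_sample = np.cov(Y, rowvar=False)     # covariance measured from the samples
max_gap = np.max(np.abs(R_model - R_sample))
```

The gap shrinks as the number of samples grows, which is the relation the estimation methods on the following slides exploit.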
13
Principal Component Solution
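• Find an estimator of $W_f$ that will approximate the fundamental expression
$R_y = E[y y^T] = W_f W_f^T + \psi$
• Decompose the covariance matrix of the observations
$R_y = V \Lambda V^T = V_f \Lambda_f^{1/2} \Lambda_f^{1/2} V_f^T + V_r \Lambda_r V_r^T$
• FA parameters can be estimated by
$y = V c = V_f c_f + V_r c_r = W_f f + W_r r$, with $W_f = V_f \Lambda_f^{1/2}$, $f = \Lambda_f^{-1/2} c_f$, $W_r = V_r \Lambda_r^{1/2}$, $r = \Lambda_r^{-1/2} c_r$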
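A minimal sketch of the principal component solution, assuming the loadings are taken as the leading eigenvectors scaled by the square roots of their eigenvalues and the specific variances as the leftover diagonal. The function name `pc_factor_solution` and the example matrix are hypothetical.

```python
import numpy as np

def pc_factor_solution(R, M):
    """Principal component solution: W_f = V_f Lambda_f^{1/2}, psi = diag(R - W_f W_f^T)."""
    eigvals, eigvecs = np.linalg.eigh(R)        # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:M]         # indices of the M largest eigenvalues
    W = eigvecs[:, top] * np.sqrt(eigvals[top])
    psi = np.diag(R - W @ W.T)                  # specific variances from the residual
    return W, psi

# Illustrative 3 x 3 correlation matrix
R = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
W, psi = pc_factor_solution(R, M=2)
R_hat = W @ W.T + np.diag(psi)                  # reproduces the diagonal of R exactly
```

Only the diagonal of `R` is matched exactly; the off-diagonal fit depends on how much variance the discarded eigenvectors carry.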
14
Principal Factor Analysis Solution
• Use an initial estimate $\hat{\psi}$ (diagonal) and then obtain the loading matrix from
$R_y - \hat{\psi} = \hat{W}_f \hat{W}_f^T$
• Obtain an estimate of $W_f$ by performing a principal component analysis on $R_y - \hat{\psi}$.
• This process is continued until the communality estimates converge:
$h_d^2 = \sum_{m=1}^{M} \hat{w}_{dm}^2$
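The iteration described above can be sketched as follows. The starting point (specific variances set to zero), the stopping tolerance, and the function name `principal_factor_analysis` are assumptions made for illustration; the slides do not specify them.

```python
import numpy as np

def principal_factor_analysis(R, M, n_iter=200, tol=1e-8):
    """Iterate PCA on the reduced matrix R - psi until the communalities settle."""
    D = R.shape[0]
    psi = np.zeros(D)                      # assumed initial estimate of the diagonal psi
    h2_old = np.full(D, np.inf)
    for _ in range(n_iter):
        eigvals, eigvecs = np.linalg.eigh(R - np.diag(psi))
        top = np.argsort(eigvals)[::-1][:M]
        lam = np.clip(eigvals[top], 0.0, None)   # guard against tiny negative eigenvalues
        W = eigvecs[:, top] * np.sqrt(lam)
        h2 = np.sum(W**2, axis=1)          # communality of each variable
        psi = np.diag(R) - h2              # updated specific variances
        if np.max(np.abs(h2 - h2_old)) < tol:
            break
        h2_old = h2
    return W, psi, h2

# Illustrative one-factor correlation matrix built from known loadings
w = np.array([0.9, 0.8, 0.7, 0.6])
R = np.outer(w, w) + np.diag(1.0 - w**2)
W, psi, h2 = principal_factor_analysis(R, M=1)
```

On this toy matrix the estimated loadings approach `w` up to sign.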
15
Maximum Likelihood Solution
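• When FA is carried out on the correlation matrix $R$
$\sum_{m=1}^{M} w_{dm}^2 + \psi_d = 1, \quad d = 1, \dots, D$
$\psi^{-1/2} R\, \psi^{-1/2} \left( \psi^{-1/2} W \right) = \left( \psi^{-1/2} W \right) \tilde{U}$
• Where $R = U^{-1/2} \Sigma\, U^{-1/2}$, $\Sigma = \frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{y})(y_i - \bar{y})^T$, $U^{-1/2} = \mathrm{diag}(K_{11}^{-1/2}, \dots, K_{DD}^{-1/2})$, $W = \{w_{dm}\}$, and $\tilde{U}$ is a diagonal matrix.
• With an initial estimate $\psi_0$, the estimate $\hat{W}$ is obtained from
$\psi_0^{-1/2} R\, \psi_0^{-1/2} \left( \psi_0^{-1/2} \hat{W} \right) = \left( \psi_0^{-1/2} \hat{W} \right) \tilde{U}$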
16
Rotation of Loading Matrix
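• Rotate the loading matrix by an orthogonal matrix $\Gamma$
$H = W \Gamma$
• Where $\Gamma$ satisfies
$H H^T = (W \Gamma)(W \Gamma)^T = W \Gamma \Gamma^T W^T = W W^T$
• Varimax rotation
• Let
$q_i = \sum_{j=1}^{D} h_{ij}^2, \quad i = 1, \dots, D$
• $H$ can be obtained by maximizing
$\sum_{j=1}^{D} \left[ \frac{1}{D} \sum_{i=1}^{D} \left( \frac{h_{ij}^2}{q_i} \right)^2 - \left( \frac{1}{D} \sum_{i=1}^{D} \frac{h_{ij}^2}{q_i} \right)^2 \right]$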
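A short sketch of a varimax rotation follows. It uses the widely used SVD-based iteration for the raw (unnormalized) varimax criterion, which is an assumption: it omits the normalization by $q_i$ shown on the slide, and the slides do not say which algorithm was used. The loadings are the unrotated values reported for the 1st, 4th, and 13th MFCCs.

```python
import numpy as np

def varimax(W, max_iter=100, tol=1e-8):
    """Raw varimax: returns rotated loadings H = W @ G and the orthogonal matrix G."""
    D, M = W.shape
    G = np.eye(M)
    objective = 0.0
    for _ in range(max_iter):
        H = W @ G
        # Gradient-like matrix of the raw varimax criterion with respect to G
        B = W.T @ (H**3 - H @ np.diag(np.sum(H**2, axis=0)) / D)
        U, s, Vt = np.linalg.svd(B)
        G = U @ Vt                      # orthogonal polar factor of B
        if np.sum(s) < objective * (1.0 + tol):
            break
        objective = np.sum(s)
    return W @ G, G

# Unrotated loadings of the 1st, 4th, and 13th MFCCs on two factors
W = np.array([[0.842, 0.011],
              [-0.312, -0.724],
              [0.896, 0.120]])
H, G = varimax(W)
```

Because `G` is orthogonal, `H @ H.T` equals `W @ W.T`, so the rotation changes how the loadings are distributed across factors without changing the modeled covariance.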
17
Effectiveness of Rotation
• Obtain greater discriminability
(a) Before rotation
             1st Factor   2nd Factor
1st MFCC       0.842        0.011
4th MFCC      -0.312       -0.724
13th MFCC      0.896        0.120

(b) After rotation
             1st Rotated Factor   2nd Rotated Factor
1st MFCC          -0.892               -0.004
4th MFCC           0.266                0.791
13th MFCC         -0.933               -0.135
18
Outline
• Introduction
• Cepstral Factor Analysis
• FA Streamed Hidden Markov Model
  • Survey of different HMMs
• FASHMM
• Experiments
• Conclusions & Future Works
19
FA Streamed HMM
• Using FA, the processes of observed features and hidden states are represented by common factors and residual factors.
20
Survey of Different HMMs (FAHMM)
• Covariance matrix modeling
  • Full vs. diagonal
  • Sufficient data problem
• FA representation
$R_y^{-1} = \psi^{-1} - \psi^{-1} W_f (I + W_f^T \psi^{-1} W_f)^{-1} W_f^T \psi^{-1}$
• State/latent representation
• Discrete vs. continuous
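The FA representation of the inverse covariance is the matrix inversion (Woodbury) identity applied to $W_f W_f^T + \psi$, so only an $M \times M$ matrix needs to be inverted. The check below is a sketch with randomly generated, purely illustrative parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
D, M = 6, 2                                  # illustrative sizes
W = rng.normal(size=(D, M))                  # loading matrix W_f
psi = rng.uniform(0.5, 1.5, D)               # diagonal of psi

R = W @ W.T + np.diag(psi)
direct = np.linalg.inv(R)                    # inverts a D x D matrix

psi_inv = 1.0 / psi                          # psi is diagonal, so its inverse is elementwise
small = np.eye(M) + (W.T * psi_inv) @ W      # I + W^T psi^{-1} W, only M x M
woodbury = (np.diag(psi_inv)
            - (psi_inv[:, None] * W) @ np.linalg.inv(small) @ (W.T * psi_inv))
```

The FA form also needs only `D * M + D` parameters, compared with `D * (D + 1) / 2` for a full covariance matrix.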
21
Survey of Different HMMs (Streamed HMM)
• In the standard HMM, the joint probability of the observation sequence $Y = \{y_1, y_2, \dots, y_T\}$ and the state sequence $S = \{s_1, s_2, \dots, s_T\}$ is represented by
$p(S, Y) = p(s_1)\, p(y_1 \mid s_1) \prod_{t=2}^{T} p(s_t \mid s_{t-1})\, p(y_t \mid s_t)$
• Using FHMM, the state at time $t$ is extended to $M$ states, i.e. $s_t = \{s_t^{(1)}, \dots, s_t^{(m)}, \dots, s_t^{(M)}\}$.
• Likelihood combination
  • Multi-stream HMM: sub-word level
  • FHMM: frame level
22
Likelihood Function of FHMM
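• State transition probability
$p(s_t \mid s_{t-1}) = \prod_{m=1}^{M} p(s_t^{(m)} \mid s_{t-1}^{(m)})$
• Likelihood function
$p(y_t \mid s_t) = (2\pi)^{-D/2} |\Sigma|^{-1/2} \exp\left\{ -\frac{1}{2} \left( y_t - \sum_{m=1}^{M} \mu_t^{(m)} \right)^T \Sigma^{-1} \left( y_t - \sum_{m=1}^{M} \mu_t^{(m)} \right) \right\}$
where $\Sigma$ is the common covariance matrix and $\mu_t^{(m)}$ is the mean contributed by chain $m$ at time $t$.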
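The FHMM likelihood function can be evaluated directly, as in the sketch below. It assumes each chain stores one mean vector per state and that the chain means add up; the function name `fhmm_log_emission` and the toy values are hypothetical.

```python
import numpy as np

def fhmm_log_emission(y, chain_means, states, cov):
    """log p(y_t | s_t) with additive chain means and a common covariance.

    chain_means: list of M arrays of shape (K, D); chain_means[m][k] is the
                 mean contribution of chain m when it is in state k.
    states:      length-M sequence with the current state index of each chain.
    cov:         common (D, D) covariance matrix.
    """
    mean = sum(mu[k] for mu, k in zip(chain_means, states))
    diff = y - mean
    D = y.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (D * np.log(2.0 * np.pi) + logdet + quad)

# Toy example: M = 2 chains, K = 2 states each, D = 2
chain_means = [np.array([[0.0, 0.0], [1.0, 1.0]]),
               np.array([[0.0, 0.0], [2.0, 0.0]])]
y = np.array([3.0, 1.0])
log_p = fhmm_log_emission(y, chain_means, states=(1, 1), cov=np.eye(2))
```

In the toy example the observation equals the summed mean, so only the normalization term remains.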
23
Estimation Approaches for FHMM
• Exact inference
  • Expectation maximization (EM) algorithm
  • Complexity: $O(TMK^{M+1})$ (versus $O(TK^{2M})$ without exploiting the factorial structure)
• Approximations
  • Gibbs sampling: $O(TMK)$
  • Variational inference: $O(TMK^2)$
24
FASHMM
• According to the FA method, each common factor $f_m$ is associated with a set of highly correlated features.
• Correlated features are grouped together in a stream and share the same FA parameters.
• The observed feature vector can be represented by
$y = W_f f + W_r r = [w_{f_1}\ w_{f_2}\ \cdots\ w_{f_M}\ W_r]\,[f_1\ f_2\ \cdots\ f_M\ r]^T$
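One simple way to form streams from a loading matrix is to assign each feature to the common factor on which it loads most strongly. This rule is an assumption for illustration, since the slides do not state the exact grouping criterion. The values are the Class 1 loadings from the simulated-data experiment.

```python
import numpy as np

# Class 1 loadings from the experiments (rows V1..V5, columns CF1..CF5)
loadings = np.array([
    [0.9662, 0.1598, 0.1863, 0.0171, 0.0775],
    [-0.2655, -0.9526, -0.0807, -0.1246, -0.0046],
    [0.2394, -0.1161, 0.9639, 0.0108, 0.0008],
    [0.9697, 0.1644, 0.1639, -0.0001, -0.0755],
    [0.0675, 0.9565, -0.2565, -0.1212, -0.0045],
])
names = ["V1", "V2", "V3", "V4", "V5"]

def group_by_dominant_factor(loadings, names):
    """Assign each feature to the factor with the largest absolute loading."""
    dominant = np.argmax(np.abs(loadings), axis=1)
    streams = {}
    for name, k in zip(names, dominant):
        streams.setdefault(int(k), []).append(name)
    return streams

streams = group_by_dominant_factor(loadings, names)
# streams == {0: ['V1', 'V4'], 1: ['V2', 'V5'], 2: ['V3']}
```

The resulting groups match the three streams shown in the HMM vs. FASHMM comparison: series 1 and 4, series 2 and 5, and series 3.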
25
Topology of FASHMM
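[Figure: FASHMM topology, with separate Markov chains for the common-factor states $s_t^{f_1}, \dots, s_t^{f_M}$ and the residual state $s_t^{r}$, running from time $t-1$ to $t+1$ and emitting $y_{t-1}$, $y_t$, $y_{t+1}$]
• State transition probability
$p(s_t \mid s_{t-1}) = p(s_t^{f_1}, s_t^{f_2}, \dots, s_t^{f_M}, s_t^{r} \mid s_{t-1}^{f_1}, s_{t-1}^{f_2}, \dots, s_{t-1}^{f_M}, s_{t-1}^{r}) = p(s_t^{r} \mid s_{t-1}^{r}) \prod_{m=1}^{M} p(s_t^{f_m} \mid s_{t-1}^{f_m})$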
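Because the chains evolve independently, the transition matrix over the composite state is the Kronecker product of the per-chain transition matrices. The sketch below uses two common-factor chains and one residual chain; the matrices and the helper name `joint_transition` are illustrative assumptions.

```python
import numpy as np
from functools import reduce

def joint_transition(chain_transitions):
    """Composite transition matrix of independent chains (Kronecker product)."""
    return reduce(np.kron, chain_transitions)

# Illustrative two-state transition matrices for chains f_1, f_2 and r
A_f1 = np.array([[0.9, 0.1],
                 [0.0, 1.0]])
A_f2 = np.array([[0.8, 0.2],
                 [0.0, 1.0]])
A_r = np.array([[0.7, 0.3],
                [0.3, 0.7]])

A = joint_transition([A_f1, A_f2, A_r])   # shape (8, 8)
# A[0, 0] is the probability that all three chains stay in their first state
```

In practice the composite matrix is rarely built explicitly, since its size grows exponentially with the number of chains; the factorized form is what keeps the model compact.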
26
Outline
• Introduction
• Cepstral Factor Analysis
• FA Streamed Hidden Markov Model
• Experiments
  • Simulated data setup
• HMM vs. FASHMM
• Recognition results & discussion
• Conclusions & Future Works
27
Experimental Setup
• Simulated data
  • 4 classes, 5 variables
  • Training: 100 sentences, 5 “words” per sentence
  • Testing: 50 utterances, 4 “words” per sentence
• Model structure
  • HMM
    • 7 states per class
    • One Gaussian per state
  • FASHMM
    • 3 states per class
    • One Gaussian per state
28
Class 1
[Figure: the five observed variables (Series 1 to Series 5) plotted over about 100 samples]
       CF1       CF2       CF3       CF4       CF5
V1    0.9662    0.1598    0.1863    0.0171    0.0775
V2   -0.2655   -0.9526   -0.0807   -0.1246   -0.0046
V3    0.2394   -0.1161    0.9639    0.0108    0.0008
V4    0.9697    0.1644    0.1639   -0.0001   -0.0755
V5    0.0675    0.9565   -0.2565   -0.1212   -0.0045
[Figure: the five common factors (CF1 to CF5) plotted over about 100 samples]
29
Class 2
[Figure: the five observed variables (Series 1 to Series 5) plotted over about 100 samples]
       CF1       CF2       CF3       CF4       CF5
V1    0.1317    0.9733   -0.1647   -0.0908    0.0008
V2   -0.2007   -0.0951    0.9750    0.0041   -0.0001
V3   -0.9818   -0.1093    0.1515    0.0045   -0.0339
V4   -0.9826   -0.1061    0.1486   -0.0005    0.0337
V5    0.0827    0.9931    0.0077    0.0823   -0.0006
[Figure: the five common factors (CF1 to CF5) plotted over about 100 samples]
30
Class 3
[Figure: the five observed variables (Series 1 to Series 5) plotted over about 100 samples]
       CF1       CF2       CF3       CF4       CF5
V1    0.0324    0.9939   -0.0704   -0.0788    0.0004
V2   -0.1435   -0.1093    0.9836   -0.0004   -0.0003
V3    0.9913    0.0285   -0.1243   -0.0013    0.0321
V4    0.9955    0.0228   -0.0867    0.0006   -0.0314
V5    0.0186    0.9926   -0.0903    0.0792   -0.0002
[Figure: the five common factors (CF1 to CF5) plotted over about 100 samples]
31
Class 4
[Figure: the five observed variables (Series 1 to Series 5) plotted over about 100 samples]
       CF1       CF2       CF3       CF4       CF5
V1    0.1634   -0.9746   -0.1532   -0.0040    0.0002
V2    0.1133    0.1475    0.9826    0.0013   -0.0001
V3   -0.9876    0.0482   -0.1109   -0.0950    0.0313
V4   -0.9701    0.2110   -0.0517    0.1075    0.0161
V5    0.9887   -0.1165    0.0829   -0.0003    0.0456
[Figure: the five common factors (CF1 to CF5) plotted over about 100 samples]
32
HMM vs. FASHMM
[Figure: HMM, one model over all five series (Series 1 to 5); FASHMM, three streams: Series 1 and 4, Series 2 and 5, and Series 3]
33
Recognition Results
                        HMM     FASHMM
# States per HMM        7       3 (x4)
Recognition accuracy    100%    100%
34
Discussion
[Figure: the five observed series (Series 1 to Series 5) plotted over about 100 samples]
35
Outline
• Introduction
• Cepstral Factor Analysis
• FA Streamed Hidden Markov Model
• Experiments
• Conclusions & Future Works
36
Conclusions
• We have presented the FA approach
• Extract the common factors and the residual factors in acoustic features.
• Separate the Markov chains for these factors.
• Represent the sophisticated dynamics in the stochastic process of the speech signal.
• A new topology of FA streamed HMM was proposed.
37
Future Works
• More acoustic features
• Model selection
  • Streams
  • States
  • Mixtures
• Large vocabulary continuous speech recognition (LVCSR) task