Sequential Data Modeling
2016/07/01

Tomoki Toda², Graham Neubig¹, Sakriani Sakti¹

¹ Augmented Human Communication Laboratory, Graduate School of Information Science, NAIST
² Information Technology Center / Graduate School of Information Science, Nagoya University
## Review: Evaluation/Alignment/Training

- Evaluation: given a model $\lambda$ and sequential data $\mathbf{x}$, compute the likelihood with the forward/backward algorithms:
  $$p(\mathbf{x}|\lambda) = \sum_{\text{all } \mathbf{z}} p(\mathbf{x}, \mathbf{z}|\lambda)$$
- Alignment (decoding): given a model $\lambda$ and sequential data $\mathbf{x}$, find the state sequence with the Viterbi algorithm:
  $$\hat{\mathbf{z}} = \mathop{\mathrm{argmax}}_{\mathbf{z}}\, p(\mathbf{x}, \mathbf{z}|\lambda)$$
- Training: given sequential data $\mathbf{x}$, estimate the model parameter set with the Baum-Welch (i.e., EM) algorithm:
  $$\hat{\lambda} = \mathop{\mathrm{argmax}}_{\lambda}\, p(\mathbf{x}|\lambda)$$
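As a concrete sketch, the evaluation and alignment operations can be implemented for a small discrete-output HMM. The parameters below are made up for illustration; they are not the lecture's model.

```python
import numpy as np

# Hypothetical 2-state HMM with discrete outputs "a"/"b" (illustration only).
pi = np.array([0.6, 0.4])            # initial state probabilities
A = np.array([[0.7, 0.3],            # A[i, j] = P(state j at n+1 | state i at n)
              [0.2, 0.8]])
B = {"a": np.array([0.9, 0.1]),      # B[o][s] = P(output o | state s)
     "b": np.array([0.1, 0.9])}

def likelihood(x):
    """Evaluation: p(x|lambda) = sum over all z of p(x,z|lambda), forward algorithm."""
    alpha = pi * B[x[0]]
    for o in x[1:]:
        alpha = (alpha @ A) * B[o]
    return alpha.sum()

def viterbi(x):
    """Alignment: z_hat = argmax_z p(x,z|lambda)."""
    delta = pi * B[x[0]]
    back = []
    for o in x[1:]:
        scores = delta[:, None] * A          # scores[i, j]: best path ending i -> j
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) * B[o]
    path = [int(delta.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]

x = ["a", "a", "b"]
print(likelihood(x))   # ~0.1325
print(viterbi(x))      # [0, 0, 1]
```

Both functions can be checked against brute-force enumeration over all $2^3$ state sequences, which is exactly the sum (respectively max) that the recursions compute efficiently.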
## Example of Parameter Estimation

(起: wake up, 寝: sleep)

Training data samples with state sequences:

- /s/ ⇒ state 1: 起 ⇒ state 1: 起 ⇒ state 2: 寝
- /s/ ⇒ state 2: 寝 ⇒ state 2: 起 ⇒ state 1: 起 ⇒ state 1: 寝
- /s/ ⇒ state 2: 寝 ⇒ state 2: 起

States 1 and 2 can both be a final state.

Number of observed samples (transitions to the final state are not counted):

- Initial transitions from /s/: to state 1: 1; to state 2: 2
- Transitions from state 1: to state 1: 2; to state 2: 1
- Transitions from state 2: to state 1: 1; to state 2: 2
- Outputs in state 1: 起: 3; 寝: 1
- Outputs in state 2: 起: 2; 寝: 3

Maximum likelihood estimates:

- Initial state probabilities: $\pi_1 = 1/(1+2)$, $\pi_2 = 2/(1+2)$
- Transition probabilities: $A_{1,1} = 2/(2+1)$, $A_{1,2} = 1/(2+1)$, $A_{2,1} = 1/(1+2)$, $A_{2,2} = 2/(1+2)$
- Output probabilities: $B_1(起) = 3/(3+1)$, $B_1(寝) = 1/(3+1)$, $B_2(起) = 2/(2+3)$, $B_2(寝) = 3/(2+3)$
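The maximum likelihood estimates above are just normalized counts. A minimal sketch of that counting, using the training sequences from the slide (transitions into the final state are excluded from the transition denominators, as on the slide):

```python
from collections import Counter

# Training sequences from the slide, as (state, output) pairs.
seqs = [
    [(1, "起"), (1, "起"), (2, "寝")],
    [(2, "寝"), (2, "起"), (1, "起"), (1, "寝")],
    [(2, "寝"), (2, "起")],
]

init = Counter(s[0][0] for s in seqs)                      # initial-state counts
trans = Counter((a[0], b[0]) for s in seqs for a, b in zip(s, s[1:]))
emit = Counter((st, out) for s in seqs for st, out in s)   # per-state output counts

pi = {s: init[s] / sum(init.values()) for s in (1, 2)}
A = {(i, j): trans[(i, j)] / sum(trans[(i, k)] for k in (1, 2))
     for i in (1, 2) for j in (1, 2)}
B = {(s, o): emit[(s, o)] / sum(emit[(s, oo)] for oo in ("起", "寝"))
     for s in (1, 2) for o in ("起", "寝")}

print(pi)                      # pi_1 = 1/3, pi_2 = 2/3
print(A[(1, 1)], A[(1, 2)])    # 2/3, 1/3
print(B[(1, "起")], B[(2, "寝")])  # 3/4, 3/5
```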
## Review: Lower Bound of HMM Likelihood

Log-scaled likelihood function for $U$ samples of sequential data:
$$\sum_{u=1}^{U} \ln p(\mathbf{x}^{(u)}|\lambda) = \sum_{u=1}^{U} \ln \sum_{\text{all } \mathbf{z}^{(u)}} p(\mathbf{x}^{(u)}, \mathbf{z}^{(u)}|\lambda) \;\geq\; \sum_{u=1}^{U} \sum_{\text{all } \mathbf{z}^{(u)}} q(\mathbf{z}^{(u)}) \ln \frac{p(\mathbf{x}^{(u)}, \mathbf{z}^{(u)}|\lambda)}{q(\mathbf{z}^{(u)})} \quad \text{(lower bound)}$$

- E-step: calculate the posterior probabilities of the latent variables (i.e., state sequences):
  $$\hat{q}(\mathbf{z}^{(u)}) = p(\mathbf{z}^{(u)}|\mathbf{x}^{(u)}, \lambda^{\text{old}}) = \frac{p(\mathbf{x}^{(u)}, \mathbf{z}^{(u)}|\lambda^{\text{old}})}{\sum_{\text{all } \mathbf{z}^{(u)}} p(\mathbf{x}^{(u)}, \mathbf{z}^{(u)}|\lambda^{\text{old}})}$$
- M-step: maximize the auxiliary function with respect to the model parameters:
  $$Q(\lambda^{\text{new}}, \lambda^{\text{old}}) = \sum_{u=1}^{U} \sum_{\text{all } \mathbf{z}^{(u)}} \hat{q}(\mathbf{z}^{(u)}) \ln p(\mathbf{x}^{(u)}, \mathbf{z}^{(u)}|\lambda^{\text{new}})$$
## Review: E-Step

Calculate the posterior probabilities of the latent variables:

- Expected # of samples observed in state $s$ at time $n$ in sample $u$:
  $$\gamma_n^{(u)}(s) = \hat{q}(z_n^{(u)} = s) = p(z_n^{(u)} = s \mid \mathbf{x}^{(u)}, \lambda^{\text{old}}) = \frac{p(z_n^{(u)} = s, \mathbf{x}^{(u)}|\lambda^{\text{old}})}{p(\mathbf{x}^{(u)}|\lambda^{\text{old}})} = \frac{\alpha_n(s)\,\beta_n(s)}{p(\mathbf{x}^{(u)}|\lambda^{\text{old}})}$$
- Expected # of transitions from state $s'$ at time $n-1$ to state $s$ at time $n$ in sample $u$:
  $$\xi_n^{(u)}(s', s) = \hat{q}(z_{n-1}^{(u)} = s',\, z_n^{(u)} = s) = p(z_{n-1}^{(u)} = s',\, z_n^{(u)} = s \mid \mathbf{x}^{(u)}, \lambda^{\text{old}}) = \frac{\alpha_{n-1}(s')\, A_{s',s}\, B_s(x_n)\, \beta_n(s)}{p(\mathbf{x}^{(u)}|\lambda^{\text{old}})}$$

Both quantities are computed over the trellis from the forward probabilities $\alpha_n(s)$ and backward probabilities $\beta_n(s)$.
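A minimal sketch of how these posteriors come out of the forward/backward recursions; the 2-state parameters here are hypothetical, only the structure matches the slides.

```python
import numpy as np

# Hypothetical 2-state HMM parameters (illustration only, not the lecture's model).
pi = np.array([0.5, 0.5])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.8, 0.2],      # B[s, o]: state s emits output o
              [0.1, 0.9]])
x = [0, 1, 1]                  # observation indices
T, S = len(x), 2

alpha = np.zeros((T, S)); beta = np.zeros((T, S))
alpha[0] = pi * B[:, x[0]]
for n in range(1, T):
    alpha[n] = (alpha[n-1] @ A) * B[:, x[n]]
beta[-1] = 1.0
for n in range(T - 2, -1, -1):
    beta[n] = A @ (B[:, x[n+1]] * beta[n+1])

px = alpha[-1].sum()                       # p(x | lambda)
gamma = alpha * beta / px                  # gamma[n, s] = q(z_n = s)
xi = np.array([alpha[n][:, None] * A * (B[:, x[n+1]] * beta[n+1])[None, :] / px
               for n in range(T - 1)])     # xi[n, s', s] = q(z_n = s', z_{n+1} = s)
print(gamma.sum(axis=1))                   # each row sums to 1
```

Consistency holds by construction: summing `xi` over its last (or middle) state index recovers the corresponding slice of `gamma`.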
## Review: M-Step

Auxiliary function:
$$Q(\lambda^{\text{new}}, \lambda^{\text{old}}) = \sum_{s=1}^{S} n_s(1) \ln \pi_s + \sum_{s'=1}^{S} \sum_{s=1}^{S} n_{s',s} \ln A_{s',s} + \sum_{s=1}^{S} \sum_{\text{all "o"}} n_s(\text{"o"}) \ln B_s(\text{"o"})$$
where $n_s(1)$, $n_{s',s}$, and $n_s(\text{"o"})$ are the expected counts from the E-step.

ML estimates, for each state (setting each partial derivative to 0 under the sum-to-one constraints):

- Initial state probability: $\dfrac{\partial Q}{\partial \pi_s} = 0 \;\Rightarrow\; \hat{\pi}_s = \dfrac{n_s(1)}{\sum_{s=1}^{S} n_s(1)}$
- Transition probability: $\dfrac{\partial Q}{\partial A_{s',s}} = 0 \;\Rightarrow\; \hat{A}_{s',s} = \dfrac{n_{s',s}}{\sum_{s=1}^{S} n_{s',s}}$
- Output probability: $\dfrac{\partial Q}{\partial B_s(\text{"o"})} = 0 \;\Rightarrow\; \hat{B}_s(\text{"o"}) = \dfrac{n_s(\text{"o"})}{\sum_{\text{all "o"}} n_s(\text{"o"})}$
## Review: Example of E-Step

Forward and backward probabilities are computed over the trellis for the observation sequence 起 寝 寝 (time $n = 1, 2, 3$), from the initial state /s/ to the pseudo final state. [Trellis figure: forward probabilities $\alpha_n(s)$, backward probabilities $\beta_n(s)$, and their products $\alpha_n(s)\beta_n(s)$ for states $s = 1, 2$. At every time $n$ the products sum to the likelihood $p(\mathbf{x}|\lambda) = 0.12$, e.g., $0.05824 + 0.06176 = 0.12$ at $n = 1$.]
## Review: Example of Posterior Probabilities

For the observation sequence 起 寝 寝 ($n = 1, 2, 3$), dividing the products $\alpha_n(s)\beta_n(s)$ by $p(\mathbf{x}|\lambda) = 0.12$ gives the state posteriors:

- State 1: $\gamma_1(1) = 0.05824/0.12$, $\gamma_2(1) = 0.0192/0.12$, $\gamma_3(1) = 0.012/0.12$
- State 2: $\gamma_1(2) = 0.06176/0.12$, $\gamma_2(2) = 0.1008/0.12$, $\gamma_3(2) = 0.108/0.12$

Transition posteriors $\xi_n(s', s)$ (from state $s'$ at time $n$ to state $s$ at time $n+1$):

- $\xi_1(1,1) = 0.01792/0.12$, $\xi_1(1,2) = 0.04032/0.12$, $\xi_1(2,1) = 0.00128/0.12$, $\xi_1(2,2) = 0.06048/0.12$
- $\xi_2(1,1) = 0.0084/0.12$, $\xi_2(1,2) = 0.0108/0.12$, $\xi_2(2,1) = 0.0036/0.12$, $\xi_2(2,2) = 0.0972/0.12$

Calculate these posterior probabilities (= expected # of samples) sequence by sequence.
## Review: Example of Sufficient Statistics

Summing the posterior probabilities (= expected # of samples) over time gives the sufficient statistics for each parameter:

- Initial state: $n_1(1) = 0.05824/0.12$, $n_2(1) = 0.06176/0.12$
- Transitions: $n_{1,1} = (0.01792 + 0.0084)/0.12 = 0.02632/0.12$; $n_{1,2} = (0.04032 + 0.0108)/0.12 = 0.05112/0.12$; $n_{2,1} = (0.00128 + 0.0036)/0.12 = 0.00488/0.12$; $n_{2,2} = (0.06048 + 0.0972)/0.12 = 0.15768/0.12$
- Outputs, state 1: $n_1(起) = 0.05824/0.12$; $n_1(寝) = (0.0192 + 0.012)/0.12 = 0.0312/0.12$
- Outputs, state 2: $n_2(起) = 0.06176/0.12$; $n_2(寝) = (0.1008 + 0.108)/0.12 = 0.2088/0.12$
## Review: Example of ML Estimates

Normalizing the sufficient statistics (expected # of samples for each parameter) gives the ML estimates:

- Initial state probabilities:
  $$\hat{\pi}_1 = \frac{n_1(1)}{n_1(1) + n_2(1)} = \frac{0.05824}{0.05824 + 0.06176} = 0.49, \qquad \hat{\pi}_2 = 0.51$$
- Transition probabilities:
  $$\hat{A}_{1,1} = \frac{n_{1,1}}{n_{1,1} + n_{1,2}} = 0.34, \quad \hat{A}_{1,2} = 0.66, \quad \hat{A}_{2,1} = \frac{n_{2,1}}{n_{2,1} + n_{2,2}} = 0.03, \quad \hat{A}_{2,2} = 0.97$$
- Output probabilities:
  $$\hat{B}_1(起) = \frac{n_1(起)}{n_1(起) + n_1(寝)} = 0.65, \quad \hat{B}_1(寝) = 0.35, \quad \hat{B}_2(起) = 0.23, \quad \hat{B}_2(寝) = 0.77$$
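The arithmetic above can be checked by normalizing the slide's sufficient statistics directly (the common factor $1/0.12$ cancels in every ratio):

```python
# Sufficient statistics from the slide (shared factor 1/0.12 omitted; it cancels).
n_init = {1: 0.05824, 2: 0.06176}
n_trans = {(1, 1): 0.02632, (1, 2): 0.05112, (2, 1): 0.00488, (2, 2): 0.15768}
n_out = {(1, "起"): 0.05824, (1, "寝"): 0.0312, (2, "起"): 0.06176, (2, "寝"): 0.2088}

pi_hat = {s: v / sum(n_init.values()) for s, v in n_init.items()}
A_hat = {k: v / (n_trans[(k[0], 1)] + n_trans[(k[0], 2)]) for k, v in n_trans.items()}
B_hat = {k: v / (n_out[(k[0], "起")] + n_out[(k[0], "寝")]) for k, v in n_out.items()}

print(round(pi_hat[1], 2), round(pi_hat[2], 2))              # 0.49 0.51
print(round(A_hat[(1, 1)], 2), round(A_hat[(1, 2)], 2))      # 0.34 0.66
print(round(B_hat[(2, "起")], 2), round(B_hat[(2, "寝")], 2))  # 0.23 0.77
```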
## Sequential Data Modeling, 5th class: "Continuous Latent Variable Model 1"

Tomoki Toda
Information Technology Center / Graduate School of Information Science, Nagoya University
## Basic Techniques

|              | Discrete latent variables | Continuous latent variables    |
|--------------|---------------------------|--------------------------------|
|              | Mixture model (e.g., GMM) | Factor analysis (FA)           |
| Markov model | Hidden Markov model (HMM) | Linear dynamical systems (LDS) |

In the first row the latent variables $z_1, z_2, z_3, \ldots$ are independent across time; in the Markov models they form a chain, with each observation $x_n$ generated from its latent variable $z_n$.
## Continuous Latent Variables (from PCA to FA)
## Example of High-Dimensional Data

- Example: hand-written digits
- Each image of 100 × 100 = 10,000 pixels, i.e., represented as a 10,000-dimensional vector
- Each image is represented as one point in the 10,000-dimensional space
- However, the # of degrees of freedom of the variability would be limited (e.g., only vertical and horizontal translations and rotations: 3 degrees)
- Can we find a lower-dimensional subspace on which the data points live?
## Extraction of Synthetic Variables

Synthesize new variables by linearly combining observable variables, e.g., from 2-dimensional observation data to one-dimensional data:
$$y_n = 0.5\, x_{n,1} + x_{n,2}$$
represented by an inner product:
$$y_n = \mathbf{w}^\top \mathbf{x}_n, \qquad \text{where } \mathbf{w} = \begin{bmatrix} 0.5 \\ 1 \end{bmatrix}, \quad \mathbf{x}_n = \begin{bmatrix} x_{n,1} \\ x_{n,2} \end{bmatrix}$$
## Principal Component Analysis (PCA)

- How can we extract a synthetic variable that most effectively represents the observable variables?
- Determine a unit vector $\mathbf{u}$ by maximizing the variance of the synthetic variables:
  - Synthetic variable: $y_n = \mathbf{u}^\top (\mathbf{x}_n - \boldsymbol{\mu})$, where $\boldsymbol{\mu}$ is the mean vector (so the mean value of $y_n$ is 0)
  - Its variance: $\frac{1}{N}\sum_n y_n^2 = \mathbf{u}^\top \mathbf{S}\, \mathbf{u}$
  - Constraint: $\mathbf{u}^\top \mathbf{u} = 1$, i.e., a unit vector (length = 1)
## Eigenvalue Problem

Maximization of the variance of the synthetic variable, where
$$\boldsymbol{\mu} = \frac{1}{N}\sum_{n=1}^{N} \mathbf{x}_n, \qquad \mathbf{S} = \frac{1}{N}\sum_{n=1}^{N} (\mathbf{x}_n - \boldsymbol{\mu})(\mathbf{x}_n - \boldsymbol{\mu})^\top$$

Maximize the following objective function with respect to $\mathbf{u}$:
$$F = \underbrace{\mathbf{u}^\top \mathbf{S}\, \mathbf{u}}_{\text{variance}} + \lambda \underbrace{\left(1 - \mathbf{u}^\top \mathbf{u}\right)}_{\text{constraint}}$$
where $\lambda$ is a Lagrange multiplier. Setting $\partial F / \partial \mathbf{u} = \mathbf{0}$ gives the eigenvalue problem
$$\mathbf{S}\mathbf{u} = \lambda \mathbf{u}$$
with eigenvector $\mathbf{u}$ and eigenvalue $\lambda$.
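A minimal numerical sketch of this eigenvalue problem, assuming synthetic 2-D data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]])  # correlated 2-D data

mu = X.mean(axis=0)
S = (X - mu).T @ (X - mu) / len(X)     # covariance matrix S

lam, U = np.linalg.eigh(S)             # eigenvalue problem: S u = lambda u
u1 = U[:, -1]                          # unit eigenvector of the largest eigenvalue

y = (X - mu) @ u1                      # synthetic variable (1st principal component)
print(np.isclose(y.var(), lam[-1]))    # variance of y equals the largest eigenvalue
```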
## Eigenvector and Eigenvalue

- Eigenvalue problem $\mathbf{S}\mathbf{u} = \lambda\mathbf{u}$: the direction is the eigenvector, and the variance of the synthetic variable along that direction is the eigenvalue.
- The eigenvector $\mathbf{u}_1$ for the largest eigenvalue represents the direction that maximizes the variance of a synthetic variable.
- Synthetic variable with this eigenvector (= principal component):
  $$y_{n,1} = \mathbf{u}_1^\top (\mathbf{x}_n - \boldsymbol{\mu})$$
  Its mean = 0; its variance = eigenvalue $\lambda_1$.
## Projection onto Low-Dimensional Space

- Extraction of multiple eigenvectors, e.g., $\mathbf{u}_1, \mathbf{u}_2$, which are orthonormal: $\mathbf{u}_1^\top \mathbf{u}_1 = \mathbf{u}_2^\top \mathbf{u}_2 = 1$ and $\mathbf{u}_1^\top \mathbf{u}_2 = 0$
- Represent high-dimensional data $\mathbf{x}_n$ with low-dimensional data $\mathbf{y}_n$ (i.e., principal components):
  - 1st principal component: $y_{n,1} = \mathbf{u}_1^\top (\mathbf{x}_n - \boldsymbol{\mu})$, the synthetic variable w/ the eigenvector for the largest eigenvalue
  - 2nd principal component: $y_{n,2} = \mathbf{u}_2^\top (\mathbf{x}_n - \boldsymbol{\mu})$, the synthetic variable w/ the eigenvector for the 2nd largest eigenvalue
- Mean vector of $\mathbf{y}_n$: $\mathbf{0}$; covariance matrix: $\boldsymbol{\Lambda} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$
## Whitening Transformation

- Observation data $\mathbf{x}_n$: mean vector $\boldsymbol{\mu}$, covariance $\mathbf{S}$
- PCA projection $\mathbf{y}_n = \mathbf{U}^\top (\mathbf{x}_n - \boldsymbol{\mu})$: mean vector $\mathbf{0}$, covariance $\boldsymbol{\Lambda}$
- Whitening $\mathbf{z}_n = \boldsymbol{\Lambda}^{-1/2} \mathbf{y}_n = \boldsymbol{\Lambda}^{-1/2} \mathbf{U}^\top (\mathbf{x}_n - \boldsymbol{\mu})$: mean vector $\mathbf{0}$, covariance $\mathbf{I}$

The linear transform for whitening is $\boldsymbol{\Lambda}^{-1/2} \mathbf{U}^\top$.
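The whitening transform can be verified numerically; this sketch assumes synthetic 3-D data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3)) @ np.array([[1.0, 0.2, 0.0],
                                           [0.0, 1.5, 0.3],
                                           [0.0, 0.0, 0.7]])
mu = X.mean(axis=0)
S = np.cov(X.T, bias=True)

lam, U = np.linalg.eigh(S)                  # S = U diag(lam) U^T
Z = (X - mu) @ U @ np.diag(lam ** -0.5)     # z_n = Lambda^{-1/2} U^T (x_n - mu)

print(np.allclose(Z.mean(axis=0), 0))                   # mean vector: 0
print(np.allclose(np.cov(Z.T, bias=True), np.eye(3)))   # covariance: I
```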
## Continuous Latent Variables (from PCA to FA)
## Whitening Process with PCA

Linear transformation for whitening, from the high-dimensional space to the low-dimensional space:
$$z_{n,1} = \lambda_1^{-1/2} \mathbf{u}_1^\top (\mathbf{x}_n - \boldsymbol{\mu})$$

- Observation data $\mathbf{x}_n$: mean vector $\boldsymbol{\mu}$, covariance $\mathbf{S}$
- Low-dimensional data $z_{n,1}$: mean 0, variance 1

1. Dimension reduction
2. Processing for the low-dimensional data, e.g., probability density modeling

The low-dimensional data are regarded as observation data, and the errors caused by the linear transformation are ignored; i.e., this cannot model the probability density of the original observation data.
## Basic Idea of Factor Analysis (FA)

The generative process runs in the opposite direction, from the low-dimensional space to the high-dimensional space:

1. Low-dimensional data generation: $z_{n,1}$ with mean 0 and variance 1
2. Projection onto the subspace: $\hat{\mathbf{x}}_n = \lambda_1^{1/2}\, z_{n,1}\, \mathbf{u}_1 + \boldsymbol{\mu}$
3. Random noise addition to give the observation data: $\mathbf{x}_n = \hat{\mathbf{x}}_n + \mathbf{e}_n$
## Comparison between PCA and FA

FA is capable of defining the p.d.f. of the observation data based on the inversion process of the whitening transformation:

- Whitening with PCA: $z_{n,1} = \lambda_1^{-1/2} \mathbf{u}_1^\top (\mathbf{x}_n - \boldsymbol{\mu})$, mapping observation data $\mathbf{x}_n$ (mean $\boldsymbol{\mu}$, covariance $\mathbf{S}$) to low-dimensional data $z_{n,1}$ (mean 0, variance 1)
- Factor analysis (FA): $\hat{\mathbf{x}}_n = \lambda_1^{1/2}\, z_{n,1}\, \mathbf{u}_1 + \boldsymbol{\mu}$, where $z_{n,1}$ is modeled as a random variable, and the error $\mathbf{x}_n - \hat{\mathbf{x}}_n$ is also modeled as a random variable
## Representation of Observation Data w/ FA

Representation of the observation data:
$$\mathbf{x}_n = \mathbf{W}\mathbf{z}_n + \boldsymbol{\mu} + \mathbf{e}_n$$
where $\mathbf{W}$ is the loading matrix.

- Observation model given the factors: $p(\mathbf{x}_n|\mathbf{z}_n, \lambda) = \mathcal{N}(\mathbf{x}_n;\, \mathbf{W}\mathbf{z}_n + \boldsymbol{\mu},\, \boldsymbol{\Sigma})$
- Observation noise: $p(\mathbf{e}_n|\lambda) = \mathcal{N}(\mathbf{e}_n;\, \mathbf{0},\, \boldsymbol{\Sigma})$
- Factors (low-dimensional data): $p(\mathbf{z}_n|\lambda) = \mathcal{N}(\mathbf{z}_n;\, \mathbf{0},\, \mathbf{I})$
## Marginalization over Latent Variables

- If one sample is generated, $\mathbf{z}^{(1)} \sim p(\mathbf{z}|\lambda)$:
  $$p(\mathbf{x}|\lambda) \approx p(\mathbf{x}|\mathbf{z}^{(1)}, \lambda) = \mathcal{N}(\mathbf{x};\, \mathbf{W}\mathbf{z}^{(1)} + \boldsymbol{\mu},\, \boldsymbol{\Sigma})$$
- If $K$ samples are generated, $\mathbf{z}^{(1)}, \ldots, \mathbf{z}^{(K)} \sim p(\mathbf{z}|\lambda)$:
  $$p(\mathbf{x}|\lambda) \approx \frac{1}{K}\sum_{k=1}^{K} \mathcal{N}(\mathbf{x};\, \mathbf{W}\mathbf{z}^{(k)} + \boldsymbol{\mu},\, \boldsymbol{\Sigma})$$
- If an infinite # of samples are generated:
  $$p(\mathbf{x}|\lambda) = \int p(\mathbf{x}|\mathbf{z}, \lambda)\, p(\mathbf{z}|\lambda)\, \mathrm{d}\mathbf{z}$$
## Derivation of p.d.f. of Observation Data

Derived by marginalizing the joint p.d.f. over the factors, which are regarded as a latent variable:
$$p(\mathbf{x}_n|\lambda) = \int p(\mathbf{x}_n|\mathbf{z}_n, \lambda)\, p(\mathbf{z}_n|\lambda)\, \mathrm{d}\mathbf{z}_n = \int \mathcal{N}(\mathbf{x}_n;\, \mathbf{W}\mathbf{z}_n + \boldsymbol{\mu},\, \boldsymbol{\Sigma})\, \mathcal{N}(\mathbf{z}_n;\, \mathbf{0},\, \mathbf{I})\, \mathrm{d}\mathbf{z}_n = \mathcal{N}(\mathbf{x}_n;\, \boldsymbol{\mu},\, \mathbf{W}\mathbf{W}^\top + \boldsymbol{\Sigma})$$

Expectations:
$$\mathbb{E}[\mathbf{x}_n] = \iint \mathbf{x}_n\, \mathcal{N}(\mathbf{x}_n;\, \mathbf{W}\mathbf{z}_n + \boldsymbol{\mu},\, \boldsymbol{\Sigma})\, \mathcal{N}(\mathbf{z}_n;\, \mathbf{0},\, \mathbf{I})\, \mathrm{d}\mathbf{x}_n\, \mathrm{d}\mathbf{z}_n = \boldsymbol{\mu} \quad (= \text{mean vector})$$
$$\mathbb{E}[\mathbf{x}_n \mathbf{x}_n^\top] = \iint \mathbf{x}_n \mathbf{x}_n^\top\, \mathcal{N}(\mathbf{x}_n;\, \mathbf{W}\mathbf{z}_n + \boldsymbol{\mu},\, \boldsymbol{\Sigma})\, \mathcal{N}(\mathbf{z}_n;\, \mathbf{0},\, \mathbf{I})\, \mathrm{d}\mathbf{x}_n\, \mathrm{d}\mathbf{z}_n = \mathbf{W}\mathbf{W}^\top + \boldsymbol{\Sigma} + \boldsymbol{\mu}\boldsymbol{\mu}^\top$$
$$(= \text{covariance matrix} + \text{squared mean vector})$$
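A Monte-Carlo sketch of this marginalization: sampling from the FA generative model $\mathbf{x}_n = \mathbf{W}\mathbf{z}_n + \boldsymbol{\mu} + \mathbf{e}_n$ and checking that the empirical covariance approaches $\mathbf{W}\mathbf{W}^\top + \boldsymbol{\Sigma}$. All parameters below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
D, M, N = 4, 2, 200_000        # observation dim, factor dim, # of samples

W = rng.normal(size=(D, M))                       # loading matrix
mu = rng.normal(size=D)                           # mean vector
Sigma = np.diag(rng.uniform(0.1, 0.5, size=D))    # diagonal noise covariance

z = rng.normal(size=(N, M))                                  # z ~ N(0, I)
e = rng.multivariate_normal(np.zeros(D), Sigma, size=N)      # e ~ N(0, Sigma)
x = z @ W.T + mu + e                                         # x = W z + mu + e

# Marginal of x is N(mu, W W^T + Sigma); check the moments empirically.
print(np.max(np.abs(x.mean(axis=0) - mu)))                   # small
print(np.max(np.abs(np.cov(x.T) - (W @ W.T + Sigma))))       # small
```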
## Comparison between GMM and FA

- GMM, discrete latent variables:
  $$p(\mathbf{x}_n|\lambda) = \sum_{m=1}^{M} p(z_n = m|\lambda)\, p(\mathbf{x}_n|z_n = m, \lambda)$$
  Prior = discrete distribution
- FA, continuous latent variables:
  $$p(\mathbf{x}_n|\lambda) = \int p(\mathbf{z}_n|\lambda)\, p(\mathbf{x}_n|\mathbf{z}_n, \lambda)\, \mathrm{d}\mathbf{z}_n$$
  Prior = Gaussian distribution
## Model Training (Parameter Optimization)
## Maximum Likelihood (ML) Estimation

Log-scaled likelihood function:
$$\ln p(\mathbf{X}|\lambda) = \sum_{n=1}^{N} \ln \int p(\mathbf{x}_n, \mathbf{z}_n|\lambda)\, \mathrm{d}\mathbf{z}_n = \sum_{n=1}^{N} \ln \mathcal{N}(\mathbf{x}_n;\, \boldsymbol{\mu},\, \mathbf{W}\mathbf{W}^\top + \boldsymbol{\Sigma})$$

ML estimates of the model parameters (set each partial derivative to $\mathbf{0}$):

- Mean vector: $\partial \ln p(\mathbf{X}|\lambda)/\partial \boldsymbol{\mu} = \mathbf{0}$ is a linear equation, giving $\hat{\boldsymbol{\mu}} = \frac{1}{N}\sum_{n=1}^{N} \mathbf{x}_n$
- Loading matrix: $\partial \ln p(\mathbf{X}|\lambda)/\partial \mathbf{W} = \mathbf{0}$ gives nonlinear equations, so $\hat{\mathbf{W}} = \,?$
- Covariance matrix: $\partial \ln p(\mathbf{X}|\lambda)/\partial \boldsymbol{\Sigma} = \mathbf{0}$ likewise, so $\hat{\boldsymbol{\Sigma}} = \,?$

How can we determine the ML estimates of these parameters?
## Lower Bound of Likelihood Function

Derivation of a lower bound of the log-scaled likelihood function, using a probability density function $q(\mathbf{z}_n)$ of the latent variables:
$$\ln p(\mathbf{X}|\lambda) = \sum_{n=1}^{N} \ln \int p(\mathbf{x}_n, \mathbf{z}_n|\lambda)\, \mathrm{d}\mathbf{z}_n = \sum_{n=1}^{N} \ln \int q(\mathbf{z}_n) \frac{p(\mathbf{x}_n, \mathbf{z}_n|\lambda)}{q(\mathbf{z}_n)}\, \mathrm{d}\mathbf{z}_n \;\geq\; \sum_{n=1}^{N} \int q(\mathbf{z}_n) \ln \frac{p(\mathbf{x}_n, \mathbf{z}_n|\lambda)}{q(\mathbf{z}_n)}\, \mathrm{d}\mathbf{z}_n$$
by Jensen's inequality. The lower bound is
$$\mathcal{L}(q, \lambda) = \sum_{n=1}^{N} \int q(\mathbf{z}_n) \ln \frac{p(\mathbf{x}_n, \mathbf{z}_n|\lambda)}{q(\mathbf{z}_n)}\, \mathrm{d}\mathbf{z}_n$$
## EM Algorithm

Maximization of the lower bound (a functional of $q$ and a function of $\lambda$):
$$\mathcal{L}(q, \lambda) = \ln p(\mathbf{X}|\lambda) - \sum_{n=1}^{N} \mathrm{KL}\left[\, q(\mathbf{z}_n)\, \|\, p(\mathbf{z}_n|\mathbf{x}_n, \lambda)\, \right]$$

- Maximize the lower bound with respect to $q$: drive the KL divergence between $q$ and the posterior to 0.
- Maximize the lower bound with respect to $\lambda\ (= \{\mathbf{W}, \boldsymbol{\Sigma}\})$: maximize the auxiliary function
  $$\mathcal{L}(q, \lambda) = \sum_{n=1}^{N} \int q(\mathbf{z}_n) \ln p(\mathbf{x}_n, \mathbf{z}_n|\lambda)\, \mathrm{d}\mathbf{z}_n + \text{const. w.r.t. } \lambda$$
## Review: Schematic Image of EM Algorithm

0. Start from the current model parameter set $\lambda^{(i)}$.
1. E-step: determine the lower bound $\mathcal{L}(q^{(i)}, \lambda)$ based on the current model parameters, so that it touches $\ln p(\mathbf{X}|\lambda)$ at $\lambda^{(i)}$.
2. M-step: update the model parameters to $\lambda^{(i+1)}$ by maximizing the lower bound.
3. E-step again: determine the new lower bound $\mathcal{L}(q^{(i+1)}, \lambda)$, and repeat.
## E-Step: Update q

Set the KL divergence to 0 under the fixed model parameters $\lambda^{\text{old}}$:
$$\hat{q}(\mathbf{z}_n) = p(\mathbf{z}_n|\mathbf{x}_n, \lambda^{\text{old}}), \qquad \sum_{n=1}^{N} \mathrm{KL}\left[\, \hat{q}(\mathbf{z}_n)\, \|\, p(\mathbf{z}_n|\mathbf{x}_n, \lambda^{\text{old}})\, \right] = 0, \qquad \mathcal{L}(\hat{q}, \lambda^{\text{old}}) = \ln p(\mathbf{X}|\lambda^{\text{old}})$$

Calculate the posterior probability density of the latent variables for each sample:
$$p(\mathbf{z}_n|\mathbf{x}_n, \lambda^{\text{old}}) = \frac{p(\mathbf{x}_n|\mathbf{z}_n, \lambda^{\text{old}})\, p(\mathbf{z}_n|\lambda^{\text{old}})}{\int p(\mathbf{x}_n|\mathbf{z}_n, \lambda^{\text{old}})\, p(\mathbf{z}_n|\lambda^{\text{old}})\, \mathrm{d}\mathbf{z}_n} = \frac{1}{Z}\, \mathcal{N}(\mathbf{x}_n;\, \mathbf{W}\mathbf{z}_n + \boldsymbol{\mu},\, \boldsymbol{\Sigma})\, \mathcal{N}(\mathbf{z}_n;\, \mathbf{0},\, \mathbf{I}) = \mathcal{N}(\mathbf{z}_n;\, ?,\, ?)$$
## Posterior Probability Density Function

$$\ln p(\mathbf{z}_n|\mathbf{x}_n, \lambda^{\text{old}}) = -\frac{1}{2}\left(\mathbf{z}_n - \boldsymbol{\mu}_n^{(z|x)}\right)^\top \boldsymbol{\Sigma}^{(z|x)-1} \left(\mathbf{z}_n - \boldsymbol{\mu}_n^{(z|x)}\right) + \text{const.}$$
(see Appendix 1 for the derivation), so the posterior probability density function of the latent variables is Gaussian:
$$p(\mathbf{z}_n|\mathbf{x}_n, \lambda^{\text{old}}) = \mathcal{N}(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)})$$

- Covariance matrix: $\boldsymbol{\Sigma}^{(z|x)} = \left(\mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} + \mathbf{I}\right)^{-1}$, which is sample-independent and a full matrix
- Mean vector: $\boldsymbol{\mu}_n^{(z|x)} = \boldsymbol{\Sigma}^{(z|x)} \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \hat{\mathbf{x}}_n$, where $\hat{\mathbf{x}}_n = \mathbf{x}_n - \boldsymbol{\mu}$; this is sample-dependent, a linear transformation of $\mathbf{x}_n$
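A sketch of the posterior computation, plus a cross-check of $\boldsymbol{\Sigma}^{(z|x)}$ against the equivalent joint-Gaussian conditioning (Woodbury) form; all parameters below are made up.

```python
import numpy as np

rng = np.random.default_rng(3)
D, M = 5, 2
W = rng.normal(size=(D, M))
mu = rng.normal(size=D)
Sigma = np.diag(rng.uniform(0.2, 1.0, size=D))

# Posterior p(z_n | x_n) = N(z_n; mu_zx_n, Sigma_zx)
Sigma_zx = np.linalg.inv(W.T @ np.linalg.inv(Sigma) @ W + np.eye(M))  # sample-independent
x_n = rng.normal(size=D)
mu_zx_n = Sigma_zx @ W.T @ np.linalg.inv(Sigma) @ (x_n - mu)          # sample-dependent

# Sanity check: Sigma_zx = I - W^T (W W^T + Sigma)^{-1} W  (Woodbury identity)
alt = np.eye(M) - W.T @ np.linalg.inv(W @ W.T + Sigma) @ W
print(np.allclose(Sigma_zx, alt))
```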
## Schematic Image of E-Step

The posterior p.d.f. of the latent variables,
$$p(\mathbf{z}_n|\mathbf{x}_n, \lambda^{\text{old}}) = \mathcal{N}(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)}), \qquad \boldsymbol{\Sigma}^{(z|x)} = \left(\mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} + \mathbf{I}\right)^{-1}, \qquad \boldsymbol{\mu}_n^{(z|x)} = \boldsymbol{\Sigma}^{(z|x)} \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x}_n - \boldsymbol{\mu}),$$
is computed for each observed data sample:

- 1st data sample $\mathbf{x}_1$: $p(\mathbf{z}_1|\mathbf{x}_1, \lambda^{\text{old}}) = \mathcal{N}(\mathbf{z}_1;\, \boldsymbol{\mu}_1^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)})$
- 2nd data sample $\mathbf{x}_2$: $p(\mathbf{z}_2|\mathbf{x}_2, \lambda^{\text{old}}) = \mathcal{N}(\mathbf{z}_2;\, \boldsymbol{\mu}_2^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)})$
- and so on; only the mean vector changes from sample to sample.
## M-Step: Update λ

Maximize the auxiliary function with respect to the model parameters $\lambda^{\text{new}}$:
$$\mathcal{L}(\hat{q}, \lambda^{\text{new}}) = \sum_{n=1}^{N} \int \hat{q}(\mathbf{z}_n) \ln p(\mathbf{x}_n, \mathbf{z}_n|\lambda^{\text{new}})\, \mathrm{d}\mathbf{z}_n + \text{const.}$$
The auxiliary function is
$$Q(\lambda^{\text{new}}, \lambda^{\text{old}}) = \sum_{n=1}^{N} \int p(\mathbf{z}_n|\mathbf{x}_n, \lambda^{\text{old}}) \ln p(\mathbf{x}_n, \mathbf{z}_n|\lambda^{\text{new}})\, \mathrm{d}\mathbf{z}_n = \sum_{n=1}^{N} \int \mathcal{N}(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)}) \ln \left[\mathcal{N}(\mathbf{x}_n;\, \mathbf{W}\mathbf{z}_n + \boldsymbol{\mu},\, \boldsymbol{\Sigma})\, \mathcal{N}(\mathbf{z}_n;\, \mathbf{0},\, \mathbf{I})\right] \mathrm{d}\mathbf{z}_n$$
## Expansion of Auxiliary Function

Expanding the auxiliary function (see Appendix 2 for more details), with $\hat{\mathbf{x}}_n = \mathbf{x}_n - \boldsymbol{\mu}$ and expectations $\langle \mathbf{z}_n \rangle$ and $\langle \mathbf{z}_n \mathbf{z}_n^\top \rangle$ taken under $\mathcal{N}(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)})$:
$$Q(\lambda^{\text{new}}, \lambda^{\text{old}}) = \sum_{n=1}^{N} \left[ -\frac{1}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\hat{\mathbf{x}}_n^\top \boldsymbol{\Sigma}^{-1} \hat{\mathbf{x}}_n + \langle \mathbf{z}_n \rangle^\top \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \hat{\mathbf{x}}_n - \frac{1}{2}\mathrm{tr}\left(\mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle\right) \right] + \text{const.}$$
$$= -\frac{N}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\mathrm{tr}\left(\boldsymbol{\Sigma}^{-1} \sum_n \hat{\mathbf{x}}_n \hat{\mathbf{x}}_n^\top\right) + \mathrm{tr}\left(\boldsymbol{\Sigma}^{-1} \mathbf{W} \sum_n \langle \mathbf{z}_n \rangle \hat{\mathbf{x}}_n^\top\right) - \frac{1}{2}\mathrm{tr}\left(\mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} \sum_n \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle\right) + \text{const.}$$
where $N$ is the # of samples, and the summations of $\hat{\mathbf{x}}_n \hat{\mathbf{x}}_n^\top$, $\langle \mathbf{z}_n \rangle \hat{\mathbf{x}}_n^\top$, and $\langle \mathbf{z}_n \mathbf{z}_n^\top \rangle$ appear.
## Sufficient Statistics

Analytical calculation of the expectations:
$$\langle \mathbf{z}_n \rangle = \int \mathbf{z}_n\, \mathcal{N}(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)})\, \mathrm{d}\mathbf{z}_n = \boldsymbol{\mu}_n^{(z|x)}$$
$$\langle \mathbf{z}_n \mathbf{z}_n^\top \rangle = \int \mathbf{z}_n \mathbf{z}_n^\top\, \mathcal{N}(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)})\, \mathrm{d}\mathbf{z}_n = \boldsymbol{\Sigma}^{(z|x)} + \boldsymbol{\mu}_n^{(z|x)} \boldsymbol{\mu}_n^{(z|x)\top}$$

Sufficient statistics:

- $N$: # of samples
- $\mathrm{diag}\left[\sum_{n=1}^{N} \hat{\mathbf{x}}_n \hat{\mathbf{x}}_n^\top\right]$: sum of squared samples
- $\sum_{n=1}^{N} \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle$: sum of expectations of squared latent variables
- $\sum_{n=1}^{N} \hat{\mathbf{x}}_n \langle \mathbf{z}_n \rangle^\top$: sum of cross terms
## ML Estimates

Auxiliary function:
$$Q(\lambda^{\text{new}}, \lambda^{\text{old}}) = -\frac{N}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\mathrm{tr}\left(\boldsymbol{\Sigma}^{-1} \sum_n \hat{\mathbf{x}}_n \hat{\mathbf{x}}_n^\top\right) + \mathrm{tr}\left(\boldsymbol{\Sigma}^{-1} \mathbf{W} \sum_n \langle \mathbf{z}_n \rangle \hat{\mathbf{x}}_n^\top\right) - \frac{1}{2}\mathrm{tr}\left(\mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} \sum_n \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle\right) + \text{const.}$$

ML estimate of the loading matrix, a linear equation:
$$\frac{\partial Q}{\partial \mathbf{W}} = \boldsymbol{\Sigma}^{-1} \sum_n \hat{\mathbf{x}}_n \langle \mathbf{z}_n \rangle^\top - \boldsymbol{\Sigma}^{-1} \mathbf{W} \sum_n \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle = \mathbf{0} \;\Rightarrow\; \hat{\mathbf{W}} = \left[\sum_n \hat{\mathbf{x}}_n \langle \mathbf{z}_n \rangle^\top\right] \left[\sum_n \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle\right]^{-1}$$

ML estimate of the (diagonal) covariance matrix, also a linear equation:
$$\frac{\partial Q}{\partial \boldsymbol{\Sigma}^{-1}} = \mathbf{0} \;\Rightarrow\; \hat{\boldsymbol{\Sigma}} = \frac{1}{N}\, \mathrm{diag}\left[\sum_n \hat{\mathbf{x}}_n \hat{\mathbf{x}}_n^\top - \hat{\mathbf{W}} \sum_n \langle \mathbf{z}_n \rangle \hat{\mathbf{x}}_n^\top\right]$$
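The E-step and M-step combine into a complete EM loop for FA. This is a compact sketch (function and variable names are mine, not the lecture's); each iteration computes the posterior moments, then solves the two linear equations above.

```python
import numpy as np

def fa_em(X, M, n_iter=50, seed=0):
    """EM for factor analysis. X is (N, D); returns loading W, mean mu, diag(Sigma)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X.mean(axis=0)                       # ML mean (closed form)
    Xc = X - mu                               # x_hat_n = x_n - mu, row-wise
    W = rng.normal(scale=0.1, size=(D, M))
    psi = Xc.var(axis=0)                      # diagonal of Sigma
    S = Xc.T @ Xc                             # sum of x_hat x_hat^T
    for _ in range(n_iter):
        # E-step: posterior moments of the factors
        Sz = np.linalg.inv(W.T @ (W / psi[:, None]) + np.eye(M))  # Sigma^{(z|x)}
        Ez = Xc @ (W / psi[:, None]) @ Sz                         # <z_n>, row-wise
        Szz = N * Sz + Ez.T @ Ez              # sum of <z_n z_n^T>
        Sxz = Xc.T @ Ez                       # sum of x_hat_n <z_n>^T
        # M-step: linear equation for W, then diagonal Sigma
        W = Sxz @ np.linalg.inv(Szz)
        psi = np.diag(S - W @ Sxz.T) / N
    return W, mu, psi

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 5)) \
    + rng.normal(scale=0.3, size=(300, 5))
W_hat, mu_hat, psi_hat = fa_em(X, M=2)
print(W_hat.shape, psi_hat.shape)   # (5, 2) (5,)
```

Since each iteration performs an exact E-step and M-step, the log-likelihood $\sum_n \ln \mathcal{N}(\mathbf{x}_n; \boldsymbol{\mu}, \mathbf{W}\mathbf{W}^\top + \boldsymbol{\Sigma})$ is non-decreasing over iterations.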
## App. 1: Derivation of Posterior p.d.f.

$$p(\mathbf{z}_n|\mathbf{x}_n, \lambda^{\text{old}}) \propto \mathcal{N}(\mathbf{x}_n;\, \mathbf{W}\mathbf{z}_n + \boldsymbol{\mu},\, \boldsymbol{\Sigma})\, \mathcal{N}(\mathbf{z}_n;\, \mathbf{0},\, \mathbf{I}) \propto \exp\left[-\frac{1}{2}(\hat{\mathbf{x}}_n - \mathbf{W}\mathbf{z}_n)^\top \boldsymbol{\Sigma}^{-1} (\hat{\mathbf{x}}_n - \mathbf{W}\mathbf{z}_n)\right] \exp\left[-\frac{1}{2}\mathbf{z}_n^\top \mathbf{z}_n\right]$$
with $\hat{\mathbf{x}}_n = \mathbf{x}_n - \boldsymbol{\mu}$. Inside of $\exp(\cdot)$:
$$-\frac{1}{2}\left[\mathbf{z}_n^\top \left(\mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} + \mathbf{I}\right) \mathbf{z}_n - 2\mathbf{z}_n^\top \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \hat{\mathbf{x}}_n\right] + \text{const.}$$
$$= -\frac{1}{2}\left[\mathbf{z}_n^\top \boldsymbol{\Sigma}^{(z|x)-1} \mathbf{z}_n - 2\mathbf{z}_n^\top \boldsymbol{\Sigma}^{(z|x)-1} \underbrace{\boldsymbol{\Sigma}^{(z|x)} \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \hat{\mathbf{x}}_n}_{\boldsymbol{\mu}_n^{(z|x)}}\right] + \text{const.}$$
$$= -\frac{1}{2}\left(\mathbf{z}_n - \boldsymbol{\mu}_n^{(z|x)}\right)^\top \boldsymbol{\Sigma}^{(z|x)-1} \left(\mathbf{z}_n - \boldsymbol{\mu}_n^{(z|x)}\right) + \text{const.}$$
where $\boldsymbol{\Sigma}^{(z|x)} = \left(\mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} + \mathbf{I}\right)^{-1}$.
## App. 2: Expansion of Auxiliary Function (1)

Auxiliary function:
$$Q(\lambda^{\text{new}}, \lambda^{\text{old}}) = \sum_{n=1}^{N} \int \mathcal{N}(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)}) \ln\left[\mathcal{N}(\mathbf{x}_n;\, \mathbf{W}\mathbf{z}_n + \boldsymbol{\mu},\, \boldsymbol{\Sigma})\, \mathcal{N}(\mathbf{z}_n;\, \mathbf{0},\, \mathbf{I})\right] \mathrm{d}\mathbf{z}_n$$
$$= \sum_{n=1}^{N} \int \mathcal{N}(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)}) \left[-\frac{1}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\hat{\mathbf{x}}_n^\top \boldsymbol{\Sigma}^{-1} \hat{\mathbf{x}}_n + \mathbf{z}_n^\top \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \hat{\mathbf{x}}_n - \frac{1}{2}\mathrm{tr}\left(\mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W}\, \mathbf{z}_n \mathbf{z}_n^\top\right)\right] \mathrm{d}\mathbf{z}_n + \text{const.}$$
(a quadratic form in $\mathbf{z}_n$; terms not depending on $\lambda^{\text{new}}$ are absorbed into the constant)
$$= \sum_{n=1}^{N} \left[-\frac{1}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\hat{\mathbf{x}}_n^\top \boldsymbol{\Sigma}^{-1} \hat{\mathbf{x}}_n + \langle \mathbf{z}_n \rangle^\top \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \hat{\mathbf{x}}_n - \frac{1}{2}\mathrm{tr}\left(\mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle\right)\right] + \text{const.}$$
with the expectation of $\mathbf{z}_n$, $\langle \mathbf{z}_n \rangle$, and the expectation of $\mathbf{z}_n \mathbf{z}_n^\top$, $\langle \mathbf{z}_n \mathbf{z}_n^\top \rangle$.
## App. 2: Expansion of Auxiliary Function (2)

Analytical calculation of the expectations:
$$\langle \mathbf{z}_n \rangle = \boldsymbol{\mu}_n^{(z|x)}, \qquad \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle = \boldsymbol{\Sigma}^{(z|x)} + \boldsymbol{\mu}_n^{(z|x)} \boldsymbol{\mu}_n^{(z|x)\top}$$

Rewriting the auxiliary function (a quadratic form in $\hat{\mathbf{x}}_n$) with the summations over samples:
$$Q(\lambda^{\text{new}}, \lambda^{\text{old}}) = -\frac{N}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\mathrm{tr}\left(\boldsymbol{\Sigma}^{-1} \sum_{n=1}^{N} \hat{\mathbf{x}}_n \hat{\mathbf{x}}_n^\top\right) + \mathrm{tr}\left(\boldsymbol{\Sigma}^{-1} \mathbf{W} \sum_{n=1}^{N} \langle \mathbf{z}_n \rangle \hat{\mathbf{x}}_n^\top\right) - \frac{1}{2}\mathrm{tr}\left(\mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} \sum_{n=1}^{N} \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle\right) + \text{const.}$$
with $N$ the # of samples, and the summations of $\hat{\mathbf{x}}_n \hat{\mathbf{x}}_n^\top$, $\langle \mathbf{z}_n \rangle \hat{\mathbf{x}}_n^\top$, and $\langle \mathbf{z}_n \mathbf{z}_n^\top \rangle$.
## App. 2: Expansion of Auxiliary Function (3)

Sufficient statistics:

- $N$: # of samples
- $\mathrm{diag}\left[\sum_{n=1}^{N} \hat{\mathbf{x}}_n \hat{\mathbf{x}}_n^\top\right]$: sum of squared samples
- $\sum_{n=1}^{N} \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle$: sum of expectations of squared latent variables
- $\sum_{n=1}^{N} \hat{\mathbf{x}}_n \langle \mathbf{z}_n \rangle^\top$: sum of cross terms

These fully determine the auxiliary function:
$$Q(\lambda^{\text{new}}, \lambda^{\text{old}}) = -\frac{N}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\mathrm{tr}\left(\boldsymbol{\Sigma}^{-1} \sum_n \hat{\mathbf{x}}_n \hat{\mathbf{x}}_n^\top\right) + \mathrm{tr}\left(\boldsymbol{\Sigma}^{-1} \mathbf{W} \sum_n \langle \mathbf{z}_n \rangle \hat{\mathbf{x}}_n^\top\right) - \frac{1}{2}\mathrm{tr}\left(\mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} \sum_n \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle\right) + \text{const.}$$