
Page 1:

A Unifying Review of Linear Gaussian Models
Summary Presentation 2/15/10 – Dae Il Kim
Department of Computer Science, Graduate Student
Advisor: Erik Sudderth, Ph.D.

Page 2:

Overview

• Introduce the Basic Model
• Discrete Time Linear Dynamical System (Kalman Filter)
• Some nice properties of Gaussian distributions
• Graphical Model: Static Model (Factor Analysis, PCA, SPCA)
• Learning & Inference: Static Model
• Graphical Model: Gaussian Mixture & Vector Quantization
• Learning & Inference: GMMs & Quantization
• Graphical Model: Discrete-State Dynamic Model (HMMs)
• Independent Component Analysis
• Conclusion

Page 3:

The Basic Model

• Basic Model: Discrete Time Linear Dynamical System (Kalman Filter)

Variations of this model produce:
• Factor Analysis
• Principal Component Analysis
• Mixtures of Gaussians
• Vector Quantization
• Independent Component Analysis
• Hidden Markov Models

Generative Model:

$$x_{t+1} = A x_t + w_t, \qquad w_t \sim \mathcal{N}(0, Q)$$
$$y_t = C x_t + v_t, \qquad v_t \sim \mathcal{N}(0, R)$$

(additive Gaussian noise)

A = k × k state transition matrix
C = p × k observation / generative matrix
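As a concrete illustration, here is a minimal NumPy sketch of sampling from this generative model. It is not code from the paper; the dimensions and parameter values (k, p, T, and the matrices A, C, Q, R) are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
k, p, T = 2, 3, 100                 # state dim, observation dim, sequence length
A = 0.9 * np.eye(k)                 # k x k state transition matrix (placeholder)
C = rng.standard_normal((p, k))     # p x k observation / generative matrix
Q = np.eye(k)                       # state noise covariance
R = 0.1 * np.eye(p)                 # observation noise covariance

x = np.zeros(k)                     # initial state (taken as zero here)
states, observations = [], []
for t in range(T):
    y = C @ x + rng.multivariate_normal(np.zeros(p), R)  # y_t = C x_t + v_t
    states.append(x)
    observations.append(y)
    x = A @ x + rng.multivariate_normal(np.zeros(k), Q)  # x_{t+1} = A x_t + w_t
```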

Page 4:

Nice Properties of Gaussians

• Markov Property

$$P(\{x_1,\dots,x_\tau\},\{y_1,\dots,y_\tau\}) = P(x_1)\prod_{t=2}^{\tau} P(x_t \mid x_{t-1}) \prod_{t=1}^{\tau} P(y_t \mid x_t)$$

• Inference in these models

$$P(\{x_1,\dots,x_\tau\} \mid \{y_1,\dots,y_\tau\}) = \frac{P(\{x_1,\dots,x_\tau\},\{y_1,\dots,y_\tau\})}{P(\{y_1,\dots,y_\tau\})}$$

Filtering: $P(x_t \mid \{y_1,\dots,y_t\})$
Smoothing: $P(x_t \mid \{y_1,\dots,y_\tau\})$

• Learning via Expectation Maximization (EM)

E-step: $Q^{k+1}(X) = P(X \mid Y, \theta^k)$
M-step: $\theta^{k+1} = \arg\max_{\theta} \int_X P(X \mid Y, \theta^k)\, \log P(X, Y \mid \theta)\, dX$

• Conditional Independence

$$P(x_{t+1} \mid x_t) = \mathcal{N}(A x_t, Q)\big|_{x_{t+1}}, \qquad P(y_t \mid x_t) = \mathcal{N}(C x_t, R)\big|_{y_t}$$
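To make the filtering recursion concrete, here is a minimal sketch of one Kalman filter predict/update step (my own illustration, not code from the paper). It advances the Gaussian posterior $\mathcal{N}(\mu, V)$ over the hidden state by one time step:

```python
import numpy as np

def kalman_step(mu, V, y, A, C, Q, R):
    """One filtering step: from P(x_{t-1} | y_1..y_{t-1}) = N(mu, V)
    to P(x_t | y_1..y_t), given the new observation y."""
    # Predict: push the current posterior through the linear dynamics.
    mu_pred = A @ mu
    V_pred = A @ V @ A.T + Q
    # Update: condition on y_t = C x_t + v_t.
    S = C @ V_pred @ C.T + R                # innovation covariance
    K = V_pred @ C.T @ np.linalg.inv(S)     # Kalman gain
    mu_new = mu_pred + K @ (y - C @ mu_pred)
    V_new = V_pred - K @ C @ V_pred
    return mu_new, V_new
```

Because every distribution involved is Gaussian, the posterior stays Gaussian, so only a mean and covariance need to be propagated.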

Page 5:

Graphical Model for Static Models

Factor Analysis: Q = I and R is diagonal
SPCA: Q = I and R = αI
PCA: Q = I and $R = \lim_{\epsilon \to 0} \epsilon I$

Generative Model (A = 0):

$$x = w, \qquad w \sim \mathcal{N}(0, Q)$$
$$y = Cx + v, \qquad v \sim \mathcal{N}(0, R)$$

(additive Gaussian noise)

Page 6:

Example of the generative process for PCA

[Figure (Bishop, 2006): z = latent variable, x = observed variable. A 1-dimensional latent space is mapped into a 2-dimensional observation space; integrating out z yields the marginal distribution p(x).]

Page 7:

Learning & Inference: Static Models

Analytically integrating over the joint, we obtain the marginal distribution of y:

$$y \sim \mathcal{N}(0,\; CQC^T + R)$$

Note: Filtering and smoothing reduce to the same problem in the static model, since the time dependence is gone: we want $P(x_\bullet \mid y_\bullet)$, the posterior over a single hidden state given a single observation. Inference can be performed by a simple linear matrix projection, and the result is again Gaussian.

We can calculate the posterior using Bayes' rule (taking Q = I):

$$P(x \mid y) = \frac{P(y \mid x)\, P(x)}{P(y)} = \frac{\mathcal{N}(Cx, R)\big|_y \; \mathcal{N}(0, I)\big|_x}{\mathcal{N}(0, CC^T + R)\big|_y}$$

The posterior is then another Gaussian:

$$P(x \mid y) = \mathcal{N}(\beta y,\; I - \beta C)\big|_x, \qquad \text{where } \beta = C^T (CC^T + R)^{-1}$$
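A minimal NumPy sketch of this projection (my own illustration, assuming Q = I as on this slide): given C, R, and an observation y, the posterior mean is simply βy, and the covariance does not depend on y:

```python
import numpy as np

def static_posterior(C, R, y):
    """Posterior P(x | y) = N(beta @ y, I - beta @ C) for the static model, Q = I."""
    p, k = C.shape
    beta = C.T @ np.linalg.inv(C @ C.T + R)  # beta = C^T (C C^T + R)^{-1}
    mean = beta @ y                          # posterior mean: a linear projection of y
    cov = np.eye(k) - beta @ C               # posterior covariance (independent of y)
    return mean, cov
```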

Page 8:

Graphical Model: Gaussian Mixture Models & Vector Quantization

Generative Model (A = 0):

$$x = \mathrm{WTA}[w], \qquad w \sim \mathcal{N}(\mu, Q)$$
$$y = Cx + v, \qquad v \sim \mathcal{N}(0, R)$$

(additive Gaussian noise)

Winner-Takes-All (WTA): WTA[x] is a new vector with unity in the position of the largest coordinate of the input and zeros in all other positions, e.g. [0 0 1].

Note: Each state $x_\bullet$ is generated independently according to a fixed discrete probability histogram controlled by the mean and covariance of w.

This model becomes a Vector Quantization model when:

$$R = \lim_{\epsilon \to 0} \epsilon I$$
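A minimal sketch of this generative process (placeholder parameters, not from the paper): sample w from a Gaussian, quantize it with WTA to pick a cluster, and emit y around the corresponding column of C:

```python
import numpy as np

rng = np.random.default_rng(0)

def wta(v):
    """Winner-takes-all: unit vector with a 1 at the largest coordinate of v."""
    e = np.zeros_like(v)
    e[np.argmax(v)] = 1.0
    return e

k, p = 3, 2
mu = rng.standard_normal(k)          # mean of w; with Q it fixes the cluster priors pi_j
C = rng.standard_normal((p, k))      # columns of C are the k cluster means
R = 0.05 * np.eye(p)

w = rng.multivariate_normal(mu, np.eye(k))            # w ~ N(mu, Q), with Q = I here
x = wta(w)                                            # x = WTA[w]: one-hot cluster indicator
y = C @ x + rng.multivariate_normal(np.zeros(p), R)   # y = C x + v
```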

Page 9:

Learning & Inference: GMMs & Quantization

Calculating the posterior responsibility for each cluster is analogous to the E-step in this model:

$$\hat{x} = P(x = e_j \mid y) = \frac{P(x = e_j,\, y)}{P(y)} = \frac{\mathcal{N}(C e_j, R)\big|_y \; P(x = e_j)}{\sum_{i=1}^{k} \mathcal{N}(C e_i, R)\big|_y \; P(x = e_i)}$$

Computing the likelihood of the data is straightforward:

$$P(y) = \sum_{i=1}^{k} P(x = e_i,\, y) = \sum_{i=1}^{k} \mathcal{N}(C e_i, R)\big|_y \; P(x = e_i)$$

Here $P(x = e_j) = \pi_j$ is the probability assigned by the Gaussian $\mathcal{N}(\mu, Q)$ to the region of k-space in which the jth coordinate is larger than all the others.
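A minimal sketch of the responsibility computation (my own illustration; for simplicity the priors π are passed in directly rather than derived by integrating $\mathcal{N}(\mu, Q)$ over the WTA regions):

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(y, C, R, pi):
    """E-step responsibilities P(x = e_j | y) for one observation y.
    C is p x k (columns are cluster means); pi[j] holds the prior P(x = e_j)."""
    k = C.shape[1]
    lik = np.array([multivariate_normal.pdf(y, mean=C[:, j], cov=R)
                    for j in range(k)])   # N(C e_j, R)|_y for each cluster
    joint = lik * pi                      # joint P(x = e_j, y)
    return joint / joint.sum()            # divide by P(y) = sum_i P(x = e_i, y)
```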

Page 10:

Gaussian Mixture Models

$$\pi_j = P(x = e_j)$$

[Figure: the joint distribution p(y, x) and the marginal distribution p(y) for the Gaussian mixture, with cluster priors π_j as defined above.]

Page 11:

Graphical Model: Discrete-State Dynamic Models

Generative Model:

$$x_{t+1} = \mathrm{WTA}[A x_t + w_t], \qquad w_t \sim \mathcal{N}(\mu, Q)$$
$$y_t = C x_t + v_t, \qquad v_t \sim \mathcal{N}(0, R)$$

(additive Gaussian noise)
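The WTA nonlinearity is what turns the linear dynamics into discrete state transitions: each current state e_i, together with A and $\mathcal{N}(\mu, Q)$, induces a fixed histogram over next states. A minimal sketch of sampling a state path under these dynamics (placeholder parameters, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def wta(v):
    e = np.zeros_like(v)
    e[np.argmax(v)] = 1.0
    return e

k, T = 3, 50
A = rng.standard_normal((k, k))   # with N(mu, Q), column i of A determines the
mu = np.zeros(k)                  # transition distribution out of state e_i

x = np.eye(k)[0]                  # start in state e_1
path = []
for t in range(T):
    path.append(int(np.argmax(x)))
    x = wta(A @ x + rng.multivariate_normal(mu, np.eye(k)))  # x_{t+1} = WTA[A x_t + w_t]
```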

Page 12:

Independent Component Analysis

• ICA can be seen as a linear generative model with non-Gaussian priors for the hidden variables, or as a nonlinear generative model with Gaussian priors for the hidden variables.

The gradient learning rule to increase the likelihood:

$$\Delta W \propto W^{-T} + f(Wy)\, y^T, \qquad f(x) = \frac{d}{dx} \log p_x(x)$$

Generative Model (A = 0):

$$x = g(w), \qquad w \sim \mathcal{N}(0, Q)$$
$$y = Cx + v, \qquad v \sim \mathcal{N}(0, R)$$

g(·) is a general nonlinearity that is invertible and differentiable, for example:

$$g(w) = \ln \tan\left(\frac{\pi}{4}\left(1 + \mathrm{erf}\!\left(\frac{w}{\sqrt{2}}\right)\right)\right)$$
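A minimal sketch of the gradient rule above for the square, noiseless case (C = W⁻¹, R → 0). The score function f(x) = -tanh(x) is my choice here: it is the derivative of the log-density of the heavy-tailed, sech-like prior implied by the g(w) above. The batch layout and learning rate are placeholder assumptions:

```python
import numpy as np

def ica_gradient_step(W, Y, lr=0.01):
    """One maximum-likelihood gradient step: dW ∝ W^{-T} + f(W y) y^T,
    averaged over a batch. Y is p x n with one observation per column."""
    n = Y.shape[1]
    X = W @ Y                                            # recovered sources
    grad = np.linalg.inv(W).T + (-np.tanh(X)) @ Y.T / n  # W^{-T} + f(Wy) y^T
    return W + lr * grad
```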

Page 13:

Conclusion

Many more potential models!