Factorial Mixture of Gaussians and the Marginal Independence Model


Transcript of Factorial Mixture of Gaussians and the Marginal Independence Model

Page 1: Factorial Mixture of Gaussians and the Marginal Independence Model

Factorial Mixture of Gaussiansand the Marginal Independence Model

Ricardo Silva [email protected]

Joint work-in-progress with Zoubin Ghahramani

Page 2: Factorial Mixture of Gaussians and the Marginal Independence Model

Goal

• To model sparse distributions subject to marginal independence constraints

Page 3: Factorial Mixture of Gaussians and the Marginal Independence Model

Why?

[Slide figure: a latent variable model with latents X1 and X2 and observed children Y1, Y2, Y3, Y4, Y5]

Page 4: Factorial Mixture of Gaussians and the Marginal Independence Model

Why?

[Slide figure: latents X1, X2, X3 with children Y1, Y2, Y3, plus additional hidden common causes H12 and H23]

Page 5: Factorial Mixture of Gaussians and the Marginal Independence Model

Why?

Page 6: Factorial Mixture of Gaussians and the Marginal Independence Model

How?

[Slide figure: a graph over X1, X2, X3 and Y1, Y2, Y3]

Page 7: Factorial Mixture of Gaussians and the Marginal Independence Model

How?

Page 8: Factorial Mixture of Gaussians and the Marginal Independence Model

Context

• Yi = fi(X, Y) + Ei, where Ei is an error term
• E is not a vector of independent variables
• Assumed: sparse structure of marginally dependent/independent variables
• Goal: estimating E-like distributions
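
As a small illustration of this setting, the sketch below samples errors with a sparse covariance matrix in which one zero entry encodes a marginal independence. The specific graph, covariance values, and variable names are assumptions chosen for the example, not something taken from the slides.

import numpy as np

# Hypothetical sparse error covariance over (E1, E2, E3):
# E1 and E3 are marginally independent (zero covariance),
# matching a bi-directed graph E1 <-> E2 <-> E3.
omega = np.array([[1.0, 0.4, 0.0],
                  [0.4, 1.0, 0.3],
                  [0.0, 0.3, 1.0]])

rng = np.random.default_rng(0)
n = 1000
E = rng.multivariate_normal(mean=np.zeros(3), cov=omega, size=n)

# The sample covariance only approximately reproduces the zero in the
# (1, 3) entry; the point of the model is to estimate such E-like
# distributions while enforcing the structural zeros exactly.
print(np.round(np.cov(E, rowvar=False), 2))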

Page 9: Factorial Mixture of Gaussians and the Marginal Independence Model

Why not latent variable models?

• Requires further decisions
• How many latents? Which children?
• Silly when the marginal structure is sparse
• In the Bayesian case:
– Drags down MCMC methods with (sometimes much) extra autocorrelation
– Requires priors over parameters that you didn’t even care about in the first place
• (Note: this talk is not about Bayesian inference)

Page 10: Factorial Mixture of Gaussians and the Marginal Independence Model

Example

Page 11: Factorial Mixture of Gaussians and the Marginal Independence Model

Example

Page 12: Factorial Mixture of Gaussians and the Marginal Independence Model

Bi-directed models: The story so far

• Gaussian models
– Maximum likelihood (Drton and Richardson, 2003)
– Bayesian inference (Silva and Ghahramani, 2006, 2008)
• Binary models
– Maximum likelihood (Drton and Richardson, 2008)

Page 13: Factorial Mixture of Gaussians and the Marginal Independence Model

New model: mixture of Gaussians

• Latent variables: mixture indicators
– Assumed: the number of levels is decided somewhere else

• No “real” latent variables

Page 14: Factorial Mixture of Gaussians and the Marginal Independence Model

Caveat emptor

• I think you should buy this, but be warned that speed of computation is not the primary concern of this talk

Page 15: Factorial Mixture of Gaussians and the Marginal Independence Model

Simple?

[Slide figure: a single mixture indicator C with children Y1, Y2, Y3]

Y1, Y2, Y3 jointly Gaussian with sparse covariance matrix Σc indexed by C

Page 16: Factorial Mixture of Gaussians and the Marginal Independence Model

Not really

[Slide figure: the same graph, a single mixture indicator C with children Y1, Y2, Y3]

Page 17: Factorial Mixture of Gaussians and the Marginal Independence Model

Required: a factorial mixture of Gaussians

[Slide figure: a factorial mixture with separate indicators C1, C2, C3]

Page 18: Factorial Mixture of Gaussians and the Marginal Independence Model

A parameterization under latent variables

Assume Z variables are zero-mean Gaussians, C variables are binary

Page 19: Factorial Mixture of Gaussians and the Marginal Independence Model

A parameterization under latent variables

Page 20: Factorial Mixture of Gaussians and the Marginal Independence Model

Implied indexing

σij should be indexed only by those cliques containing both Yi and Yj

Page 21: Factorial Mixture of Gaussians and the Marginal Independence Model

Implied indexing

μi should be indexed only by those cliques containing Yi

Page 22: Factorial Mixture of Gaussians and the Marginal Independence Model

Factorial mixture of Gaussians and the marginal independence model

• The general case for all latent structures

• Constraints in the indexing: σij^c = σij^c′ whenever c and c′ agree on the clique indicators in the intersection of the cliques containing Yi and Yj

Page 23: Factorial Mixture of Gaussians and the Marginal Independence Model

Factorial mixture of Gaussians and the marginal independence model

• The parameter pool (besides mixture probabilities):
– {μi^c[i]}, the mean vector
– {σii^c[i]}, the variance vector
– {σij^c[ij]}, the covariance vector
• Covariances are parameterized only for Yi and Yj linked in the bi-directed graph (the covariance is zero otherwise): the marginal independence constraints
• Given c, we assemble the corresponding mean vector and covariance matrix
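
A minimal sketch of this assembly step, under assumed data structures for a chain Y1 <-> Y2 <-> Y3 with cliques {Y1, Y2} and {Y2, Y3} (indicators C12, C23). The parameter tables, values, and function names are hypothetical, but the indexing follows the scheme above: means and variances are keyed by the cliques containing Yi, covariances by the cliques containing both endpoints, and unlinked pairs stay at zero.

import numpy as np
from itertools import product

cliques = {"C12": [0, 1], "C23": [1, 2]}   # clique indicator -> member variables
levels = {"C12": 2, "C23": 2}              # number of values per indicator

# mu[i] and var[i] are keyed by the states of the cliques containing Yi;
# cov[(i, j)] is keyed by the states of the cliques containing both Yi and Yj.
mu = {0: {(0,): 0.0, (1,): 0.5},
      1: {(0, 0): 0.0, (0, 1): 0.1, (1, 0): -0.1, (1, 1): 0.2},
      2: {(0,): 0.0, (1,): -0.5}}
var = {0: {(0,): 1.0, (1,): 1.5},
       1: {(0, 0): 1.0, (0, 1): 1.2, (1, 0): 0.8, (1, 1): 1.1},
       2: {(0,): 1.0, (1,): 0.9}}
cov = {(0, 1): {(0,): 0.3, (1,): 0.4},     # indexed by C12 only
       (1, 2): {(0,): 0.2, (1,): 0.1}}     # indexed by C23 only

def cliques_of(i):
    return [name for name, members in cliques.items() if i in members]

def assemble(c):
    # Given a full configuration c (indicator -> level), build mu(c) and Sigma(c).
    p = 3
    m = np.zeros(p)
    S = np.zeros((p, p))
    for i in range(p):
        key = tuple(c[name] for name in cliques_of(i))
        m[i] = mu[i][key]
        S[i, i] = var[i][key]
    for (i, j), table in cov.items():
        key = tuple(c[name] for name in cliques_of(i) if name in cliques_of(j))
        S[i, j] = S[j, i] = table[key]
    return m, S      # pairs not linked in the graph (Y1, Y3) remain zero

for state in product(range(levels["C12"]), range(levels["C23"])):
    m, S = assemble({"C12": state[0], "C23": state[1]})
    print(state, m, S[0, 2])   # the (Y1, Y3) covariance is zero in every component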

Page 24: Factorial Mixture of Gaussians and the Marginal Independence Model

Size of the parameter space

• Let L[ij] be the size of the largest clique intersection, and L[i] the size of the largest clique

• Let k be the maximum number of values any mixture indicator can take

• Let p be the number of variables, e the number of edges in bi-directed graph

• Total number of parameters: O(e·k^L[ij] + p·k^L[i])
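
Reading L[ij] as the number of mixture indicators shared by the endpoints of an edge, and L[i] as the number of indicators attached to a single variable (an assumption of this illustration), the count for the small chain example above works out as follows.

# Chain Y1 <-> Y2 <-> Y3: p = 3 variables, e = 2 edges, k = 2 levels per indicator.
# Each edge shares one indicator (L_ij = 1); Y2 belongs to two cliques (L_i = 2).
p, e, k = 3, 2, 2
L_ij, L_i = 1, 2
print(e * k**L_ij + p * k**L_i)   # order of the parameter count: 2*2 + 3*4 = 16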

Page 25: Factorial Mixture of Gaussians and the Marginal Independence Model

Size of the parameter space

• Notice this is not a simple function of sparseness:
– Dependence on the number of clique intersections
– Non-sparse models can have few cliques
• In decomposable models, the number of clique intersections is given by the branch factor of the junction tree

Page 26: Factorial Mixture of Gaussians and the Marginal Independence Model

Maximum likelihood estimation

• An EM framework

Page 27: Factorial Mixture of Gaussians and the Marginal Independence Model

Maximum likelihood estimation

• An EM framework

Page 28: Factorial Mixture of Gaussians and the Marginal Independence Model

Algorithms

• First, solving the exact problem (exponential in the number of cliques)
• Constraints:
– Positive definiteness constraints
– Marginal independence constraints
• Gradient-based methods:
– Moving in all dimensions is quite unstable (violates constraints)
– Move over a subset while keeping the rest fixed

Page 29: Factorial Mixture of Gaussians and the Marginal Independence Model

Iterative conditional fitting: Gaussian case

• See Drton and Richardson (2003)
• Choose some Yi
• Fix the covariance of Y\i
• Fit the covariance of Yi with Y\i, and its variance
• Marginal independence constraints are introduced directly

Page 30: Factorial Mixture of Gaussians and the Marginal Independence Model

Iterative conditional fitting: Gaussian case

[Slide figure: bi-directed graph Y1 <-> Y2 <-> Y3 (no Y1–Y3 edge) and its 3×3 covariance matrix. The block over (Y1, Y2), with entries σ11, σ12, σ22, is held fixed; σ13, σ23 and σ33 are fitted, subject to the marginal independence constraint σ13 = σ31 = 0.]

Writing Y3 = b1·Y1 + b2·Y2 + ε3, the zero constraint becomes σ13 = b1·σ11 + b2·σ12 = 0, while σ23 = f(b, Σ12).

Page 31: Factorial Mixture of Gaussians and the Marginal Independence Model

Iterative conditional fitting: Gaussian case

[Slide figure: the same graph, now with Y3 regressed on the pseudo-variable R2.1]

Y3 = b·R2.1 + ε3, where R2.1 is the residual of the regression of Y2 on Y1. Since R2.1 is uncorrelated with Y1, the constraint σ13 = 0 holds by construction while σ23 is fitted through b.
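
The sketch below runs that single update numerically on simulated data; the data-generating covariance, sample size, and variable names are assumptions for illustration, and it covers only this one ICF step rather than the full algorithm.

import numpy as np

rng = np.random.default_rng(1)
n = 2000
# Hypothetical data whose true covariance has sigma13 = 0.
true_cov = np.array([[1.0, 0.4, 0.0],
                     [0.4, 1.0, 0.3],
                     [0.0, 0.3, 1.0]])
Y1, Y2, Y3 = rng.multivariate_normal(np.zeros(3), true_cov, size=n).T

# One ICF update for Y3, holding Cov(Y1, Y2) fixed:
# 1. pseudo-variable: residual of the regression of Y2 on Y1
beta_21 = np.cov(Y1, Y2)[0, 1] / np.var(Y1, ddof=1)
R_21 = Y2 - beta_21 * Y1
# 2. regress Y3 on R_21; since R_21 is uncorrelated with Y1,
#    the implied covariance sigma13 stays exactly zero.
b = np.cov(R_21, Y3)[0, 1] / np.var(R_21, ddof=1)
sigma23 = b * np.var(R_21, ddof=1)                           # fitted Cov(Y3, Y2)
sigma33 = np.var(Y3 - b * R_21, ddof=1) + b**2 * np.var(R_21, ddof=1)
print(round(sigma23, 3), round(sigma33, 3))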

Page 32: Factorial Mixture of Gaussians and the Marginal Independence Model

How does it change in the mixture of Gaussians case?

[Slide figure: the bi-directed graph Y1 <-> Y2 <-> Y3 with clique indicators C12 and C23 attached to the cliques {Y1, Y2} and {Y2, Y3}]

Page 33: Factorial Mixture of Gaussians and the Marginal Independence Model

Parameter expansion

• Yi = b1^c·R1 + b2^c·R2 + ... + εi^c
• b^c does vary over all mixture indicator configurations c
• That is, we create an exponential number of parameters
– Exponential in the number of cliques
• Where do the constraints go?

Page 34: Factorial Mixture of Gaussians and the Marginal Independence Model

Parameter constraints

• Equality constraints are back:

b1^c·R1j^c + b2^c·R2j^c + ... + bk^c·Rkj^c = b1^c′·R1j^c′ + b2^c′·R2j^c′ + ... + bk^c′·Rkj^c′

(for c and c′ that agree on the relevant clique indicators)

• Similar constraints for the variances

Page 35: Factorial Mixture of Gaussians and the Marginal Independence Model

Parameter constraints

• The γ^c variances have to be positive
• Positive definiteness of Σ(c), for all c, is then guaranteed (Schur’s complement)

Page 36: Factorial Mixture of Gaussians and the Marginal Independence Model

Constrained EM

• Maximize the expected conditional log-likelihood of Yi given everybody else, subject to:
– An exponential number of constraints
– An exponential number of parameters
– Box constraints on gamma
• What does this buy us?

Page 37: Factorial Mixture of Gaussians and the Marginal Independence Model

Removing parameters

• Covariance equality constraints are linear
• Even a naive approach can work:
– Choose a basis for b (e.g., one bij^c corresponding to each non-zero σij^c[ij])
– The basis is of tractable size (under sparseness)
– Rewrite the EM objective as a function of the basis only

b1^c·R1j^c + b2^c·R2j^c + ... + bk^c·Rkj^c = b1^c′·R1j^c′ + b2^c′·R2j^c′ + ... + bk^c′·Rkj^c′

Page 38: Factorial Mixture of Gaussians and the Marginal Independence Model

Quadratic constraints

• Equality of variances introduces quadratic constraints tying the γs and the bs
• Proposed solution:
– Fix all σii^c[i] first
– Fit the bijs with those parameters fixed (inequality constraints, non-convex optimization)
– Then fit the γs given the bs
– Number of parameters back to tractable
• Always an exponential number of constraints
• Note: the reparameterization also takes an exponential number of steps

Page 39: Factorial Mixture of Gaussians and the Marginal Independence Model

Relaxed optimization

• Optimization is still expensive. What to do?
• Relaxed approach: fit the bs ignoring the variance equalities
– Fix γ, fit b
– Quadratic program with linear equality constraints; I just solve it in closed form (see the sketch below)
– The γs end up inconsistent
– Project them back to the solution space without changing b (always possible)
– This may decrease the expected log-likelihood
• Then fit γ given the bs
– Nonlinear programming, trivial constraints
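
A minimal sketch of the kind of closed-form step referenced above, under illustrative sizes and an arbitrary toy constraint: minimizing a quadratic objective in b subject to linear equality constraints by solving the KKT system directly. The matrices and the particular constraint are assumptions, not the model's actual expected-likelihood terms.

import numpy as np

def solve_eq_constrained_qp(Q, q, A, d):
    # Minimize 0.5 b'Qb - q'b subject to A b = d, via the KKT system.
    m = A.shape[0]
    K = np.block([[Q, A.T],
                  [A, np.zeros((m, m))]])
    rhs = np.concatenate([q, d])
    sol = np.linalg.solve(K, rhs)
    return sol[:Q.shape[0]]          # drop the Lagrange multipliers

# Toy instance: 3 coefficients, one equality constraint tying two of them.
Q = np.array([[2.0, 0.2, 0.0],
              [0.2, 1.5, 0.1],
              [0.0, 0.1, 1.0]])
q = np.array([1.0, 0.5, -0.3])
A = np.array([[1.0, -1.0, 0.0]])     # b1 = b2, a stand-in for the covariance equalities
d = np.array([0.0])
b = solve_eq_constrained_qp(Q, q, A, d)
print(np.round(b, 3), np.round(A @ b, 6))   # constraint holds up to numerical error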

Page 40: Factorial Mixture of Gaussians and the Marginal Independence Model

Recap

• Iterative conditional fitting: maximize the expected conditional log-likelihood
• Transform to another parameter space
– Exact algorithm: quadratic inequality constraints “instead” of SD ones
– Relaxed algorithm: no constraints
• No constraints?

Page 41: Factorial Mixture of Gaussians and the Marginal Independence Model

Approximations

• Taking expectations is expensive; what to do?
• Standard approximations use a “nice” approximating distribution q(c)
– E.g., mean-field methods
• Not enough!

Page 42: Factorial Mixture of Gaussians and the Marginal Independence Model

A simple approach

• The Budgeted Variational Approximation
• As simple as it gets: maximize a variational bound that forces most combinations of c to give a zero value to q(c)
– Up to a pre-fixed budget
• How to choose which values?
• This guarantees positive definiteness only of those Σ(c) with non-zero conditionals q(c)
– For predictions, project the matrices into the PD cone first
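
One rough reading of the budget idea, sketched below with an arbitrary scoring function standing in for the unnormalized conditional over c: keep the highest-scoring configurations up to the budget, force the rest to zero, and renormalize. The function names, budget size, and selection rule are assumptions for illustration.

import numpy as np
from itertools import product

def budgeted_posterior(score, indicator_levels, budget):
    # Keep the `budget` best configurations c by unnormalized score;
    # force q(c) = 0 for all other configurations and renormalize.
    configs = list(product(*[range(k) for k in indicator_levels]))
    weights = np.array([score(c) for c in configs])
    keep = np.argsort(weights)[-budget:]          # indices of the best configurations
    q = np.zeros(len(configs))
    q[keep] = weights[keep]
    q /= q.sum()
    return {configs[i]: q[i] for i in range(len(configs)) if q[i] > 0}

# Toy score: an arbitrary positive function standing in for p(y, c).
score = lambda c: np.exp(-0.5 * sum(c))
print(budgeted_posterior(score, indicator_levels=(2, 2, 2), budget=3))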

Page 43: Factorial Mixture of Gaussians and the Marginal Independence Model

Under construction

• Implementation and evaluation of algorithms
– (Lesson #1: never, ever, ever use MATLAB quadratic programming methods)
• Control for overfitting
– Regularization terms
• The first non-Gaussian directed graphical model fully closed under marginalization?

Page 44: Factorial Mixture of Gaussians and the Marginal Independence Model

Under construction: Bayesian methods

• Prior: product of experts for covariance and variance entries (times a BIG indicator function)
• MCMC method: an M-H proposal based on the relaxed fitting algorithm
– Is that going to work well?
• The problem is “doubly intractable”
– Not because of a partition function, but because of the constraints
– Are there any analogues to methods such as Murray/Ghahramani/MacKay’s?

Page 45: Factorial Mixture of Gaussians and the Marginal Independence Model

Thank You