Fast Collapsed Gibbs Sampler for Dirichlet Process...

49
Fast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank 1 Cholesky updates Rajarshi Das [email protected] http://www.cs.cmu.edu/ ~ rajarshd/ c-lab lecture 7th October 2014 template adapted from https://www.sharelatex.com/templates/presentations/radboud-university-beamer-(version-1) Rajarshi Das 1 / 49

Transcript of Fast Collapsed Gibbs Sampler for Dirichlet Process...

Page 1: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Fast Collapsed Gibbs Sampler for DirichletProcess Gaussian Mixture Models using Rank 1

Cholesky updates

Rajarshi Das

[email protected]

http://www.cs.cmu.edu/~rajarshd/

c-lab lecture

7th October 2014template adapted from

https://www.sharelatex.com/templates/presentations/radboud-university-beamer-(version-1)

Rajarshi Das 1 / 49

Page 2: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Overview

1 Distributions of interest

2 Dirichlet Process

3 Finite, Infinite and Dirichlet Process Mixture Models

4 Dirichlet Process Gaussian Mixture Models

5 Gibbs sampler for DPGMM

6 Linear Algebra Crash Course

7 Fast Gibbs Sampler for DPGMM

8 Results

Rajarshi Das 2 / 49

Page 3: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Overview

1 Distributions of interest

2 Dirichlet Process

3 Finite, Infinite and Dirichlet Process Mixture Models

4 Dirichlet Process Gaussian Mixture Models

5 Gibbs sampler for DPGMM

6 Linear Algebra Crash Course

7 Fast Gibbs Sampler for DPGMM

8 Results

Rajarshi Das 3 / 49

Page 4: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Gaussian Distribution

• 2 parameter distribution (µ and σ)

1

1images from https://en.wikipedia.org/wiki/Normal_distributionRajarshi Das 4 / 49

Page 5: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Multivariate Gaussian Distribution

• Multivariate generalization of Gaussian distribution

• µ is a vector and σ becomes covariance matrix Σ

Rajarshi Das 5 / 49

Page 6: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Normal Inverse Wishart

• 4 parameter family (µ0, κ0,Σ0, ν0)

Σ|Σ0, ν0 ∼ W−1(Σ|Σ0, ν0)

µ|µ0, κ0,Σ ∼ N(µ|µ0,1

κ0Σ)

(µ,Σ) ∼ N(µ|µ0,1

κ0Σ)W−1(Σ|Σ0, ν0)

Rajarshi Das 6 / 49

Page 7: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Multivariate t

• Multivariate generalization of student’s t-distribution

• 3 paramters (µ,Σ, ν)

Rajarshi Das 7 / 49

Page 8: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Dirichlet Distribution

• A dirichlet distribution is a probability distribution overcategorical distributions, parameterized by a vector α of fixedlength K.

• X =< x1, x2, ..., xK >, xi ∈ [0, 1],∑

i xi = 1,is dirichlet distributed with parameter α if,

Rajarshi Das 8 / 49

Page 9: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Dirichlet Distribution

• A dirichlet distribution is a probability distribution overcategorical distributions, parameterized by a vector α of fixedlength K.

• X =< x1, x2, ..., xK >, xi ∈ [0, 1],∑

i xi = 1,is dirichlet distributed with parameter α if,

pdf

p (X |α) =Γ (∑

i αi )∏Kk=1 Γ (αk)

K∏i=1

xαi−1i

Rajarshi Das 9 / 49

Page 10: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Overview

1 Distributions of interest

2 Dirichlet Process

3 Finite, Infinite and Dirichlet Process Mixture Models

4 Dirichlet Process Gaussian Mixture Models

5 Gibbs sampler for DPGMM

6 Linear Algebra Crash Course

7 Fast Gibbs Sampler for DPGMM

8 Results

Rajarshi Das 10 / 49

Page 11: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Dirichlet Process

• A Dirichlet Process (DP) is also a distribution over discretedistributions.• This time with no fixed dimensions. In other words the ’K’ is

not fixed (could be infinite!)

• A DP has 2 parameters

1 base distribution G0

2 concentration parameter α

Rajarshi Das 11 / 49

Page 12: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Definition

2

G ∼ DP(G0, α)

if for any partition (A1, .....,AK ) of a measurable space X

(G (A1), ....,G (AK )) ∼ Dirichlet(αG0(A1), ....., αG0(AK ))

2Image from http://www.columbia.edu/~jwp2128/Teaching/E6892/papers/mlss2007.pdf

Rajarshi Das 12 / 49

Page 13: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Example (slide adapted from https://www.cs.cmu.edu/~kbe/dp_tutorial.pdf)

• For example, if the base distribution G0 is a Gaussian.

• The sampled distribution G ∼ DP(G0, α) would look like

Rajarshi Das 13 / 49

Page 14: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Example

Realizations From the Dirichlet Process Look Like a UsedDartboard.3

3https://www.ee.washington.edu/techsite/papers/documents/UWEETR-2010-0006.pdf

Rajarshi Das 14 / 49

Page 15: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Posterior Predictive distribution of DP

G ∼ DP(G0, α)

θ1, ....., θn ∼ G

θn+1|θ1, ....., θn ∼ 1

(α + n)(αG0 + Σn

i=1δθi )

• Also known as Blackwell-MacQueen Urn Scheme

• Has the rich getting richer property

• Chinese Restaurant Process!

Rajarshi Das 15 / 49

Page 16: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Chinese Restaurant Process

4

4Graphic fromhttp://nlp.stanford.edu/~grenager/papers/dp_2005_02_24.pptRajarshi Das 16 / 49

Page 17: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Chinese Restaurant Process

5

5Graphic fromhttp://nlp.stanford.edu/~grenager/papers/dp_2005_02_24.pptRajarshi Das 17 / 49

Page 18: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Chinese Restaurant Process

6

6Graphic fromhttp://nlp.stanford.edu/~grenager/papers/dp_2005_02_24.pptRajarshi Das 18 / 49

Page 19: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Chinese Restaurant Process

7

7Graphic fromhttp://nlp.stanford.edu/~grenager/papers/dp_2005_02_24.pptRajarshi Das 19 / 49

Page 20: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Chinese Restaurant Process

8

8Graphic fromhttp://nlp.stanford.edu/~grenager/papers/dp_2005_02_24.pptRajarshi Das 20 / 49

Page 21: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Overview

1 Distributions of interest

2 Dirichlet Process

3 Finite, Infinite and Dirichlet Process Mixture Models

4 Dirichlet Process Gaussian Mixture Models

5 Gibbs sampler for DPGMM

6 Linear Algebra Crash Course

7 Fast Gibbs Sampler for DPGMM

8 Results

Rajarshi Das 21 / 49

Page 22: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Finite Mixture Models (FMM)

• In FMM, we assume data is generated from K distributions.

∀k, θk |λ ∼ G0(λ)

π|α0 ∼ Dir(α0/K , α0/K , ...., α0/K )

zi |π ∼ Discrete(π)

xi |zi , {θk} ∼ F (θzi )

Rajarshi Das 22 / 49

Page 23: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Infinite Mixture Models (IMM)

• In Infinite mixture models, we don’t know how many mixturesgenerated the data, so we donot fix a value of K and let thedata decide!

Rajarshi Das 23 / 49

Page 24: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Dirichlet Process Mixture Models (DPMM)

G ∼ DP(G0, α)

∀i , φi ∼ G

∀i , xi ∼ F (φi )

Rajarshi Das 24 / 49

Page 25: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

DPMM and IMM are the same!

• Turns out that IMM are same as DPGMM when K →∞

P(zi = k|z−i ) = {ni,k+

αK

n−1+α if k is seen beforeα

n−1+α if k is new

Rajarshi Das 25 / 49

Page 26: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Overview

1 Distributions of interest

2 Dirichlet Process

3 Finite, Infinite and Dirichlet Process Mixture Models

4 Dirichlet Process Gaussian Mixture Models

5 Gibbs sampler for DPGMM

6 Linear Algebra Crash Course

7 Fast Gibbs Sampler for DPGMM

8 Results

Rajarshi Das 26 / 49

Page 27: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

DPGaussianMM

• In DPGMM, each table is endowed with a Gaussiandistribution.

G ∼ DP(G0, α)

∀i , φi ∼ G

∀i , xi ∼ F (φi )

• F is Gaussian distribution.

• if G0 is a conjugate distribution to F ,then inference via Gibbs samplingbecomes easya.• if covariance is fixed, then G0 is Normal• else if both mean and covariance are

unknown, then G0 is Normal InverseGamma (univariate) or Normal InverseWishart (multivariate)

aAlgorithm 1,2,3 of Neal(2000)

Rajarshi Das 27 / 49

Page 28: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Overview

1 Distributions of interest

2 Dirichlet Process

3 Finite, Infinite and Dirichlet Process Mixture Models

4 Dirichlet Process Gaussian Mixture Models

5 Gibbs sampler for DPGMM

6 Linear Algebra Crash Course

7 Fast Gibbs Sampler for DPGMM

8 Results

Rajarshi Das 28 / 49

Page 29: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Posterior Inference for DPGMM

• So I believe, that the data has been generated by a DirichletProcess Mixture Model. Now from the observed data we wantto infer the parameters of the distributions which generatedthe data. (a.k.a Bayesian Inference)

• What are the params of my Posterior distribution?• The mean and variances associated with each table.

• we don’t know a priori how many of them are there, btw!

• The table assignment of each data point (customer).• p({z}, {µ,Σ}| Data, Hyperparams) = p({µ,Σ}|Hyperparams)

* p({z}|Everything)• If the prior of the table params is conjugate to the likelihood,

then we can integrate out {µ,Σ} – collapsing!• p({z}| Data, Hyperparams)

Rajarshi Das 29 / 49

Page 30: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

• Exact computation of posterior is intractable, however we canuse MCMC techniques to sample from posterior distribution.

• Collapsed Gibbs Sampler

p(zi = k|z−i ,X , α, λ) = p(zi = k |z−i , α)p(X |z , λ)

• p(zi = k|z−i , α) is well defined by CRP.

• p(X |z , λ) is the data likelihood. Essentially reduces tocomputing p(xi |x−i , z , λ)

• After some math, p(xi |x−i , z , λ) reduces to

tvN−1−D+1

(xi |µN−1,

ΣN−1(kN−1 + 1)

kN−1(vN−1 − D + 1)

)

Rajarshi Das 30 / 49

Page 31: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Gibbs sampler for Inference

• Now that we have analytical form of everything, the Gibbssampling algorithm becomes

• Randomly initialize customer to tables. Calculate the tableparams. (µ and Σ)

• For each customer• Remove it from the current table. Update the parameters of

the table.• for each table 1...K

• Compute prior prob for the customer to sit in that table.• Compute posterior predictive likelihood to sit in the table.• Multiply them to get p(zi = k|z−i ,X , α, λ)

• Normalize the vector of probabilities and sample for a tablenumber. Update the parameters of the table.

Rajarshi Das 31 / 49

Page 32: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Updating the table parameters

• Every table has 2 parameters - a µ and Σ matrix (which define agaussian distribution)

• For conjugacy we have put a NIW(µ0,Σ0, κ0, ν0) prior on them.

• Since Gaussian and NIW form a conjugate prior, the posteriordistribution of µ and Σ also forms a NIW distribution with thefollowing parameters

µn =κ0µ0 + nx̄

κ0 + n

Σn = Σ0 +N∑i=1

(xi − x̄)(xi − x̄)T +κ0n

κ0 + n(x̄ − µ0)(x̄ − µ0)T

κn = κ0 + n

νn = ν0 + n

Rajarshi Das 32 / 49

Page 33: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Runtime Analysis

• Lets look at the following code snippet from the Gibbssampler algorithm• for each table 1...K

• Compute prior prob for the customer to sit in that table.

• Compute posterior predictive likelihood to sit in the table.(this is a multivariate t-distribution)

• Computing the density under t-distribution requires computingthe determinant and inverse of a covariance matrix.• Both of these operations are very expensive. O(D3)

operations, for a DXD matrix• Therefore the loop takes O(KD3) operations.

Rajarshi Das 33 / 49

Page 34: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Runtime Analysis -II

• Randomly initialize customer to tables. Calculate the tableparams. (µ and Σ)• for µ - O(D) - for each table• for Σ - O(D2) - for each table• Also for each Σ

• Σ−1 and |Σ| - O(D3)

• For the number of iterations you want to run.• For each customer

• Remove it from the current table. Update the parameters ofthe table. - O(D) for µ and O(D2) for Σ

• for each table 1...K• Compute prior prob for the customer to at table. - O(1)• Compute posterior predictive likelihood to sit in the table.

- O(D2)• Multiply them to get p(zi = k |z−i ,X , α, λ) - O(1)

• Normalize the vector of probabilities and sample for a tablenumber. Update the parameters of the table. -O(K ) + O(D) + O(D2) + O(D3)

Rajarshi Das 34 / 49

Page 35: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Overview

1 Distributions of interest

2 Dirichlet Process

3 Finite, Infinite and Dirichlet Process Mixture Models

4 Dirichlet Process Gaussian Mixture Models

5 Gibbs sampler for DPGMM

6 Linear Algebra Crash Course

7 Fast Gibbs Sampler for DPGMM

8 Results

Rajarshi Das 35 / 49

Page 36: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Linear Algebra Crash Course

• Cholesky decomposition: Decomposition of a symmetricpositive definite matrix into the product of a lower triangularmatrix and its conjugate transpose.

A = LLT

• Computing the cholesky decomposition takes O(D3) time!.

Rajarshi Das 36 / 49

Page 37: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Cholesky updates

• Suppose A is a positive definite matrix with L as its choleskydecomposition.

• Now if we obtain A′ from A by an update of the form

A′ = A + xxT

• then the cholesky decomposition L′ of A′ can be obtained byan update operation on L. (Rank 1 update)

• Similarly if we have A = A′ - xxT , then we can perform aRank1 downdate to get L from L′

Rajarshi Das 37 / 49

Page 38: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Cholesky updates

9

• This algorithm is O(D2)!

9Image from http://en.wikipedia.org/wiki/Cholesky_decompositionRajarshi Das 38 / 49

Page 39: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Nice properties

• |A| can be computed from L by

log(|A|) = 2 ∗D∑i=1

log(L(i , i))

• Now lets try to compute bTA−1b

bTA−1b = bT (LLT )−1b

= bT (L−1)TL−1b

= (L−1b)T (L−1b)

• Therefore compute (L−1b) and multiply its transpose withitself

Rajarshi Das 39 / 49

Page 40: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

(L−1b)

• (L−1b) is the solution of

Lx = b

• Remember L is a lower triangular matrix, therefore the aboveequation can be solved very efficiently using forwardsubstitution!

Rajarshi Das 40 / 49

Page 41: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Overview

1 Distributions of interest

2 Dirichlet Process

3 Finite, Infinite and Dirichlet Process Mixture Models

4 Dirichlet Process Gaussian Mixture Models

5 Gibbs sampler for DPGMM

6 Linear Algebra Crash Course

7 Fast Gibbs Sampler for DPGMM

8 Results

Rajarshi Das 41 / 49

Page 42: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Putting back in perspective

• Recall the posterior update equation for covariance matrix

Σn = Σ0 +N∑i=1

(xi − x̄)(xi − x̄)T +κ0n

κ0 + n(x̄ − µ0)(x̄ − µ0)T

= Σ0 +N∑i=1

xixTi − (κ0 + n)µnµ

Tn + κ0µ0µ

T0

• Now we can write the above definition recursively as

Σn = Σn−1 + xnxTn − (κ0 + n)µnµ

Tn + (κ0 + n − 1)µn−1µ

Tn−1

= Σn−1 +κ0 + n

κ0 + n − 1(xn − µn)(xn − µn)T

• Looks similar to A′ = A + xxT ?

Rajarshi Das 42 / 49

Page 43: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Putting back in perspective

• Recall pdf of multi-variate t-distribution

• If we know the cholesky decomposition of Σ, then we cancompute (x − µ)TΣ−1(x − µ) from L using similar trickshown before for computing bTL−1b

Rajarshi Das 43 / 49

Page 44: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Runtime Analysis -II

• Randomly initialize customer to tables. Calculate the tableparams. (µ and Σ)• for µ - O(D) - for each table• for Σ - (do not need to compute Σ)• Also for each Σ

• Σ−1 and |Σ| - O(D2)

• For the number of iterations you want to run.• For each customer

• Remove it from the current table. Update the parameters ofthe table. - O(D) for µ and O(D2) for Σ

• for each table 1...K• Compute prior prob for the customer to at table. - O(1)• Compute posterior predictive likelihood to sit in the table.

- O(D2)• Multiply them to get p(zi = k |z−i ,X , α, λ) - O(1)

• Normalize the vector of probabilities and sample for a tablenumber. Update the parameters of the table. - O(K ) + O(D)+ O(D2)

Rajarshi Das 44 / 49

Page 45: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Overview

1 Distributions of interest

2 Dirichlet Process

3 Finite, Infinite and Dirichlet Process Mixture Models

4 Dirichlet Process Gaussian Mixture Models

5 Gibbs sampler for DPGMM

6 Linear Algebra Crash Course

7 Fast Gibbs Sampler for DPGMM

8 Results

Rajarshi Das 45 / 49

Page 46: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Speed Test 1

• Just 100 word vectors each of 100 dimensions.

• Both sampler run for 10,20,50,100,500,1000,10000 iterations.

Rajarshi Das 46 / 49

Page 47: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Caveats

• All of these is possible if the prior covariance Σ0 is positivedefinite.

• In practice, we often tend to put Identity as Σ0, which ispositive definite.

Rajarshi Das 47 / 49

Page 48: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Code

• Java Implementation available athttps://github.com/rajarshd/DPGMM

• Plan to implement in C++ too.

Rajarshi Das 48 / 49

Page 49: Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Acknowledgements

• Thanks to my friend Manzil Zaheer for introducing me toCholesky decomposition and proving that the covarianceupdate was indeed a rank 1 update.

Rajarshi Das 49 / 49