Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models...

49
Gibbs Sampling and Variational Methods Héctor Corrada Bravo University of Maryland, College Park, USA CMSC 644: 20190403

Transcript of Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models...

Page 1: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Gibbs Sampling and VariationalMethods

Héctor Corrada Bravo

University of Maryland, College Park, USA CMSC 644: 2019­04­03

Page 2: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Blei (2012), Comm. ACM

Mixture ModelsDocuments as mixtures of topics (Hoffman 1999, Blei et al. 2003)

1 / 48

Page 3: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Kovacevic (2014), PLOS One

Mixture ModelsMore applications: genetics, populations as mixture of ancestralpopulations

2 / 48

Page 4: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Glueck, et al. (2017). TVCG

Mixture ModelsMore applications: clinical subtyping

3 / 48

Page 5: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Schulman and Saria (2016). JMLR

Mixture ModelsMore applications: clinical prognosis

4 / 48

Page 6: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

https://radimrehurek.com/gensim/

Mixture Models

Software

5 / 48

Page 7: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

https://mc­stan.org/

Mixture Models

Software

6 / 48

Page 8: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Mixture ModelsWe have a set of documents 

Each document modeled as a bag­of­words (bow) over dictionary  .

: the number of times word   appears in document  .

D

W

xw,d w ∈ W d ∈ D

7 / 48

Page 9: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingUltimately, what we are interested in is learning topics

Perhaps instead of finding parameters   that maximize likelihood

Sample from a distribution   that gives us topic estimates

θ

Pr(θ|D)

8 / 48

Page 10: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingUltimately, what we are interested in is learning topics

Perhaps instead of finding parameters   that maximize likelihood

Sample from a distribution   that gives us topic estimates

But, we only have talked about   how can we sample parameters?

θ

Pr(θ|D)

Pr(D|θ)

9 / 48

Page 11: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingLike EM, the trick here is to expand model with latent data 

And sample from distribution 

Zm

Pr(θ, Zm|Z)

10 / 48

Page 12: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingLike EM, the trick here is to expand model with latent data 

And sample from distribution 

This is challenging, but sampling from   and   is easier

Zm

Pr(θ, Zm|Z)

Pr(θ|Zm, Z) Pr(Zm|θ, Z)

11 / 48

Page 13: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingThe Gibbs Sampler does exactly that

Property: After some rounds, samples from the conditional distributions 

Correspond to samples from marginal 

Pr(θ|Zm, Z)

Pr(θ|Z) = ∑Z

m Pr(θ, Zm|Z)

12 / 48

Page 14: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingQuick aside, how to simulate data for pLSA?

Generate parameters   and Generate 

{pd} {θt}

Δw,d,t

13 / 48

Page 15: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingLet's go backwards, let's deal with Δw,d,t

14 / 48

Page 16: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingLet's go backwards, let's deal with 

Where   was as given by E­step

Δw,d,t

Δw,d,t ∼ Multxw,d(γw,d,1, … , γw,d,T )

γw,d,t

15 / 48

Page 17: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingLet's go backwards, let's deal with 

Where   was as given by E­step

for d in range(num_docs):

delta[d,w,:] = np.random.multinomial(doc_mat[d,w],

gamma[d,w,:])

Δw,d,t

Δw,d,t ∼ Multxw,d(γw,d,1, … , γw,d,T )

γw,d,t

16 / 48

Page 18: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingHmm, that's a problem since we need  ...

But, we know   so, let's use that to generate each   as

xw,d

Pr(w, d) = ∑t pt,dθw,t xw,d

xw,d ∼ Multnd(Pr(1, d), … ,Pr(W , d))

17 / 48

Page 19: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingHmm, that's a problem since we need  ...

But, we know   so, let's use that to generate each   as

for d in range(num_docs):

doc_mat[d,:] = np.random.multinomial(nw[d], np.sum(p[:,d] * theta), axis=0)

xw,d

Pr(w, d) = ∑t pt,dθw,t xw,d

xw,d ∼ Multnd(Pr(1, d), … ,Pr(W , d))

18 / 48

Page 20: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingNow, how about  ? How do we generate the parameters of a Multinomialdistribution?

pd

19 / 48

Page 21: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingNow, how about  ? How do we generate the parameters of a Multinomialdistribution?

This is where the Dirichlet distribution comes in...

If  , then

pd

pd ∼ Dir(α)

Pr(pd) ∝T

∏t=1

pαt−1t,d

20 / 48

Page 22: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingSome interesting properties:

So, if we set all   we will tend to have uniform probability over topics (  each on average)

If we increase   it will also have uniform probability but will have verylittle variance (it will almost always be  )

E[pt,d] =αt

∑t′ αt′

αt = 1

1/t

αt = 100

1/t

21 / 48

Page 23: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingSo, we can say   and pd ∼ Dir(α) θt ∼ Dir(β)

22 / 48

Page 24: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingSo, we can say   and 

And generate data as (with  )

for d in range(num_docs):

p[:,d] = np.random.dirichlet(1. * np.ones(num_topics))

pd ∼ Dir(α) θt ∼ Dir(β)

αt = 1

23 / 48

Page 25: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingSo what we have is a prior over parameters   and  :   and 

And we can formulate a distribution for missing data  :

{pd} {θt} Pr(pd|α) Pr(θt|β)

Δw,d,t

Pr(Δw,d,t|pd, θt,α,β) =

Pr(Δw,d,t|pd, θt)Pr(pd|α)Pr(θt|β)

24 / 48

Page 26: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingHowever, what we care about is the posterior distribution 

What do we do???

Pr(pd|Δw,d,t, θt,α,β)

25 / 48

Page 27: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingAnother neat property of the Dirichlet distribution is that it is conjugate tothe Multinomial

If   and  , thenθ|α ∼ Dir(α) X|θ ∼ Multinomial(θ)

θ|X, α ∼ Dir(X + α)

26 / 48

Page 28: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingThat means we can sample   from

and

pd

pt,d ∼ Dir(∑w

Δw,d,t + α)

θw,t ∼ Dir(∑d

Δw,d,t + β)

27 / 48

Page 29: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Blei, Ng, Jordan (2003), JMLR

Approximate Inference by SamplingCoincidentally, we have just specified the Latent Dirichlet Allocationmethod for topic modeling.

This is the most commonly used method for topic modeling

28 / 48

Page 30: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingWe can now specify a full Gibbs Sampler for an LDA mixture model.

Given:

Word­document counts Number of topics Prior parameters   and 

Do: Learn parameters   and   for   topics

xw,d

K

α β

{pd} {θt} K

29 / 48

Page 31: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingStep 0: Initialize parameters   and 

and

{pd} {θt}

pd ∼ Dir(α)

θt ∼ Dir(β)

30 / 48

Page 32: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingStep 1:

Sample   based on current parameters   and Δw,d,t {pd} {θt}

Δw,d,. ∼ Multxw,d(γw,d,1, … , γw,d,T )

31 / 48

Page 33: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingStep 2:

Sample parameters from

and

pt,d ∼ Dir(∑w

Δw,d,t + α)

θw,t ∼ Dir(∑d

Δw,d,t + β)

32 / 48

Page 34: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingStep 3:

Get samples for a few iterations (e.g., 200), we want to reach astationary distribution...

33 / 48

Page 35: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingStep 4:

Estimate   as the average of the estimates from the last   iterations(e.g., m=500)

Δ̂w,d,t m

34 / 48

Page 36: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Approximate Inference by SamplingStep 5:

Estimate parameters   and   based on estimated pd θt Δ̂w,d,t

p̂ t,d =∑w Δ̂w,d,t + α

∑t∑w Δ̂w,d,t + α

θ̂w,t =∑d Δ̂w,d,t + β

∑w∑d Δ̂w,d,t + β

35 / 48

Page 37: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Mixture modelsWe have now seen two different mixture models: soft k­means and topicmodels

36 / 48

Page 38: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Mixture modelsWe have now seen two different mixture models: soft k­means and topicmodels

Two inference procedures:

Exact Inference with Maximum Likelihood using the EM algorithmApproximate Inference using Gibbs Sampling

37 / 48

Page 39: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Mixture modelsWe have now seen two different mixture models: soft k­means and topicmodels

Two inference procedures:

Exact Inference with Maximum Likelihood using the EM algorithmApproximate Inference using Gibbs Sampling

Next, we will go back to Maximum Likelihood but learn aboutApproximate Inference using Variational Methods

38 / 48

Page 40: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Variational MethodsConsider LDA model again

Benefits

Full document generative modelCan process new documents(posterior over topics) and words(prior parameters)

39 / 48

Page 41: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Variational MethodsConsider LDA model again

With Gibbs we sampled from 

What if we want to estimateparameters again? (maximum aposteriori parameters)

Pr(θ, Δ|x, α, β)

40 / 48

Page 42: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Variational MethodsConsider LDA model again

Very difficult to maximize

Harder than pLSA due to Dirichletpriors

41 / 48

Page 43: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Variational MethodsConsider LDA model again

Let's get inspiration from EM:maximize lower bound

But what should the lower boundbe?

42 / 48

Page 44: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Variational MethodsConsider LDA model again

Make missing data and parameters"independent"!

43 / 48

Page 45: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Variational MethodsConsider LDA model again

Find parameters that make simplemodel most similar to original model

44 / 48

Page 46: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Variational MethodsCan then define EM­like algorithm

E­step: define expectation w.r.t. approximate distribution

M­step: maximize parameters of approximate distribution

45 / 48

Page 47: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Variational MethodsNet result:

1) Maximum posterior estimates 2) Super simple updates 3) Withstochastic approach (update using a few words at a time), extremelyscalable

46 / 48

Page 48: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

Hoffman, et al. (2010). NIPS

Variational Methods

47 / 48

Page 49: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2

ConclusionProbabilistic mixture models: powerful model class with manyapplications

Awesome historical algorithmic development

Outstanding software support

48 / 48