Variational Inference for the Indian Buffet Process

Finale Doshi-Velez† (Cambridge University), Kurt T. Miller† (UC Berkeley), Jurgen Van Gael† (Cambridge University), Yee Whye Teh (Gatsby Unit)

† Authors contributed equally

Introduction

Motivating example

We are interested in extracting unobserved features from observed data. For example:

• Latent classes ⇒ Mixture models

• Latent features ⇒ Latent feature models


Introduction

Linear Gaussian Latent Feature Model

We will focus on one example of a latent feature model:

[Figure: X = ZA + noise, where X is the observed N × D data matrix (row i is the observation for object i), Z is the unobserved N × K binary matrix (row i gives the features for object i), and A is the unobserved K × D matrix of feature values.]

• N = Number of data points

• D = Dimension of observed data

• K = Number of latent features
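
To make the model concrete, here is a minimal sampling sketch (ours, not from the talk); the 0.5 activation probability for each feature is a placeholder until the prior on Z is specified later.

```python
import numpy as np

def sample_linear_gaussian(N=100, D=500, K=25, sigma_A=1.0, sigma_n=0.5, seed=0):
    """Sample X = Z A + noise from the linear Gaussian latent feature model."""
    rng = np.random.default_rng(seed)
    Z = rng.binomial(1, 0.5, size=(N, K))              # N x K binary features
    A = sigma_A * rng.standard_normal((K, D))          # K x D feature values
    X = Z @ A + sigma_n * rng.standard_normal((N, D))  # N x D observations
    return X, Z, A
```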


Introduction

Linear Gaussian Latent Feature Model

Goal: Infer Z and A given data X.

Approach: Bayes’ rule:

$$p(Z, A \mid X) \;\propto\; \underbrace{p(X \mid Z, A)\, p(A)}_{\text{model specific}} \;\times\; \underbrace{p(Z)}_{\text{prior on binary matrices}}$$

In the linear Gaussian model, we use

• p(X | Z, A) ∼ N(ZA, σ_n² I)

• p(A) ∼ N(0, σ_A² I)

• p(Z) ∼ ?

[Graphical model: σ_A → A and the unspecified prior → Z; A and Z → X; σ_n → X.]


The Indian Buffet Process

The Indian Buffet Process - Stick-breaking construction

• First generate v_1, v_2, … i.i.d. ∼ Beta(α, 1).

• Let π_k = ∏_{j=1}^{k} v_j.

• Sample z_nk ∼ Bernoulli(π_k).

[Figure: the stick weights v_1, …, v_9 and the resulting feature probabilities π_1, …, π_9, which shrink toward zero; column k of Z is sampled with probability π_k.]

(Teh et al., 2007)
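
The construction translates directly into code; a minimal sketch, where the truncation level K_max is our own device to keep the sample finite, not part of the definition:

```python
import numpy as np

def sample_ibp_stick_breaking(N, alpha, K_max=50, seed=0):
    """Sample Z from the IBP via the stick-breaking construction,
    truncated at K_max columns for practicality."""
    rng = np.random.default_rng(seed)
    v = rng.beta(alpha, 1.0, size=K_max)      # v_k i.i.d. Beta(alpha, 1)
    pi = np.cumprod(v)                        # pi_k = v_1 * ... * v_k
    Z = rng.binomial(1, pi, size=(N, K_max))  # z_nk ~ Bernoulli(pi_k)
    return Z, pi
```

Because each v_k < 1, the π_k decay geometrically in expectation, so columns far to the right are almost surely all zero.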


The Indian Buffet Process

Full Linear Gaussian Latent Feature Model

Model:

• p(X | Z, A) ∼ N(ZA, σ_n² I)

• p(A) ∼ N(0, σ_A² I)

• p(Z) ∼ IBP(α)

[Graphical model: α → v → Z; σ_A → A; Z and A → X; σ_n → X.]

Given X, how do we do inference on Z and A?

• Even for finite K, there are 2^{NK} possible Z (for N = 100 and K = 25, that is 2^{2500} ≈ 10^{752} binary matrices).

• Many local optima.


The Indian Buffet Process

Inference in the Linear Gaussian Model

[Figure: predictive log likelihood vs. time the sampler is run (minutes); left panel N = 100, D = 500, K = 25, right panel N = 500, D = 500, K = 25; curves for Collapsed Gibbs, Uncollapsed Gibbs, and Variational.]

Variational Inference for the IBP

Mean Field Variational Inference

Approximate p(Z, A | X) with a distribution q(Z, A) from a family Q that is “close” to p(Z, A | X).

How do we define “close”? We will attempt to find

$$q(Z, A) = \arg\min_{q \in Q} D\big(q(Z, A) \,\|\, p(Z, A \mid X)\big).$$
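
A standard identity, not spelled out on the slide, makes this minimization tractable: the KL divergence to the posterior differs from the negative evidence lower bound (ELBO) only by the constant log p(X),

$$D\big(q(Z, A) \,\|\, p(Z, A \mid X)\big) = \log p(X) - \Big(\mathbb{E}_q[\log p(X, Z, A)] - \mathbb{E}_q[\log q(Z, A)]\Big),$$

so minimizing the divergence over q ∈ Q is equivalent to maximizing the ELBO, which involves only expectations under q.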


Variational Inference for the IBP

How do we choose Q?

p(Z,A|X) is a distribution over infinitely many features.

Trick (Blei and Jordan, 2004): Let Q be a truncated family where we assume that Z is nonzero in at most the first K columns.

Why can we do this? Intuitively, the probability π_k that z_nk is one decreases exponentially quickly.


Variational Inference for the IBP

Truncation bound

More formally, let m_K(X) be the marginal of X, with Z and A integrated out, when the stick-breaking construction is truncated at column K.

Then we can show

$$\frac{1}{4} \int \big| m_K(X) - m_\infty(X) \big| \, dX \;\le\; 1 - \exp\left(-N \left(\frac{\alpha}{\alpha + 1}\right)^{K}\right).$$

[Figure: bound on L1 distance as a function of truncation level K, comparing the truncation bound with the true distance.]

This is the first such bound for the IBP and can serve as a guideline for how to choose K for the family Q.
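
As a usage sketch of the bound (under the formula as reconstructed above), a small helper can pick the smallest K whose bound falls below a tolerance:

```python
import numpy as np

def ibp_truncation_bound(N, alpha, K):
    """Upper bound on (1/4) * L1 distance between the truncated
    and the untruncated marginals of X."""
    return 1.0 - np.exp(-N * (alpha / (alpha + 1.0)) ** K)

def smallest_K(N, alpha, tol=0.1, K_cap=1000):
    """Smallest truncation level whose bound drops below tol."""
    return next(K for K in range(1, K_cap) if ibp_truncation_bound(N, alpha, K) < tol)
```

For illustrative values such as N = 100 and α = 5 the returned K is a few dozen; once N(α/(α+1))^K is small, the bound is approximately that quantity, so it shrinks geometrically in K.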


Variational Inference for the IBP

How do we choose Q?

We let our family Q be the parameterized family (introducing the stick-breaking variables v)

$$q(Z, A, v) = q_\nu(Z)\, q_\phi(A)\, q_\tau(v).$$

True distribution: [graphical model as before: α → v → Z; σ_A → A; Z and A → X; σ_n → X.]

Variational distribution: [fully factorized graphical model: τ → v, φ → A, ν → Z, with no edges among v, A, and Z.]

• q_ν_nk(z_nk) = Bernoulli(z_nk; ν_nk)

• q_φ_k(A_k·) = N(A_k·; φ̄_k, Φ_k)

• q_τ_k(v_k) = Beta(v_k; τ_k1, τ_k2)
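
To fix notation in code, a hypothetical container for these parameters (our sketch; we store only the diagonal of each Φ_k for simplicity, whereas the family allows a full covariance):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class VariationalParams:
    tau: np.ndarray  # (K, 2): Beta parameters (tau_k1, tau_k2) of q(v_k)
    phi: np.ndarray  # (K, D): mean of q(A_k.)
    Phi: np.ndarray  # (K, D): diagonal covariance of q(A_k.) -- simplifying assumption
    nu: np.ndarray   # (N, K): Bernoulli probabilities of q(z_nk)

def init_params(N, D, K, seed=0):
    """Random initialization; the scales are arbitrary choices."""
    rng = np.random.default_rng(seed)
    return VariationalParams(
        tau=np.ones((K, 2)),
        phi=0.01 * rng.standard_normal((K, D)),
        Phi=np.ones((K, D)),
        nu=rng.uniform(0.1, 0.9, size=(N, K)),
    )
```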


Variational Inference for the IBP

Inference

Inference now reduces to finding variational parameters (τ, φ, Φ, ν) such that q ∈ Q is “close” to p.

$$(\tau, \phi, \Phi, \nu) = \arg\min_{q \in Q} D\big(q(Z, A) \,\|\, p(Z, A \mid X)\big).$$

This is not a convex optimization, so we can only hope to find a local optimum.

⇒ Parameter updates done iteratively.
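
The iteration has the usual coordinate-ascent shape; a schematic sketch, where the three update callables are hypothetical stand-ins for the closed-form updates (not reproduced here):

```python
def fit(X, params, update_tau, update_A, update_nu, n_iters=100):
    """Coordinate ascent: set each parameter block to its optimum with the
    other blocks held fixed; no such step can decrease the objective."""
    for _ in range(n_iters):
        params.tau = update_tau(X, params)            # Beta parameters of q(v)
        params.phi, params.Phi = update_A(X, params)  # Gaussian q(A)
        params.nu = update_nu(X, params)              # Bernoulli probabilities of q(Z)
    return params
```

Because the objective is not convex, the fixed point reached depends on the initialization; random restarts are a natural remedy.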


Variational Inference for the IBP

Parameter updates

Many of the updates are straightforward exponential family calculations.

The only nontrivial calculation is E_{v,Z}[log p(z_nk | v)], which requires evaluating

$$\mathbb{E}_{v}\!\left[\log\Big(1 - \prod_{m=1}^{k} v_m\Big)\right],$$

the expected log probability that z_nk = 0.

We provide an efficient way to lower bound this term.
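
One route to such a bound (a sketch of the standard auxiliary-multinomial trick; the slides do not spell out the details) starts from the telescoping identity

$$1 - \prod_{m=1}^{k} v_m = \sum_{y=1}^{k} (1 - v_y) \prod_{m=1}^{y-1} v_m,$$

then introduces a variational multinomial q_k(y) over the index at which the stick “breaks” and applies Jensen’s inequality:

$$\mathbb{E}_v\!\left[\log\Big(1 - \prod_{m=1}^{k} v_m\Big)\right] \;\ge\; \sum_{y=1}^{k} q_k(y)\Big(\mathbb{E}_v[\log(1 - v_y)] + \sum_{m=1}^{y-1} \mathbb{E}_v[\log v_m]\Big) + H[q_k].$$

Under Beta-distributed v_m, each expectation on the right is a digamma expression, so the bound is cheap to evaluate and to optimize over q_k.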


Results

Results: Synthetic data

[Figure: predictive log likelihood vs. time the sampler is run (minutes); left panel N = 100, D = 500, K = 25, right panel N = 500, D = 500, K = 25; curves for Collapsed Gibbs, Uncollapsed Gibbs, and Variational.]


Results

Results: Real data

Two data sets:

• Yale faces data set: linear Gaussian model, N = 721, D = 1024 (32 × 32 images)

• Speech data set: iICA model, N = 245, D = 10

[Figure: speech waveforms plotted over time.]


Results

Results: Real data

Faces data set: N = 721, D = 1024

[Figure: negative log likelihood for K ∈ {5, 10, 25} on the faces data set.]

Large N and D: variational helps

Speech data set: N = 245, D = 10

[Figure: negative log likelihood for K ∈ {2, 5, 9} on the speech data set; Uncollapsed Gibbs vs. Variational.]

Small N and D: variational does not help


Summary

• We present the first variational inference algorithm for the IBP.

• For large N and D, it finds better local optima than the samplers.

• We also present the first truncation bound for the IBP.

Code will be available soon from our websites.

Questions?
