
Another Walkthrough of Variational Bayes

Bevan Jones

Machine Learning Reading Group

Macquarie University


Variational Bayes?

• Bayes ← Bayes’ Theorem
• But the integral is intractable!
  – Sampling
    • Gibbs, Metropolis-Hastings, Slice Sampling, Particle Filters…
  – Variational Bayes
    • Change the equations, replacing intractable integrals
    • This involves searching for a good approximation
• Variational ← Calculus of Variations
  – A way of searching through a space of functions for the “best” one

2


Useful Concepts

• Probability/Information Theory
  – Bayes’ Theorem
  – Expectations
  – Jensen’s Inequality
  – KL Divergence
• Calculus
  – Functionals & Functional Derivatives
  – Lagrange Multipliers
• Logarithms

3


Outline

• The true likelihood

• Approximating the posterior

• The lower bound and a definition for “best”

• Finding the optimal approximation
  – Functionals & functional derivatives
  – Connection to KL divergence

• The Mean-field approximation

• An inference procedure

• Dirichlet-multinomial example

4


The (Log) Likelihood

• We have some observed data:
• We have a model relating latent variable z to the data:
• To guess z
• The problem is one of computing
• Or just as good

5
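The formulas on slide 5 did not survive extraction. A plausible reconstruction, assuming the notation the later slides use (x for the observed data, z for the latent variable, and a joint model p(x, z)), is:

```latex
\begin{align*}
p(z \mid x) &= \frac{p(x, z)}{p(x)}, \\
p(x) &= \int p(x, z)\, dz, \\
\log p(x) &= \log \int p(x, z)\, dz .
\end{align*}
```

The integral over z is the quantity that the next slide describes as hard to compute.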


Approximating p(z|x)

• The integral in the expression for p(x) may not be easily computed

• But we might be able to get by with an approximation for p(x, z)

• We’ll focus on approximating only part of it

6


Choosing q

• How to choose q?

• Ideally, we want the q that is closest to p

• Define a lower bound on p

– Make this a “function” of q

• Maximize the lower bound to make it as tight as possible

– Choose q accordingly

7


Bounding the Log Likelihood with Jensen’s Inequality

• Jensen’s Inequality, where f is concave

8
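The derivation on slides 8 to 11 was lost with the slide images. The standard argument, written here as a sketch that assumes q(z) is any density that is positive wherever p(x, z) is, and that uses the symbol F from the following slides, runs:

```latex
\begin{align*}
f\big(\mathbb{E}[y]\big) &\ge \mathbb{E}\big[f(y)\big] \quad \text{for concave } f, \\
\log p(x) &= \log \int q(z)\, \frac{p(x, z)}{q(z)}\, dz
           = \log \mathbb{E}_{q}\!\left[\frac{p(x, z)}{q(z)}\right] \\
          &\ge \mathbb{E}_{q}\!\left[\log \frac{p(x, z)}{q(z)}\right]
           = \int q(z) \log \frac{p(x, z)}{q(z)}\, dz \;\equiv\; F[q].
\end{align*}
```

The inequality is Jensen's with f = log, which is concave.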


The Lower Bound

• We can’t calculate the log likelihood, but we can compute the lower bound
• Maximizing F tightens the lower bound on the log likelihood
• What q maximizes F?
• If q were a variable we could do this by taking derivatives and solving for q

12
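The bound can be checked numerically on a toy problem. The sketch below assumes a discrete latent variable with three values and a made-up table `joint` holding p(x, z) for one fixed x; neither comes from the slides. It draws random distributions q and confirms that F[q] never exceeds log p(x), with equality when q is the exact posterior.

```python
import math
import random

def lower_bound(q, joint):
    """F[q] = sum_z q(z) * log(p(x, z) / q(z)) for a discrete latent z."""
    return sum(qz * math.log(pz / qz) for qz, pz in zip(q, joint) if qz > 0)

# Hypothetical joint p(x, z) for one fixed observation x and three values of z.
joint = [0.10, 0.25, 0.05]
log_evidence = math.log(sum(joint))          # log p(x) = log sum_z p(x, z)
posterior = [p / sum(joint) for p in joint]  # p(z | x)

rng = random.Random(0)
gaps = []
for _ in range(1000):
    w = [rng.random() + 1e-9 for _ in joint]
    q = [v / sum(w) for v in w]
    # The gap log p(x) - F[q] should never be negative.
    gaps.append(log_evidence - lower_bound(q, joint))

smallest_gap = min(gaps)
posterior_gap = abs(lower_bound(posterior, joint) - log_evidence)
print(smallest_gap, posterior_gap)
```

The smallest gap over the random draws stays non-negative, and the gap at the exact posterior is zero up to rounding.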


Functionals: the “Variational” in VB

• Functional: a kind of “meta-function” that takes a function as input

• We can view F[q] as a functional of q

• Calculus of functionals parallels that of functions

• Then, we can
  – take the derivative of F[q] with respect to q,
  – set it to 0, and
  – solve for q

13


Derivatives

14


Functional Derivatives

• The change in a functional as we change its function argument

15
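The defining equation on slide 15 is missing. One common way to state it, given here as an assumption about what the slide showed, uses a small perturbation εη(z) of the argument; the second line is the special case needed in this deck, where the integrand depends on q(z) but not on its derivatives:

```latex
\begin{gather*}
F[q + \epsilon \eta] = F[q] + \epsilon \int \frac{\delta F}{\delta q(z)}\, \eta(z)\, dz + O(\epsilon^2), \\
F[q] = \int f\big(q(z), z\big)\, dz
\quad \Longrightarrow \quad
\frac{\delta F}{\delta q(z)} = \frac{\partial f}{\partial q}\big(q(z), z\big).
\end{gather*}
```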


Useful Derivatives

16
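The individual derivatives on slides 16 to 20 were not recoverable. Two standard results that the rest of the deck depends on are that the functional derivative of ∫ q(z) g(z) dz is g(z), and that of ∫ q(z) log q(z) dz is log q(z) + 1. The sketch below checks their combination numerically, assuming a function discretized on a five-point grid with made-up values for `p` and `q`.

```python
import math

def F(q, p, dz):
    """Discretized F[q] = integral of q(z) * (log p(z) - log q(z)) dz."""
    return sum(qi * (math.log(pi) - math.log(qi)) * dz for qi, pi in zip(q, p))

dz = 0.1
# Hypothetical positive functions sampled on a 5-point grid (not normalized).
p = [0.2, 0.5, 0.9, 0.6, 0.3]
q = [0.4, 0.7, 0.5, 0.8, 0.2]

eps = 1e-6
max_err = 0.0
for i in range(len(q)):
    bumped = list(q)
    bumped[i] += eps
    # On a grid, the functional derivative is the partial derivative divided by dz.
    numeric = (F(bumped, p, dz) - F(q, p, dz)) / (eps * dz)
    # Analytic result: delta F / delta q(z) = log p(z) - log q(z) - 1.
    analytic = math.log(p[i]) - math.log(q[i]) - 1.0
    max_err = max(max_err, abs(numeric - analytic))

print(max_err)
```

The finite-difference estimate agrees with log p(z) − log q(z) − 1 to within the expected forward-difference error.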


Calculating q

• Use Lagrange multipliers

constraint

21
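The Lagrangian calculation on slides 21 to 24 is missing from the transcript. A sketch of the usual steps, assuming the only constraint is that q integrates to one and writing λ for its multiplier:

```latex
\begin{align*}
\mathcal{L}[q] &= \int q(z) \log \frac{p(x, z)}{q(z)}\, dz + \lambda \left( \int q(z)\, dz - 1 \right), \\
\frac{\delta \mathcal{L}}{\delta q(z)} &= \log p(x, z) - \log q(z) - 1 + \lambda = 0, \\
q(z) &= e^{\lambda - 1}\, p(x, z) = \frac{p(x, z)}{\int p(x, z')\, dz'} = p(z \mid x).
\end{align*}
```

The value of λ is fixed by the normalization constraint, which gives e^{λ−1} = 1/p(x).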


KL Divergence: An Alternative View

• Maximizing F is minimizing the KL divergence
• And

25
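The identity behind slide 25 can be written out as follows. This is the standard decomposition, given as a reconstruction rather than the slide's exact formula:

```latex
\begin{align*}
F[q] &= \int q(z) \log \frac{p(z \mid x)\, p(x)}{q(z)}\, dz
      = \log p(x) - \mathrm{KL}\big(q(z) \,\|\, p(z \mid x)\big), \\
\mathrm{KL}\big(q(z) \,\|\, p(z \mid x)\big) &= \int q(z) \log \frac{q(z)}{p(z \mid x)}\, dz \;\ge\; 0 .
\end{align*}
```

Since log p(x) does not depend on q, raising F lowers the KL divergence by the same amount, and the divergence is zero exactly when q(z) = p(z|x).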


Optimal q

• The best q(z) is p(z|x)

26


Where are we?

• We’ve bounded the likelihood (Jensen’s Ineq.)

• Made this bound tight (Lagrange Multipliers)

• But the best approximation is no approximation at all!

• We need to constrain q so that it’s tractable

27


Optimal q in an Imperfect World

• We can’t compute q(z)=p(z|x) directly

• Instead, constrain the domain of F[q] to some set of more tractable functions

• This is usually done by making independence assumptions

– The mean field assumption: cut all dependencies

28


Example 2: Mean Field Assumption

• We have some observed data:
• We have a model relating latent variables z and θ to the data:
• To guess z and θ we need
• But the integral is hard!
• Apply the mean field assumption

29
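The equations on slide 29 are missing. Assuming a joint model p(x, z, θ) and the factor names q_z and q_θ that appear in later slide titles, the posterior and its mean-field approximation would read:

```latex
\begin{align*}
p(z, \theta \mid x) &= \frac{p(x, z, \theta)}{\iint p(x, z', \theta')\, dz'\, d\theta'}, \\
q(z, \theta) &= q_z(z)\, q_\theta(\theta).
\end{align*}
```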


The New Lower Bound

Apply mean field assumption

33
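The bound on slides 30 to 33 can be reconstructed from the earlier definition of F by substituting the factorized q. This is a sketch under the same notation assumptions as above:

```latex
\begin{align*}
F[q] &= \iint q(z, \theta) \log \frac{p(x, z, \theta)}{q(z, \theta)}\, dz\, d\theta, \\
F[q_z, q_\theta] &= \iint q_z(z)\, q_\theta(\theta) \log p(x, z, \theta)\, dz\, d\theta
  - \int q_z(z) \log q_z(z)\, dz
  - \int q_\theta(\theta) \log q_\theta(\theta)\, d\theta .
\end{align*}
```

The two entropy terms separate because each factor integrates to one, so the integral over the other variable drops out.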


The Benefit of Independence

• The integrals get simpler
• In fact, these go away

34


Optimizing the Lower Bound

35


Optimal qθ(θ)

• Use Lagrange multipliers

constraint

36


Optimal qz(z)

• Use Lagrange multipliers

constraint

37
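Slides 36 and 37 presumably arrive at the standard mean-field fixed-point equations. Repeating the Lagrange multiplier argument for one factor at a time, with the other held fixed, gives the following (a reconstruction, not the slides' exact notation):

```latex
\begin{align*}
q_\theta(\theta) &= \frac{\exp\big(\mathbb{E}_{q_z}[\log p(x, z, \theta)]\big)}
                        {\int \exp\big(\mathbb{E}_{q_z}[\log p(x, z, \theta')]\big)\, d\theta'}, \\
q_z(z) &= \frac{\exp\big(\mathbb{E}_{q_\theta}[\log p(x, z, \theta)]\big)}
              {\int \exp\big(\mathbb{E}_{q_\theta}[\log p(x, z', \theta)]\big)\, dz'} .
\end{align*}
```

Each factor has a numerator built from an expectation under the other factor and a denominator that normalizes it, which matches the "Expectation", "Numerator" and "Normalization" slide titles in the worked example.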


The Approximation q ≠ p

38


Estimating Parameters

• Now we have our approximation q

• We need to compute the expectations

• Use an EM-like procedure, alternating between the two
  – It was hard to do this for p(z,θ|x)
  – It’s (hopefully) easy for q(z,θ)
    • if we’ve defined p to make use of conjugacy
    • and if we’ve chosen the right constraint for q

39
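The alternating procedure on slide 39 can be illustrated on a problem small enough to enumerate. The sketch below assumes discrete z and θ and a made-up table `joint` for p(x, z, θ) at a fixed x; it is not the deck's model. Each pass updates one factor from the expectation under the other and records F.

```python
import math

# Hypothetical joint p(x, z, theta) at a fixed x, with z in {0, 1} (rows)
# and theta in {0, 1, 2} (columns). The entries sum to p(x).
joint = [[0.10, 0.02, 0.03],
         [0.01, 0.08, 0.06]]

def normalize(w):
    total = sum(w)
    return [v / total for v in w]

def bound(qz, qt):
    """F[q] for the factorized q(z, theta) = qz(z) * qt(theta)."""
    return sum(qz[i] * qt[j] * (math.log(joint[i][j]) - math.log(qz[i] * qt[j]))
               for i in range(2) for j in range(3))

qz = [0.5, 0.5]
qt = normalize([1.0, 2.0, 3.0])
history = [bound(qz, qt)]
for _ in range(20):
    # q_theta(theta) proportional to exp(E_{qz}[log p(x, z, theta)])
    qt = normalize([math.exp(sum(qz[i] * math.log(joint[i][j]) for i in range(2)))
                    for j in range(3)])
    # q_z(z) proportional to exp(E_{qt}[log p(x, z, theta)])
    qz = normalize([math.exp(sum(qt[j] * math.log(joint[i][j]) for j in range(3)))
                    for i in range(2)])
    history.append(bound(qz, qt))

log_evidence = math.log(sum(sum(row) for row in joint))
print(history[0], history[-1], log_evidence)
```

F never decreases from one pass to the next and stays below log p(x); the remaining gap is the price of the independence assumption.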


Calculating F

40


Calculating F

• As a side effect of inference, we already have
• It’s the log of the normalization constant for q(z)
• So, we really only need two more expectations

41
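The formulas for slides 40 and 41 are missing. One reading that is consistent with the three bullets assumes the joint factors as p(x, z, θ) = p(x, z | θ) p(θ) and that q_z has just been updated to q_z(z) = exp(E_{q_θ}[log p(x, z | θ)]) / Z_z. Under those assumptions the bound simplifies to:

```latex
\begin{align*}
Z_z &= \int \exp\big(\mathbb{E}_{q_\theta}[\log p(x, z \mid \theta)]\big)\, dz, \\
F &= \log Z_z + \mathbb{E}_{q_\theta}[\log p(\theta)] - \mathbb{E}_{q_\theta}[\log q_\theta(\theta)] .
\end{align*}
```

Here log Z_z is the quantity already available from inference, and the two expectations under q_θ would be the "two more" that the slide mentions.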


Uses for F

• We can often use F in cases where we would normally use the log likelihood
  – Measuring convergence
    • No guarantee to maximize likelihood, but we do have F
• Others
  – Model selection
    • Choose the model with the highest lower bound
  – Selecting the number of clusters
    • Pick the number that gives us the highest lower bound
  – Parameter optimization
    • Again, optimize the lower bound w.r.t. the parameters

42


Worked Example

Dirichlet-Multinomial Mixture Model

43


Dirichlet-Multinomial Mixture Model

[Figure: plate diagram with variables φ, z, x, π, α, β and plates N and K]

44
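The plate diagram on slide 44 survives only as a list of labels. One generative model consistent with those labels and with the later slides (a single φ, several π's, topic counts and topic-word pair counts) is sketched below. The pairing of α with φ and β with π, and the reading of each x_i as a single word, are assumptions:

```latex
\begin{align*}
\phi &\sim \mathrm{Dirichlet}(\alpha), \\
\pi_j &\sim \mathrm{Dirichlet}(\beta), & j &= 1, \dots, K, \\
z_i \mid \phi &\sim \mathrm{Multinomial}(\phi), & i &= 1, \dots, N, \\
x_i \mid z_i, \pi &\sim \mathrm{Multinomial}(\pi_{z_i}), & i &= 1, \dots, N.
\end{align*}
```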


The Intractable Integral

45


The Mean Field Assumption

46
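The formulas for slides 45 and 46 are not in the transcript. Assuming θ in example 2 corresponds to the pair (φ, π) here, and that the approximation is fully factorized (the later slides treat q_φ, q_π and q_z separately), the intractable posterior and its mean-field replacement would be:

```latex
\begin{align*}
p(z, \phi, \pi \mid x) &= \frac{p(x, z, \phi, \pi)}
  {\sum_{z'} \iint p(x, z', \phi', \pi')\, d\phi'\, d\pi'}, \\
q(z, \phi, \pi) &= q_z(z)\, q_\phi(\phi)\, q_\pi(\pi).
\end{align*}
```

The denominator sums over every joint assignment of the N labels in z, which is what makes it intractable.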


Optimizing F

• Apply Lagrange multipliers just like example 2

• In this case, we have simply replaced z, x, and θ with vectors

• The math is exactly the same

• But we need to find the expectations we skipped before

– Plug in the Dirichlet and multinomial distributions

47


Optimal q(z,θ)

• Borrowed from example 2
• See slides 36-38
• All we need to do is apply the particulars of the mixture model

48


Optimal qθ(θ)

49


Optimal qφ(φ): The Expectation

50


Dirichlet Distribution

51
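For reference, the Dirichlet density that slide 51 introduces has the standard form below, written for a K-dimensional probability vector θ with parameters α_1, …, α_K:

```latex
\[
\mathrm{Dirichlet}(\theta \mid \alpha_1, \dots, \alpha_K)
  = \frac{\Gamma\!\big(\sum_{k=1}^{K} \alpha_k\big)}{\prod_{k=1}^{K} \Gamma(\alpha_k)}
    \prod_{k=1}^{K} \theta_k^{\alpha_k - 1},
\qquad \theta_k \ge 0,\ \ \sum_{k=1}^{K} \theta_k = 1 .
\]
```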


Optimal qφ(φ): The Numerator

52


Optimal qφ(φ): The Normalization

53


Optimal qφ(φ): Conjugacy Helps

54
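The result that slides 50 to 54 build up to is presumably the usual conjugate update. Assuming φ has a Dirichlet prior with parameters α_1, …, α_K and writing n_j for the number of items assigned to topic j (a symbol introduced here), collecting the terms in log φ_j gives:

```latex
\begin{align*}
\log q_\phi(\phi) &= \sum_{j=1}^{K} \big(\alpha_j - 1 + \mathbb{E}_{q_z}[n_j]\big) \log \phi_j + \text{const}, \\
q_\phi(\phi) &= \mathrm{Dirichlet}\big(\phi \mid \alpha_1 + \mathbb{E}_{q_z}[n_1], \dots, \alpha_K + \mathbb{E}_{q_z}[n_K]\big).
\end{align*}
```

Conjugacy helps because the exponentiated expectation has the same functional form as the prior, so the normalization constant is that of a Dirichlet and no new integral is required.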


Optimal qπ(π)

• q(π) is essentially the same as q(φ)

• The only difference is that there are multiple π’s

• So, q(π) should be a product of Dirichlets

55
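Written out, the product of Dirichlets would take the following form. The vocabulary size V and the count n_{jk} (items with topic j and word k) are symbols introduced here for illustration:

```latex
\[
q_\pi(\pi) = \prod_{j=1}^{K}
  \mathrm{Dirichlet}\big(\pi_j \mid \beta_1 + \mathbb{E}_{q_z}[n_{j1}], \dots, \beta_V + \mathbb{E}_{q_z}[n_{jV}]\big).
\]
```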


Optimal qπ(π): The Expectation

56


Optimal qπ(π): The Numerator

57


Optimal qπ(π): The Denominator

58


Optimal qπ(π): Putting Them Together

59


A Useful Standard Result

• The digamma function

• The expectation under a Dirichlet of the log of an individual component of a Dirichlet random variable

60
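The standard result referred to on slide 60 is E[log θ_k] = ψ(α_k) − ψ(Σ_j α_j) for θ drawn from a Dirichlet with parameters α, where ψ is the digamma function. The sketch below checks it by Monte Carlo using only the standard library. The `digamma` helper is a simple recurrence-plus-series approximation written for this example, and the parameter values are made up.

```python
import math
import random

def digamma(x):
    """Digamma for x > 0: upward recurrence, then an asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x          # psi(x) = psi(x + 1) - 1 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + series

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma(alpha_k, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [v / total for v in g]

alpha = [2.0, 3.0, 5.0]   # hypothetical Dirichlet parameters
rng = random.Random(0)
n = 20000
mc = [0.0] * len(alpha)
for _ in range(n):
    theta = sample_dirichlet(alpha, rng)
    for k, t in enumerate(theta):
        mc[k] += math.log(t) / n   # running Monte Carlo mean of log(theta_k)

exact = [digamma(a) - digamma(sum(alpha)) for a in alpha]
print(mc, exact)
```

The sample averages of log θ_k agree with the digamma expression to within Monte Carlo error. In practice a library routine such as `scipy.special.digamma` would replace the hand-written helper.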


Optimal qz(z)

• Again, borrowed from example 2
• See slides 36-38
• Here, we plug in the model definition

61


Optimal qz(z)

• First, let’s work with the simpler multinomial distribution
• Side effect: a kind of estimate for the multinomial parameter vector

62


Optimal qz(z): The Expectations

63


Optimal qz(z): The Expectations

• Now, let’s work with the product of multinomials
• Side effect: a kind of set of multinomial parameter vectors
• This is essentially the same math required for HMMs and PCFGs

64


Optimal qz(z): The Expectations

65


Optimal qz(z): Putting It Together

66
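The combined expression on slide 66 is missing. Assuming each x_i is a single word index and writing a_j and b_{jk} for the Dirichlet parameters of q_φ and q_π (symbols introduced here), plugging the model into the general formula for q_z gives a distribution that factorizes over items:

```latex
\begin{align*}
q_z(z) &= \prod_{i=1}^{N} q(z_i), \\
q(z_i = j) &\propto \exp\big(\mathbb{E}_{q_\phi}[\log \phi_j] + \mathbb{E}_{q_\pi}[\log \pi_{j x_i}]\big), \\
\mathbb{E}_{q_\phi}[\log \phi_j] &= \psi(a_j) - \psi\big(\textstyle\sum_{j'} a_{j'}\big), \qquad
\mathbb{E}_{q_\pi}[\log \pi_{jk}] = \psi(b_{jk}) - \psi\big(\textstyle\sum_{k'} b_{jk'}\big).
\end{align*}
```

The exponentiated digamma differences play the role of the multinomial parameter estimates that the preceding slides describe as a side effect.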


Implications of Assumption

• We should get the same result with an even weaker assumption

67


Inference

• “E-Step”: Expected Counts
  – Topic counts
  – Topic-word pair counts
• “M-Step”: The Proportions
  – Topic j
  – Topic-word pair j-k

68
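The two steps on slide 68 can be sketched as a short program. This is an illustration under stated assumptions, not the author's implementation: each observation is taken to be a single word id, the priors are symmetric with scalar `alpha` and `beta`, and the function name `vb_mixture`, the helper `digamma` and the toy data are all made up for the example.

```python
import math
import random

def digamma(x):
    """Digamma for x > 0: upward recurrence, then an asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return shift + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))

def vb_mixture(words, K, V, alpha=1.0, beta=1.0, iters=30, seed=0):
    """Mean-field VB for a K-topic mixture over word ids in range(V)."""
    rng = random.Random(seed)
    # r[i][j] approximates q(z_i = j); start from random responsibilities.
    r = []
    for _ in words:
        row = [rng.random() + 0.1 for _ in range(K)]
        total = sum(row)
        r.append([v / total for v in row])
    for _ in range(iters):
        # "E-step": expected topic counts and topic-word pair counts, added
        # to the priors to give the Dirichlet parameters of q(phi) and q(pi).
        a = [alpha] * K
        b = [[beta] * V for _ in range(K)]
        for i, w in enumerate(words):
            for j in range(K):
                a[j] += r[i][j]
                b[j][w] += r[i][j]
        # "M-step": expected log proportions under q(phi) and q(pi).
        elog_phi = [digamma(a[j]) - digamma(sum(a)) for j in range(K)]
        elog_pi = [[digamma(b[j][k]) - digamma(sum(b[j])) for k in range(V)]
                   for j in range(K)]
        # Refresh q(z_i = j), normalizing in a numerically stable way.
        for i, w in enumerate(words):
            logw = [elog_phi[j] + elog_pi[j][w] for j in range(K)]
            top = max(logw)
            unnorm = [math.exp(v - top) for v in logw]
            total = sum(unnorm)
            r[i] = [v / total for v in unnorm]
    return r, a, b

words = [0, 0, 1, 1, 2, 3, 3, 2, 0, 3]   # hypothetical word ids
r, a, b = vb_mixture(words, K=2, V=4)
print(a)
```

The returned `a` and `b` are the prior pseudo-counts plus the expected counts, so `a` sums to K·alpha + N. A fixed iteration count is used for brevity; monitoring F, as the "Uses for F" slide suggests, would be the more principled stopping rule.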


Calculating F

• Also borrowed from example 2
• See slides 40-41
• But we adapt it for the mixture model

69


Calculating F

70


Calculating F: The Normalization Constant

• By-product of computing

71