Nonparametric Bayesian Statistics - MIT · 2016. 5. 16.

Nonparametric Bayesian Statistics
Tamara Broderick
ITT Career Development Assistant Professor
Electrical Engineering & Computer Science, MIT

Nonparametric Bayes

• Bayesian statistics that is not parametric (wait!)
  • Bayesian: $P(\text{parameters} \mid \text{data}) \propto P(\text{data} \mid \text{parameters})\, P(\text{parameters})$
  • Not parametric (i.e. not finite parameter; unbounded/growing/infinite number of parameters)
• “Wikipedia phenomenon” [wikipedia.org]

Image credits and references shown on this slide: [wikipedia.org] [Ed Bowlby, NOAA] [Arjas, Gasbarra 1994] [Escobar, West 1995; Ghosal et al 1999] [Fox et al 2014] [Ewens 1972; Hartl, Clark 2003] [Saria et al 2010] [Lloyd et al 2012; Miller et al 2010] [Sudderth, Jordan 2009]

Nonparametric Bayes

• A theoretical motivation: De Finetti’s Theorem
• A data sequence $X_1, X_2, \ldots$ is infinitely exchangeable if the distribution of any N data points doesn’t change when permuted:

$$p(X_1, \ldots, X_N) = p(X_{\sigma(1)}, \ldots, X_{\sigma(N)})$$

• De Finetti’s Theorem (roughly): A sequence is infinitely exchangeable if and only if, for all N and some distribution P:

$$p(X_1, \ldots, X_N) = \int \prod_{n=1}^{N} p(X_n \mid \theta)\, P(d\theta)$$

[Hewitt, Savage 1955; Aldous 1983]

• Motivates:
  • Parameters and likelihoods
  • Priors
  • “Nonparametric Bayesian” priors
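The integral form can be checked numerically in a small special case. The sketch below is only an illustration under stated assumptions: it takes the likelihood p(X_n | θ) to be Bernoulli(θ) and the mixing distribution P to be Beta(a, b), neither of which comes from the slide, and `joint_prob` with its default values is a hypothetical helper. Under those assumptions the integral has a closed form that depends on the data only through the number of ones, so every permutation of a sequence gets the same probability, even though the individual draws are not independent.

```python
import math
from itertools import permutations


def log_beta(a, b):
    """Log of the Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def joint_prob(xs, a=2.0, b=3.0):
    """Joint probability of a binary sequence under a Beta-Bernoulli mixture.

    Assumed model (not from the slides): theta ~ Beta(a, b), X_n | theta ~ Bernoulli(theta).
    Integrating theta out gives B(a + s, b + N - s) / B(a, b), with s the number of ones.
    """
    n = len(xs)
    s = sum(xs)
    return math.exp(log_beta(a + s, b + n - s) - log_beta(a, b))


xs = (1, 0, 0, 1, 1)
base = joint_prob(xs)

# Exchangeability: every reordering of the sequence has the same joint probability.
all_equal = all(math.isclose(joint_prob(p), base) for p in permutations(xs))
print("permutation invariant:", all_equal)

# Exchangeable does not mean independent: p(X1=1, X2=1) differs from p(X1=1) * p(X2=1).
p_one = joint_prob((1,))
p_two_ones = joint_prob((1, 1))
print("p(1,1) =", p_two_ones, " p(1)^2 =", p_one ** 2)
```

This only exercises the "if" direction of the theorem for one choice of likelihood and P; the theorem itself says that every infinitely exchangeable sequence admits some such representation.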

Outline

• Dirichlet process
• Background for intuition
• Generative model
• What does a growing/infinite number of parameters really mean (in Nonparametric Bayes)?
• Chinese restaurant process
• Inference
• Venture further into the wild world of Nonparametric Bayesian statistics

Generative model

$P(\text{parameters} \mid \text{data}) \propto P(\text{data} \mid \text{parameters})\, P(\text{parameters})$

• Finite Gaussian mixture model (K=2 clusters)

$$\rho_1 \sim \mathrm{Beta}(a_1, a_2), \qquad \rho_2 = 1 - \rho_1$$

$$\mu_k \overset{iid}{\sim} \mathcal{N}(\mu_0, \Sigma_0)$$

$$z_n \overset{iid}{\sim} \mathrm{Categorical}(\rho_1, \rho_2)$$

$$x_n \overset{indep}{\sim} \mathcal{N}(\mu_{z_n}, \Sigma)$$

[Figure: labels $\rho_1$, $\rho_2$]

• Don’t know $\mu_1, \mu_2$
• Don’t know $\rho_1, \rho_2$
• Inference goal: assignments of data points to clusters, cluster parameters
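The generative model can be read as a sampling recipe: draw the mixing proportions, draw the cluster means, then draw an assignment and an observation for each data point. The sketch below follows that order using only Python's standard library. It rests on several assumptions that are not in the slides: the data are one-dimensional, so Σ0 and Σ are replaced by scalar standard deviations `sigma0` and `sigma`; the hyperparameter defaults are arbitrary; and `sample_gmm` is a hypothetical name.

```python
import random


def sample_gmm(n_points, a1=1.0, a2=1.0, mu0=0.0, sigma0=5.0, sigma=1.0, seed=0):
    """Draw one data set from the K=2 Gaussian mixture generative model.

    Assumptions for illustration: 1-D data, so the covariances Sigma_0 and Sigma
    become the scalar standard deviations sigma0 and sigma.
    """
    rng = random.Random(seed)

    # rho_1 ~ Beta(a1, a2), rho_2 = 1 - rho_1
    rho1 = rng.betavariate(a1, a2)
    rho = (rho1, 1.0 - rho1)

    # mu_k ~ N(mu0, sigma0^2), iid for k = 1, 2
    mu = [rng.gauss(mu0, sigma0) for _ in range(2)]

    # z_n ~ Categorical(rho_1, rho_2), iid, with cluster labels 1 and 2
    z = [1 if rng.random() < rho1 else 2 for _ in range(n_points)]

    # x_n ~ N(mu_{z_n}, sigma^2), independent given the assignments
    x = [rng.gauss(mu[zn - 1], sigma) for zn in z]

    return rho, mu, z, x


rho, mu, z, x = sample_gmm(n_points=10)
print("proportions:", rho)
print("cluster means:", mu)
print("assignments:", z)
```

Inference runs this recipe in reverse: given only `x`, the goal is the posterior over `z`, `mu`, and `rho`.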

Page 63: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite

Beta distribution reviewBeta(⇢1|a1, a2) =

�(a1 + a2)

�(a1)�(a2)⇢a1�11 (1� ⇢1)

a2�1 a1, a2 > 0

• Gamma function • integer m: • for x > 0:

• What happens? • • •

• Beta is conjugate to Cat

a = a1 = a2 ! 0a = a1 = a2 ! 1

⇢1 ⇠ Beta(a1, a2), z ⇠ Cat(⇢1, ⇢2)ρ1

dens

ity

a1 > a2

p(⇢1, z) / ⇢1{z=1}1 (1� ⇢1)

1{z=2}⇢a1�11 (1� ⇢1)

a2�1

⇢1 2 (0, 1)

5

[demo]

�(m+ 1) = m!�(x+ 1) = x�(x)

Page 64: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite

Beta distribution reviewBeta(⇢1|a1, a2) =

�(a1 + a2)

�(a1)�(a2)⇢a1�11 (1� ⇢1)

a2�1 a1, a2 > 0

• Gamma function • integer m: • for x > 0:

• What happens? • • •

• Beta is conjugate to Cat

a = a1 = a2 ! 0a = a1 = a2 ! 1

⇢1 ⇠ Beta(a1, a2), z ⇠ Cat(⇢1, ⇢2)ρ1

dens

ity

a1 > a2

p(⇢1, z) / ⇢1{z=1}1 (1� ⇢1)

1{z=2}⇢a1�11 (1� ⇢1)

a2�1

⇢1 2 (0, 1)

5

[demo]

�(m+ 1) = m!�(x+ 1) = x�(x)

Page 65: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite

Beta distribution reviewBeta(⇢1|a1, a2) =

�(a1 + a2)

�(a1)�(a2)⇢a1�11 (1� ⇢1)

a2�1 a1, a2 > 0

• Gamma function • integer m: • for x > 0:

• What happens? • • •

• Beta is conjugate to Cat

a = a1 = a2 ! 0a = a1 = a2 ! 1

⇢1 ⇠ Beta(a1, a2), z ⇠ Cat(⇢1, ⇢2)ρ1

dens

ity

a1 > a2

p(⇢1, z) / ⇢1{z=1}1 (1� ⇢1)

1{z=2}⇢a1�11 (1� ⇢1)

a2�1

⇢1 2 (0, 1)

5

[demo]

p(⇢1|z) / ⇢a1+1{z=1}�11 (1� ⇢1)

a2+1{z=2}�1 / Beta(⇢1|a1 + 1{z = 1}, a2 + 1{z = 2})

�(m+ 1) = m!�(x+ 1) = x�(x)

Page 66: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite

Beta distribution reviewBeta(⇢1|a1, a2) =

�(a1 + a2)

�(a1)�(a2)⇢a1�11 (1� ⇢1)

a2�1 a1, a2 > 0

• Gamma function • integer m: • for x > 0:

• What happens? • • •

• Beta is conjugate to Cat

a = a1 = a2 ! 0a = a1 = a2 ! 1

⇢1 ⇠ Beta(a1, a2), z ⇠ Cat(⇢1, ⇢2)ρ1

dens

ity

a1 > a2

p(⇢1, z) / ⇢1{z=1}1 (1� ⇢1)

1{z=2}⇢a1�11 (1� ⇢1)

a2�1

⇢1 2 (0, 1)

5

[demo]

p(⇢1|z) / ⇢a1+1{z=1}�11 (1� ⇢1)

a2+1{z=2}�1 / Beta(⇢1|a1 + 1{z = 1}, a2 + 1{z = 2})

�(m+ 1) = m!�(x+ 1) = x�(x)

Page 67: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite

Beta distribution reviewBeta(⇢1|a1, a2) =

�(a1 + a2)

�(a1)�(a2)⇢a1�11 (1� ⇢1)

a2�1 a1, a2 > 0

• Gamma function • integer m: • for x > 0:

• What happens? • • •

• Beta is conjugate to Cat

a = a1 = a2 ! 0a = a1 = a2 ! 1

⇢1 ⇠ Beta(a1, a2), z ⇠ Cat(⇢1, ⇢2)ρ1

dens

ity

a1 > a2

p(⇢1, z) / ⇢1{z=1}1 (1� ⇢1)

1{z=2}⇢a1�11 (1� ⇢1)

a2�1

⇢1 2 (0, 1)

5

[demo]

p(⇢1|z) / ⇢a1+1{z=1}�11 (1� ⇢1)

a2+1{z=2}�1 / Beta(⇢1|a1 + 1{z = 1}, a2 + 1{z = 2})

�(m+ 1) = m!�(x+ 1) = x�(x)
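The Beta density and the conjugate update can be checked numerically. The sketch below is not the demo from the talk. It is a minimal Python illustration using only the standard library, and the function names `beta_pdf` and `beta_posterior` are hypothetical.

```python
import math


def beta_pdf(rho, a1, a2):
    """Beta(rho | a1, a2) density for rho in (0, 1) and a1, a2 > 0.

    The normalizer Gamma(a1 + a2) / (Gamma(a1) Gamma(a2)) is computed in
    log space with math.lgamma to avoid overflow for large parameters.
    """
    log_norm = math.lgamma(a1 + a2) - math.lgamma(a1) - math.lgamma(a2)
    log_kernel = (a1 - 1) * math.log(rho) + (a2 - 1) * math.log(1 - rho)
    return math.exp(log_norm + log_kernel)


def beta_posterior(a1, a2, z):
    """Posterior Beta parameters after observing one draw z in {1, 2}."""
    return a1 + int(z == 1), a2 + int(z == 2)


# Small a pushes mass toward the endpoints, large a concentrates it at 1/2.
for a in (0.1, 1.0, 100.0):
    print(a, beta_pdf(0.01, a, a), beta_pdf(0.5, a, a))

# Observing z = 1 adds one to the first parameter only.
print(beta_posterior(2, 3, 1))
```

Evaluating the density at a point near the boundary and at the midpoint is one simple way to see the three regimes listed under "What happens?" without plotting.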

Generative model

$$P(\text{parameters} \mid \text{data}) \propto P(\text{data} \mid \text{parameters})\, P(\text{parameters})$$

• Finite Gaussian mixture model (K clusters)

$$\rho_{1:K} \sim \mathrm{Dirichlet}(a_{1:K})$$

$$\mu_k \overset{iid}{\sim} \mathcal{N}(\mu_0, \Sigma_0)$$

$$z_n \overset{iid}{\sim} \mathrm{Categorical}(\rho_{1:K})$$

$$x_n \overset{indep}{\sim} \mathcal{N}(\mu_{z_n}, \Sigma)$$

[Figure: component frequencies $\rho_1$, $\rho_2$, $\rho_3$]
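The four sampling statements above can be read as a recipe for simulating data. The following is a minimal sketch under simplifying assumptions that are not in the slides: the data are one-dimensional, so $\Sigma_0$ and $\Sigma$ are replaced by scalar standard deviations `sigma0` and `sigma`, and cluster labels are zero-based. All function and parameter names are hypothetical.

```python
import random


def sample_dirichlet(a, rng):
    """Draw rho ~ Dirichlet(a) by normalizing independent Gamma(a_k, 1) draws."""
    g = [rng.gammavariate(ak, 1.0) for ak in a]
    total = sum(g)
    return [gk / total for gk in g]


def sample_finite_gmm(n, a, mu0, sigma0, sigma, seed=0):
    """Simulate n points from a 1-D finite Gaussian mixture with K = len(a)."""
    rng = random.Random(seed)
    rho = sample_dirichlet(a, rng)                      # mixing frequencies
    mu = [rng.gauss(mu0, sigma0) for _ in a]            # cluster means
    z = rng.choices(range(len(a)), weights=rho, k=n)    # assignments (0-based)
    x = [rng.gauss(mu[zn], sigma) for zn in z]          # observations
    return rho, mu, z, x


rho, mu, z, x = sample_finite_gmm(n=10, a=[1.0, 1.0, 1.0],
                                  mu0=0.0, sigma0=5.0, sigma=1.0)
print(rho)
print(z)
```

The order of the draws mirrors the model: frequencies and means first, then an assignment per data point, then the observation given its assigned mean.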

Dirichlet distribution review

$$\mathrm{Dirichlet}(\rho_{1:K} \mid a_{1:K}) = \frac{\Gamma\!\left(\sum_{k=1}^{K} a_k\right)}{\prod_{k=1}^{K} \Gamma(a_k)} \prod_{k=1}^{K} \rho_k^{a_k - 1}, \qquad a_k > 0, \quad \rho_k \in (0, 1), \quad \sum_k \rho_k = 1$$

• What happens? [demo]
  • $a = a_k \to 0$
  • $a = a_k = 1$
  • $a = a_k \to \infty$

[Figure: Dirichlet density over ($\rho_1$, $\rho_2$), three panels: a = (0.5,0.5,0.5), a = (5,5,5), a = (40,10,10)]

• Dirichlet is conjugate to Categorical

$$\rho_{1:K} \sim \mathrm{Dirichlet}(a_{1:K}), \qquad z \sim \mathrm{Cat}(\rho_{1:K})$$

$$\rho_{1:K} \mid z \overset{d}{=} \mathrm{Dirichlet}(a'_{1:K}), \qquad a'_k = a_k + \mathbb{1}\{z = k\}$$
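As a rough illustration of the Dirichlet review above (not the demo from the talk), the sketch below draws Dirichlet samples by normalizing Gamma variates and applies the conjugate update $a'_k = a_k + \mathbb{1}\{z = k\}$. It uses only the Python standard library, and the function names are hypothetical.

```python
import random


def sample_dirichlet(a, rng):
    """Draw rho ~ Dirichlet(a) by normalizing independent Gamma(a_k, 1) draws."""
    g = [rng.gammavariate(ak, 1.0) for ak in a]
    total = sum(g)
    return [gk / total for gk in g]


def dirichlet_posterior(a, z):
    """Posterior parameters a'_k = a_k + 1{z = k} for one draw z in 1..K."""
    return [ak + (1 if z == k else 0) for k, ak in enumerate(a, start=1)]


rng = random.Random(0)
# The three parameter settings shown in the figure panels.
for a in ([0.5, 0.5, 0.5], [5.0, 5.0, 5.0], [40.0, 10.0, 10.0]):
    print(a, [round(r, 3) for r in sample_dirichlet(a, rng)])

# Observing z = 2 increments only the second parameter.
print(dirichlet_posterior([0.5, 0.5, 0.5], 2))
```

Repeating the sampling loop a few times gives a feel for how small parameters favor lopsided frequency vectors while large equal parameters keep the draw near uniform.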

What if K > N ?

• e.g. species sampling, topic modeling, groups on a social network, etc.

[Figure: component frequencies $\rho_1$, $\rho_2$, $\rho_3$, ..., $\rho_{1000}$]

• Components: number of latent groups
• Clusters: number of components represented in the data
• [demo 1, demo 2]
• Number of clusters for N data points is < K and random
• Number of clusters grows with N
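The distinction between components and clusters can be seen in a small simulation. This is a hedged sketch, not either of the demos referenced on the slide: it assumes a symmetric Dirichlet prior with parameter `a` on `k` components, and the function name `count_clusters` is hypothetical.

```python
import random


def count_clusters(n, k, a, seed=0):
    """Number of distinct components that appear among n draws.

    Frequencies come from a symmetric Dirichlet(a, ..., a) over k components,
    sampled as unnormalized Gamma(a, 1) weights (random.choices accepts
    unnormalized weights, so no explicit normalization is needed).
    """
    rng = random.Random(seed)
    weights = [rng.gammavariate(a, 1.0) for _ in range(k)]
    z = rng.choices(range(k), weights=weights, k=n)
    return len(set(z))


# K = 1000 components, as in the figure; the cluster count is random,
# never exceeds min(N, K), and tends to grow as N grows.
for n in (10, 100, 1000):
    print(n, count_clusters(n, k=1000, a=1.0))
```

Changing `seed` shows that the number of clusters is itself a random quantity for fixed N.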

Choosing K = ∞

• Here, difficult to choose finite K in advance (contrast with small K): don’t know K, difficult to infer, streaming data
• How to generate K = ∞ strictly positive frequencies that sum to one?
• Observation: $\rho_{1:K} \sim \mathrm{Dirichlet}(a_{1:K})$

$$\Rightarrow \quad \rho_1 \overset{d}{=} \mathrm{Beta}\!\left(a_1,\ \sum_{k=1}^{K} a_k - a_1\right) \ \perp\!\!\!\perp\ \frac{(\rho_2, \ldots, \rho_K)}{1 - \rho_1} \overset{d}{=} \mathrm{Dirichlet}(a_2, \ldots, a_K)$$

• “Stick breaking”

$$V_1 \sim \mathrm{Beta}(a_1,\ a_2 + a_3 + a_4), \qquad \rho_1 = V_1$$

$$V_2 \sim \mathrm{Beta}(a_2,\ a_3 + a_4), \qquad \rho_2 = (1 - V_1)\, V_2$$

$$V_3 \sim \mathrm{Beta}(a_3,\ a_4), \qquad \rho_3 = (1 - V_1)(1 - V_2)\, V_3$$

$$\rho_4 = 1 - \sum_{k=1}^{3} \rho_k$$
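The K = 4 stick-breaking steps above generalize to any finite K: break off a Beta-distributed fraction of whatever stick remains, and give the last component the leftover piece. The sketch below is a minimal standard-library illustration of that recipe; the function name `dirichlet_by_stick_breaking` is hypothetical.

```python
import random


def dirichlet_by_stick_breaking(a, rng):
    """Draw rho ~ Dirichlet(a) one coordinate at a time.

    For k < K: V_k ~ Beta(a_k, a_{k+1} + ... + a_K) and
    rho_k = V_k * (stick remaining after the first k - 1 breaks).
    The final coordinate is the remaining stick, so the total is one.
    """
    rho = []
    remaining = 1.0
    for k in range(len(a) - 1):
        v = rng.betavariate(a[k], sum(a[k + 1:]))
        rho.append(remaining * v)
        remaining *= 1.0 - v
    rho.append(remaining)
    return rho


rng = random.Random(0)
rho = dirichlet_by_stick_breaking([1.0, 1.0, 1.0, 1.0], rng)
print(rho, sum(rho))
```

Tracking `remaining` as a running product avoids recomputing $(1 - V_1) \cdots (1 - V_{k-1})$ at every step.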

Page 112: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite

Choosing K = ∞• Here, difficult to choose finite K in advance (contrast with

small K): don’t know K, difficult to infer, streaming data • How to generate K = ∞ strictly positive frequencies that

sum to one? • Observation: ⇢1:K ⇠ Dirichlet(a1:K)

10

Page 113: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite

Choosing K = ∞• Here, difficult to choose finite K in advance (contrast with

small K): don’t know K, difficult to infer, streaming data • How to generate K = ∞ strictly positive frequencies that

sum to one? • Dirichlet process stick-breaking: • Griffiths-Engen-McCloskey (GEM) distribution:

ak = 1, bk = ↵ > 0

10

Page 114: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite

Choosing K = ∞• Here, difficult to choose finite K in advance (contrast with

small K): don’t know K, difficult to infer, streaming data • How to generate K = ∞ strictly positive frequencies that

sum to one? • Dirichlet process stick-breaking: • Griffiths-Engen-McCloskey (GEM) distribution:

ak = 1, bk = ↵ > 0

10

Page 115: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite

Choosing K = ∞• Here, difficult to choose finite K in advance (contrast with

small K): don’t know K, difficult to infer, streaming data • How to generate K = ∞ strictly positive frequencies that

sum to one? • Dirichlet process stick-breaking: • Griffiths-Engen-McCloskey (GEM) distribution:

ak = 1, bk = ↵ > 0

V1 ⇠ Beta(a1, b1)

10

Page 116: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite

Choosing K = ∞• Here, difficult to choose finite K in advance (contrast with

small K): don’t know K, difficult to infer, streaming data • How to generate K = ∞ strictly positive frequencies that

sum to one? • Dirichlet process stick-breaking: • Griffiths-Engen-McCloskey (GEM) distribution:

ak = 1, bk = ↵ > 0

V1 ⇠ Beta(a1, b1) ⇢1 = V1

10

Page 117: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite

Choosing K = ∞• Here, difficult to choose finite K in advance (contrast with

small K): don’t know K, difficult to infer, streaming data • How to generate K = ∞ strictly positive frequencies that

sum to one? • Dirichlet process stick-breaking: • Griffiths-Engen-McCloskey (GEM) distribution:

ak = 1, bk = ↵ > 0

V1 ⇠ Beta(a1, b1)

V2 ⇠ Beta(a2, b2)

⇢1 = V1

10

Page 118: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite

Choosing K = ∞• Here, difficult to choose finite K in advance (contrast with

small K): don’t know K, difficult to infer, streaming data • How to generate K = ∞ strictly positive frequencies that

sum to one? • Dirichlet process stick-breaking: • Griffiths-Engen-McCloskey (GEM) distribution:

ak = 1, bk = ↵ > 0

V1 ⇠ Beta(a1, b1)

V2 ⇠ Beta(a2, b2)

⇢1 = V1

⇢2 = (1� V1)V2

10

Page 119: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite

Choosing K = ∞• Here, difficult to choose finite K in advance (contrast with

small K): don’t know K, difficult to infer, streaming data • How to generate K = ∞ strictly positive frequencies that

sum to one? • Dirichlet process stick-breaking: • Griffiths-Engen-McCloskey (GEM) distribution:

ak = 1, bk = ↵ > 0

V1 ⇠ Beta(a1, b1)

V2 ⇠ Beta(a2, b2)

⇢1 = V1

⇢2 = (1� V1)V2

10

Page 120: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite

Choosing K = ∞• Here, difficult to choose finite K in advance (contrast with

small K): don’t know K, difficult to infer, streaming data • How to generate K = ∞ strictly positive frequencies that

sum to one? • Dirichlet process stick-breaking: • Griffiths-Engen-McCloskey (GEM) distribution:

ak = 1, bk = ↵ > 0

V1 ⇠ Beta(a1, b1)

V2 ⇠ Beta(a2, b2)

⇢1 = V1

⇢2 = (1� V1)V2


Choosing K = ∞
• Here, difficult to choose finite K in advance (contrast with small K): don’t know K, difficult to infer, streaming data
• How to generate K = ∞ strictly positive frequencies that sum to one?
• Dirichlet process stick-breaking:
• Griffiths-Engen-McCloskey (GEM) distribution:

$V_1 \sim \mathrm{Beta}(a_1, b_1)$

$V_2 \sim \mathrm{Beta}(a_2, b_2)$

$V_k \sim \mathrm{Beta}(a_k, b_k)$

$\rho_1 = V_1$

$\rho_2 = (1 - V_1) V_2$

$\rho_k = \left[ \prod_{j=1}^{k-1} (1 - V_j) \right] V_k$

$a_k = 1,\ b_k = \alpha > 0$

$\rho = (\rho_1, \rho_2, \ldots) \sim \mathrm{GEM}(\alpha)$

[McCloskey 1965; Engen 1975; Patil and Taillie 1977; Ewens 1987; Sethuraman 1994; Ishwaran, James 2001]
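The stick-breaking recipe above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function name `sample_gem`, the stopping tolerance `tol`, and the cap `max_atoms` are all assumptions introduced here. Because a GEM draw has infinitely many entries, the sketch stops once the unbroken part of the stick is negligible and reports how much was left over.

```python
import random

def sample_gem(alpha, tol=1e-8, max_atoms=100_000, rng=None):
    """Truncated draw (rho_1, rho_2, ...) from GEM(alpha) by stick-breaking.

    Stops when the unbroken remainder of the stick drops to `tol` or below,
    or after `max_atoms` breaks. Both cutoffs are assumptions of this sketch.
    Returns (rho, remainder).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = rng or random.Random()
    rho = []
    remaining = 1.0  # prod_{j<k} (1 - V_j): stick length not yet broken off
    while remaining > tol and len(rho) < max_atoms:
        v = rng.betavariate(1.0, alpha)  # V_k ~ Beta(a_k = 1, b_k = alpha)
        rho.append(remaining * v)        # rho_k = [prod_{j<k} (1 - V_j)] V_k
        remaining *= 1.0 - v
    return rho, remaining

rho, leftover = sample_gem(alpha=2.0, rng=random.Random(0))
print(len(rho), sum(rho), leftover)
```

Larger α makes each break smaller on average, so more breaks are needed before the remainder falls below the tolerance.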


Choosing K = ∞
• Here, difficult to choose finite K in advance (contrast with small K): don’t know K, difficult to infer, streaming data
• How to generate K = ∞ strictly positive frequencies that sum to one?
• Dirichlet process stick-breaking:
• Griffiths-Engen-McCloskey (GEM) distribution:

$a_k = 1,\ b_k = \alpha > 0$

$V_k \overset{iid}{\sim} \mathrm{Beta}(1, \alpha)$

$\rho_k = \left[ \prod_{j=1}^{k-1} (1 - V_j) \right] V_k$

$\rho = (\rho_1, \rho_2, \ldots) \sim \mathrm{GEM}(\alpha)$

[McCloskey 1965; Engen 1975; Patil and Taillie 1977; Ewens 1987; Sethuraman 1994; Ishwaran, James 2001]
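One way to see why the stick-breaking frequencies sum to one is sketched below. The argument is not taken from the slides; it uses only the independence of the $V_k$, their $\mathrm{Beta}(1, \alpha)$ distribution, and the definition of $\rho_k$.

```latex
% Unbroken remainder of the stick after K breaks (by induction on K):
1 - \sum_{k=1}^{K} \rho_k = \prod_{k=1}^{K} (1 - V_k)

% Its expectation, using independence and E[V_k] = 1 / (1 + \alpha):
\mathbb{E}\left[ \prod_{k=1}^{K} (1 - V_k) \right]
  = \left( \frac{\alpha}{1 + \alpha} \right)^{K}
  \longrightarrow 0 \quad \text{as } K \to \infty
```

The remainder is nonnegative and nonincreasing in K, so it converges almost surely; since its expectation tends to 0, the limit is 0 and the $\rho_k$ sum to one almost surely. Each $\rho_k$ is strictly positive almost surely because $0 < V_k < 1$ almost surely.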


Distributions
• Beta → random distribution over 1, 2
• Dirichlet → random distribution over 1, 2, ..., K
• GEM / Dirichlet process stick-breaking → random distribution over 1, 2, ...
• Dirichlet process → random distribution over :
• Infinity of parameters: components
• Growing number of parameters: clusters

[Figure: example random distributions, with axes labeled 1 2; 1 2 3 4; and 1 2 3 4 …]


Exercises
• Prove the beta (Dirichlet) is conjugate to the categorical
• What is the posterior after N data points?
• Suppose $\rho_{1:K} \sim \mathrm{Dirichlet}(a_{1:K})$; prove that

$\rho_1 \overset{d}{=} \mathrm{Beta}\left( a_1, \sum_{k=1}^{K} a_k - a_1 \right) \perp\!\!\!\perp \frac{(\rho_2, \ldots, \rho_K)}{1 - \rho_1} \overset{d}{=} \mathrm{Dirichlet}(a_2, \ldots, a_K)$

• Code your own GEM simulator for ρ; why is this hard?
• Simulate drawing cluster indicators (z) from your ρ
• Compare the number of clusters as N changes in the GEM case with the growth in the K=1000 case
• How does the growth in N change when you change α?

[Figure: axis labeled 1 2 3 4 …]
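A possible starting point for the simulation exercises is sketched below; it is one approach among several, not the intended solution. It assumes the finite comparison model is a symmetric Dirichlet(α/K, ..., α/K) with K = 1000, which the exercise text does not specify, and the function names are introduced here for illustration. The GEM sticks are broken lazily: a new ρ_k is generated only when a uniform draw lands beyond the part of the stick broken so far, which sidesteps the need to store infinitely many frequencies.

```python
import bisect
import random

def count_clusters_gem(alpha, n, rng):
    """Running number of distinct clusters among n draws z_i ~ rho, rho ~ GEM(alpha)."""
    cum = []         # cum[k] = rho_1 + ... + rho_{k+1} (0-based list index)
    total = 0.0      # mass broken off so far
    remaining = 1.0  # unbroken stick length
    seen, counts = set(), []
    for _ in range(n):
        u = rng.random()
        # Break more sticks until the broken-off mass covers u.
        while total <= u and remaining > 0.0:
            v = rng.betavariate(1.0, alpha)  # V_k ~ Beta(1, alpha)
            total += remaining * v
            cum.append(total)
            remaining *= 1.0 - v
        # First index whose cumulative mass exceeds u (clamped for rounding).
        z = min(bisect.bisect_right(cum, u), len(cum) - 1)
        seen.add(z)
        counts.append(len(seen))
    return counts

def count_clusters_finite(alpha, n, k, rng):
    """Same running count for rho ~ Dirichlet(alpha/k, ..., alpha/k) (assumed form)."""
    # Normalized independent Gamma(a, 1) draws are Dirichlet distributed;
    # random.choices normalizes the weights, so the division is implicit.
    gammas = [rng.gammavariate(alpha / k, 1.0) for _ in range(k)]
    zs = rng.choices(range(k), weights=gammas, k=n)
    seen, counts = set(), []
    for z in zs:
        seen.add(z)
        counts.append(len(seen))
    return counts

rng = random.Random(0)
gem_counts = count_clusters_gem(alpha=5.0, n=2000, rng=rng)
fin_counts = count_clusters_finite(alpha=5.0, n=2000, k=1000, rng=rng)
for n in (10, 100, 1000, 2000):
    print(n, gem_counts[n - 1], fin_counts[n - 1])
```

Rerunning with several values of α and plotting the two running counts against N is one way to explore the last two exercises.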


References
A full reference list is provided at the end of the “Part III” slides.