A Gentle Introduction to Bayesian Nonparametrics

A Gentle Introduction to Bayesian Nonparametrics II
[email protected] · www.julyanarbel.com
Bocconi University, Milan, Italy & Collegio Carlo Alberto, Turin
Statalks Seminar @ Collegio Carlo Alberto, February 12, 2016

Transcript of A Gentle Introduction to Bayesian Nonparametrics


Beyond the Dirichlet process / Species sampling processes / Completely random measures

Table of Contents

Motivations to go beyond the Dirichlet process

Species sampling processes

Completely random measures


Bayesian nonparametric priors

Two main categories of priors depending on parameter spaces

Spaces of functions: random functions

• Stochastic processes, such as Gaussian processes

• Random basis expansions

• Random densities (expon.)

• Mixtures

Spaces of probability measures: random probability measures (RPM)

• Continuous measures: Pólya tree; see spaces of functions

• Discrete measures. Cornerstone: the Dirichlet process; we'll see others


Bayesian nonparametric models using RPM

Bayesian setting for exchangeable data

• Sequence of exchangeable data: random variables X_1, X_2, … with distribution unchanged under permutation

• Discrete random probability measure P ∼ Q

• Model

X_1, X_2, …, X_n | P iid∼ P for discrete data, or ∫_Θ K(· | θ) P(dθ) for continuous data

• Learn about the data through the posterior distribution of P


Bayesian nonparametric model

Bayesian nonparametric mixture

• As for any mixture, the original goal: to enrich the collection of available distributions for modelling data

• Plays a role in many data-modelling contexts beyond density estimation, e.g. survival rates, link functions, etc.

• P is almost surely discrete, with random weights (p_j)_j and locations (θ_j)_j:

P = ∑_{j=1}^∞ p_j δ_{θ_j}

• Clustering


Distribution of the weights

Stick-breaking representation of a DP(α,G0)

• P = ∑_{j=1}^∞ p_j δ_{θ_j}

• Weights p_j = π_j ∏_{l<j} (1 − π_l) with π_j iid∼ Beta(1, α)

• Expected rate of decay constrained to E[p_j] = α^{j−1} / (α + 1)^j
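As a quick numerical check (an illustrative sketch, not part of the slides), the truncated stick-breaking construction and the stated decay rate of E[p_j] can be verified by simulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# pi_j ~ Beta(1, alpha); p_j = pi_j * prod_{l<j} (1 - pi_l)
alpha, J, n_rep = 2.0, 5, 20000
pis = rng.beta(1.0, alpha, size=(n_rep, J))
sticks = np.cumprod(1.0 - pis, axis=1)
p = pis * np.concatenate([np.ones((n_rep, 1)), sticks[:, :-1]], axis=1)

empirical = p.mean(axis=0)
# E[p_j] = alpha^{j-1} / (alpha + 1)^j
theoretical = alpha ** np.arange(J) / (alpha + 1.0) ** np.arange(1, J + 1)
print(empirical)
print(theoretical)
```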


Clustering mechanism

Predictive probability function, aka Polya Urn or Chinese Restaurant process

• Consider species data X_1, …, X_n | P iid∼ P, with P ∼ DP(α, G0)

• Unique values X*_1, …, X*_{k_n} with frequencies n_1, …, n_{k_n}

• Then X_{n+1} | X_1, …, X_n is either
  • an existing observation X*_j w.p. ∝ n_j
  • or a new draw from G0 w.p. ∝ α

• Gravitational effect valuable for clustering: the rich get richer

P[X_{n+1} ∈ · | X_1, …, X_n] = (α/(α + n)) G0(·) + (1/(α + n)) ∑_{j=1}^{k_n} n_j δ_{X*_j}(·)
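The Pólya-urn scheme above can be simulated directly; the sketch below (function name and parameters are my own) also checks the known identity E[k_n] = ∑_{i=0}^{n−1} α/(α + i):

```python
import numpy as np

def chinese_restaurant(n, alpha, rng):
    """Sequential sampling from the DP Polya urn: X_{i+1} equals an existing
    value X*_j w.p. n_j/(alpha + i), or is a new G0 draw w.p. alpha/(alpha + i)."""
    counts = []
    for i in range(n):
        w = np.append(np.array(counts, dtype=float), alpha)
        j = rng.choice(len(w), p=w / (alpha + i))
        if j == len(counts):
            counts.append(1)       # open a new "table"
        else:
            counts[j] += 1         # join an existing one
    return counts

rng = np.random.default_rng(1)
alpha, n, reps = 1.0, 200, 400
k_mean = np.mean([len(chinese_restaurant(n, alpha, rng)) for _ in range(reps)])
expected = np.sum(alpha / (alpha + np.arange(n)))   # E[k_n] = sum_i alpha/(alpha + i)
print(k_mean)
print(expected)
```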


Beyond the DP from predictive function viewpoint

A discrete random probability measure P can be classified into 3 main categories according to P[X_{n+1} is "new" | X^(n)]:

1) P[X_{n+1} is "new" | X^(n)] = f(n, model parameters)
⇐⇒ depends on n but not on k_n and (n_1, …, n_{k_n})
⇐⇒ Dirichlet process (Ferguson, 1973);

2) P[X_{n+1} is "new" | X^(n)] = f(n, k_n, model parameters)
⇐⇒ depends on n and k_n but not on (n_1, …, n_{k_n})
⇐⇒ Gibbs-type prior (Pitman, 2003);

3) P[X_{n+1} is "new" | X^(n)] = f(n, k_n, (n_1, …, n_{k_n}), model parameters)
⇐⇒ depends on n, k_n and (n_1, …, n_{k_n})
⇐⇒ tractability issues

12/31

Beyond the Dirichlet process Species sampling processes Completely random measures

Tree of discrete random probability measures

[Tree diagram of discrete random probability measures, with CRM and SSM at the top level; below them SBP and NRMI (under CRM), SB and Gibbs-type priors (under SSM); NRMI splits into homogeneous and non-homogeneous, Gibbs-type into σ ≥ 0 and σ < 0; leaves include BP, NGG, NIG, σ-stable, DP (σ = 0), PY and PT.]


Species sampling models

Species sampling process

Very general form of random probability measures such that

P = ∑_{j=1}^∞ p_j δ_{θ_j} + (1 − ∑_{j=1}^∞ p_j) G0

where θ_j iid∼ G0, an atomless probability distribution, and (p_j) is an independent subprobability vector. The SSP is called proper if ∑_{j=1}^∞ p_j = 1


Species sampling models: example 1

Stick-breaking processes

Special form of the weights, same construction as for the Dirichlet process:

p_j = π_j ∏_{l<j} (1 − π_l) with π_j ind∼ Beta(a_j, b_j), a_j, b_j > 0

Includes the Dirichlet process and the Pitman–Yor process
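The general construction is easy to simulate. As a hedged sketch (the function name is mine), the snippet below samples stick-breaking weights and uses the known fact that the Pitman–Yor process arises with a_j = 1 − σ, b_j = θ + jσ; the weights of a proper SSP should sum to 1:

```python
import numpy as np

rng = np.random.default_rng(3)

def stick_breaking(a, b, rng):
    """p_j = pi_j * prod_{l<j}(1 - pi_l) with pi_j ~ Beta(a_j, b_j)."""
    pis = rng.beta(a, b)
    sticks = np.concatenate(([1.0], np.cumprod(1.0 - pis)[:-1]))
    return pis * sticks

# Known special case: Pitman-Yor has a_j = 1 - sigma, b_j = theta + j*sigma
sigma, theta, J = 0.5, 1.0, 2000
j = np.arange(1, J + 1)
p = stick_breaking(np.full(J, 1.0 - sigma), theta + j * sigma, rng)
print(p[:3])
print(p.sum())   # close to 1 for a large truncation: the process is proper
```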


Species sampling models: example 2

Gibbs-type processes

Characterized by

• Choice of G0, σ < 1, a set of weights Vn,j which satisfy the recursion

Vn,j = (n − jσ)Vn+1,j + Vn+1,j+1

• A predictive probability function

P[X_{n+1} ∈ · | X^(n)] = (V_{n+1,k_n+1} / V_{n,k_n}) G0(·) + (1 − V_{n+1,k_n+1} / V_{n,k_n}) · ∑_{i=1}^{k_n} (N_i − σ) δ_{X*_i}(·) / (n − σ k_n)

where V_{n+1,k_n+1}/V_{n,k_n} = P["new" | X^(n)], G0 is the prior guess, the complementary weight is P["old" | X^(n)], and the last factor is a weighted empirical measure

• Crucially, P[X_{n+1} is "new" | X^(n)] now depends on both the sample size n and the number of distinct values k_n


Species sampling models: examples 3 & 4

• Pitman–Yor (PY) process, aka two-parameter Poisson–Dirichlet process [Pitman & Yor, 1997], obtained for 0 ≤ σ < 1 and θ > −σ, or σ < 0 and θ = r|σ| with r ∈ ℕ,

V_{n,k} = ∏_{i=1}^{k−1} (θ + iσ) / (θ + 1)_{n−1}

which yields

P[X_{n+1} ∈ · | X^(n)] = ((θ + σ k_n)/(θ + n)) G0(·) + ((n − σ k_n)/(θ + n)) · ∑_{i=1}^{k_n} (N_i − σ) δ_{X*_i}(·) / (n − σ k_n)

=⇒ if σ = 0, the PY reduces to the Dirichlet process, and (θ + σ k_n)/(θ + n) to θ/(θ + n)

• Normalized generalized gamma process (NGG)

V_{n,j} = (e^β σ^{j−1} / Γ(n)) ∑_{i=0}^{n−1} (n−1 choose i) (−1)^i β^{i/σ} Γ(j − i/σ; β)

where β > 0, σ ∈ (0, 1) and Γ(a, x) denotes the incomplete gamma function
=⇒ if σ = 1/2 it reduces to the normalized inverse Gaussian process
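The effect of σ on clustering is easy to see by simulating the PY predictive scheme (an illustrative sketch, with my own function name): for σ > 0 the number of clusters grows like a power of n rather than logarithmically as under the DP.

```python
import numpy as np

def pitman_yor_crp(n, sigma, theta, rng):
    """Sample cluster sizes from the PY predictive scheme: a new value
    w.p. (theta + sigma*k)/(theta + i), existing value j w.p. (n_j - sigma)/(theta + i)."""
    counts = []
    for i in range(n):
        k = len(counts)
        w = np.append(np.array(counts, dtype=float) - sigma, theta + sigma * k)
        j = rng.choice(k + 1, p=w / (theta + i))
        if j == k:
            counts.append(1)
        else:
            counts[j] += 1
    return counts

rng = np.random.default_rng(2)
n, reps = 500, 200
k_dp = np.mean([len(pitman_yor_crp(n, 0.0, 1.0, rng)) for _ in range(reps)])
k_py = np.mean([len(pitman_yor_crp(n, 0.5, 1.0, rng)) for _ in range(reps)])
print(k_dp)   # logarithmic growth of k_n for the DP (sigma = 0)
print(k_py)   # much larger: power-law growth ~ n^sigma for sigma > 0
```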


Number of clusters generated by the PY process

Prior distribution of the number of clusters kn

• θ controls the location (as for the DP)

• σ controls the flatness (or variability)

Example with n = 50, θ = 1 and σ = 0.2, 0.3, . . . , 0.8

[Figure: prior distributions of k_50 for the PY process with θ = 1 and σ = 0.2, 0.3, …, 0.8, over the range 0 to 50 clusters; the distributions flatten as σ increases.]


Beyond the Dirichlet process Species sampling processes Completely random measures

Completely random measures

Definition [Kingman, 1967]

Random measure G such that, for all disjoint sets A_1, …, A_d, the masses G(A_1), …, G(A_d) are mutually independent

• aka independent increment processes or Lévy processes

• Building block for prior distributions for popular models in biology, computer science…

• Main advantage: moments are known


Properties

Discreteness: CRMs are almost surely discrete measures [Kingman, 1992]. Under the assumption of no fixed atoms,

G = ∑_{j≥1} p_j δ_{θ_j}

where the jumps (p_j)_{j≥1} and the jump points (θ_j)_{j≥1} are independent

Laplace transform

Existence of a measure ν(dv, dy) = ρ_y(dv) a G0(dy), called the Lévy intensity, on ℝ⁺ × Y, such that

E[e^{−∫_Y f(y) G(dy)}] = exp(−∫_{ℝ⁺×Y} [1 − e^{−s f(y)}] ν(ds, dy))

Called homogeneous if ρ_y = ρ, non-homogeneous otherwise
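A concrete sanity check (a sketch under my own choice of intensity, not from the slides): for the gamma CRM with ρ(dv) = e^{−v} v^{−1} dv and f constant equal to t, the Laplace functional should reproduce the Gamma(a, 1) Laplace transform (1 + t)^{−a}.

```python
import numpy as np

# Gamma CRM: rho(dv) = exp(-v)/v dv. With f = t on all of Y (total mass),
#   E[exp(-t G(X))] = exp(-a * int_0^inf (1 - e^{-t v}) e^{-v}/v dv),
# which must equal the Gamma(a, 1) Laplace transform (1 + t)^(-a).
a, t = 2.0, 1.5
v = np.linspace(1e-8, 60.0, 1_000_000)
integrand = (1.0 - np.exp(-t * v)) * np.exp(-v) / v
integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(v))  # trapezoid rule
lhs = np.exp(-a * integral)
rhs = (1.0 + t) ** (-a)
print(lhs, rhs)   # both close to (1 + 1.5)^(-2) = 0.16
```

The integral has the closed form log(1 + t), which is why the two sides agree.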


Examples of CRM defined by ρ(v)

Generalized gamma process

by Brix [1999], γ ∈ [0, 1), θ ≥ 0:

ρ(v) = e^{−θv} / (Γ(1 − γ) v^{1+γ})

includes the gamma, inverse-Gaussian and stable processes

Stable-beta process

by Teh and Gorur [2009], σ ∈ [0, 1), c > −σ:

ρ(v) = Γ(c + 1) / (Γ(1 − σ) Γ(c + σ)) · v^{−σ−1} (1 − v)^{c+σ−1}

includes the beta process and a stable process


A brief look at moments

Moments of the (random) mass µ(A)

Defined by m_n(A) = E[µ^n(A)], they can be obtained by Faà di Bruno's formula:

m_n(A) = E[µ^n(A)] = ∑_(∗) (n! / (k_1! ⋯ k_n!)) ∏_{i=1}^n (κ_i(A)/i!)^{k_i}

where the sum (∗) runs over all nonnegative integers (k_1, …, k_n) such that k_1 + 2k_2 + ⋯ + n k_n = n, and the i-th cumulant is given by

κ_i(A) = a G0(A) ∫_0^∞ v^i ρ(dv)
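The formula can be implemented directly (a sketch, with my own function name), and checked on the gamma CRM, where κ_i(A) = a G0(A) Γ(i) and the mass µ(A) is Gamma-distributed, so its moments are known rising factorials:

```python
import math
from itertools import product

def moment_from_cumulants(kappa, n):
    """Faa di Bruno: m_n = sum over k1 + 2 k2 + ... + n kn = n of
    n! / (k1! ... kn!) * prod_i (kappa_i / i!)^{k_i}."""
    total = 0.0
    for ks in product(*(range(n // i + 1) for i in range(1, n + 1))):
        if sum(i * k for i, k in enumerate(ks, start=1)) != n:
            continue
        term = float(math.factorial(n))
        for i, k in enumerate(ks, start=1):
            term *= (kappa[i - 1] / math.factorial(i)) ** k / math.factorial(k)
        total += term
    return total

# Gamma CRM: rho(dv) = exp(-v)/v dv, hence kappa_i(A) = a G0(A) * Gamma(i)
a_g0 = 1.5                                                 # a * G0(A), arbitrary
kappa = [a_g0 * math.factorial(i - 1) for i in range(1, 5)]
moments = [moment_from_cumulants(kappa, n) for n in range(1, 5)]
# mu(A) ~ Gamma(a G0(A), 1): m_n = (a G0(A))(a G0(A) + 1)...(a G0(A) + n - 1)
exact = [math.prod(a_g0 + j for j in range(n)) for n in range(1, 5)]
print(moments)
print(exact)
```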


Normalized random measure with independent increments

Normalizing CRM: NRMI

A normalized random measure with independent increments (NRMI) is a random probability measure obtained by normalizing a CRM G with (a.s.) positive and finite total mass G(X):

P( · ) = G( · )/G(X)
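A small illustration of the normalization (a sketch, with arbitrary parameter values): over a finite partition, a gamma CRM gives independent Gamma masses, and normalizing them yields a Dirichlet vector, which is how the Dirichlet process arises as an NRMI.

```python
import numpy as np

rng = np.random.default_rng(4)

# Over a partition (A_1, A_2, A_3): G(A_i) ~ Gamma(a * G0(A_i), 1) independently,
# so P(A_i) = G(A_i)/G(X) is Dirichlet(a G0(A_1), a G0(A_2), a G0(A_3)).
a, g0 = 5.0, np.array([0.2, 0.3, 0.5])
masses = rng.gamma(a * g0, 1.0, size=(50_000, 3))
p = masses / masses.sum(axis=1, keepdims=True)
print(p.mean(axis=0))            # close to g0, the Dirichlet mean
print(p.var(axis=0))             # close to g0*(1 - g0)/(a + 1), the Dirichlet variance
print(g0 * (1 - g0) / (a + 1))
```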

• Includes the Dirichlet process, normalized generalized gamma, normalized inverse Gaussian, normalized stable…

• Connection between NRMI and SSM? Homogeneous NRMI

• Connection between NRMI and Gibbs-type priors? Normalized generalized gamma


Toy mixture example

A popular use of discrete P is within hierarchical mixture models for density estimation and clustering. The latter is carried out at the latent level and makes use of the discrete nature of P.

• n = 50 observations are drawn from a uniform mixture of two well-separated Gaussian distributions, N(1, 0.2) and N(10, 0.2);

• nonparametric mixture model

(Y_i | m_i, v_i) ind∼ N(m_i, v_i), i = 1, …, n
(m_i, v_i | P) iid∼ P, i = 1, …, n
P ∼ Q

with Q a discrete nonparametric prior.

• The distribution of K_n represents the prior distribution on the number of mixture components; a summary statistic of the posterior distribution of (K_n | Y^(n)) is then used as an estimate of the number of mixture components.


• We select priors Q with misspecified parameters: in particular, they are chosen so that E[K_50] = 25, which corresponds to a prior opinion on K_50 remarkably far from the true number of components, namely 2.

• 5 different models:

• Dirichlet process with θ = 19.233;
• PY processes with (σ, θ) = (0.73001, 1) & (σ, θ) = (0.25, 12.2157);
• NGG processes with (σ, β) = (0.7353, 1) & (0.25, 48.4185).
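As a quick check on the DP calibration (an illustrative sketch, not part of the slides), the prior mean number of clusters follows directly from the urn scheme:

```python
import numpy as np

# For the DP, P["new" | X^(n)] = theta/(theta + n), hence
# E[K_n] = sum_{i=0}^{n-1} theta / (theta + i).
theta, n = 19.233, 50
e_kn = np.sum(theta / (theta + np.arange(n)))
print(round(e_kn, 3))   # approximately 25, matching the stated calibration
```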

Are the models flexible enough to shift a posteriori towards the correct number of components?

=⇒ the larger σ, the better the posterior estimate of K_n.
=⇒ in terms of density estimation the difference is negligible; this is because one can always fit a mixture density with more components than needed.


Posterior distribution of the number of mixture components

[Figure: posterior distributions of the number of groups for the five models DP(θ = 19.233), NGG(σ, β) = (0.25, 48.4185), PY(σ, θ) = (0.25, 12.2157), NGG(σ, β) = (0.7353, 1) and PY(σ, θ) = (0.73001, 1).]

Posterior distributions of the number of groups corresponding to the 5 mixture models.


Posterior density estimates

[Figure: posterior density estimates on the range −2 to 14 for the five models DP(θ = 19.233), NGG(σ, β) = (0.25, 48.4185), PY(σ, θ) = (0.25, 12.2157), NGG(σ, β) = (0.7353, 1) and PY(σ, θ) = (0.73001, 1), overlaid with the true density.]

Density estimates corresponding to the 5 mixture models.


Opening

Next talks of the workshop

• Matteo Ruggiero: Dependent processes in Bayesian Nonparametrics
=⇒ covariate-dependent extensions of the Dirichlet process, aka diffusive Dirichlet mixtures

• Pierpaolo De Blasi: Asymptotics for discrete random measures
=⇒ validation of posterior distributions in the large-n regime, for the Dirichlet process and Pitman–Yor

• Antonio Canale: Applications to Ecology and Marketing
=⇒ mainly uses the Dirichlet process

• JA: Discovery probabilities
=⇒ species sampling models, Pitman–Yor and normalized generalized gamma


References

Thank you for your attention!

Brix, A. (1999). Generalized gamma measures and shot-noise Cox processes. Advances in Applied Probability, pages 929–953.

Kingman, J. (1967). Completely random measures. Pacific Journal of Mathematics, 21(1):59–78.

Kingman, J. F. C. (1992). Poisson processes, volume 3. Oxford University Press.

Teh, Y. W. and Gorur, D. (2009). Indian buffet processes with power-law behavior. In Bengio, Y., Schuurmans, D., Lafferty, J., Williams, C., and Culotta, A., editors, Advances in Neural Information Processing Systems 22, pages 1838–1846. Curran Associates, Inc.