A Gentle Introduction to Bayesian Nonparametrics II
Julyan Arbel · [email protected] · www.julyanarbel.com
Bocconi University, Milan, Italy & Collegio Carlo Alberto, Turin
Statalks Seminar @ Collegio Carlo Alberto, February 12, 2016
Beyond the Dirichlet process · Species sampling processes · Completely random measures
Table of Contents

Motivations to go beyond the Dirichlet process
  • Bayesian nonparametric models
  • Some limitations of the Dirichlet process
Species sampling processes
Completely random measures
  • Toy mixture example
Bayesian nonparametric priors

Two main categories of priors, depending on the parameter space

Spaces of functions (random functions)
• Stochastic processes, such as Gaussian processes
• Random basis expansions
• Random densities (expon.)
• Mixtures

Spaces of probability measures (random probability measures, RPMs)
• Continuous measures: Pólya trees; see also spaces of functions
• Discrete measures: the cornerstone is the Dirichlet process; we will see others
[Brix, 1999]
Bayesian nonparametric models using RPMs

Bayesian setting for exchangeable data

• Sequence of exchangeable data: random variables X1, X2, ... whose joint distribution is unchanged under permutation
• Discrete random probability measure P ~ Q
• Model

    X1, X2, ..., Xn | P  iid~  P                     for discrete data
    X1, X2, ..., Xn | P  iid~  ∫_Θ K(· | θ) P(dθ)    for continuous data
• Learn about the data through the posterior distribution of P
Bayesian nonparametric model

Bayesian nonparametric mixture

• As for any mixture, the original goal is to enrich the collection of available distributions for modelling data
• P plays a role in many data-modelling contexts other than density estimation, e.g. survival rates, link functions, etc.
• P is almost surely discrete, with random weights (pj)j and locations (θj)j

    P = Σ_{j=1}^∞ pj δ_{θj}
• Clustering
Distribution of the weights

Stick-breaking representation of a DP(α, G0)

• P = Σ_{j=1}^∞ pj δ_{θj}
• Weights pj = πj Π_{l<j} (1 − πl) with πj iid~ Beta(1, α)
• Expected rate of decay constrained to

    E[pj] = α^{j−1} / (α + 1)^j
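As a sanity check, the stick-breaking construction and the expected-decay formula above can be simulated directly. This is a minimal sketch, not from the slides; `alpha`, the truncation level and the number of replications are illustrative choices.

```python
import random

def dp_stick_breaking(alpha, truncation, rng):
    """Stick-breaking weights p_j = pi_j * prod_{l<j}(1 - pi_l), pi_j ~ Beta(1, alpha)."""
    weights, stick = [], 1.0
    for _ in range(truncation):
        pi = rng.betavariate(1.0, alpha)
        weights.append(stick * pi)   # break off a fraction pi of the remaining stick
        stick *= 1.0 - pi
    return weights

rng = random.Random(0)
alpha = 2.0
draws = [dp_stick_breaking(alpha, 20, rng) for _ in range(20000)]

# Empirical mean of p_j against the formula E[p_j] = alpha^(j-1) / (alpha+1)^j
for j in range(3):
    empirical = sum(d[j] for d in draws) / len(draws)
    theoretical = alpha**j / (alpha + 1.0)**(j + 1)
    print(j + 1, round(empirical, 4), round(theoretical, 4))
```

The geometric decay of E[pj] is the constraint the slide refers to: with a single parameter α, location and spread of the weight sequence cannot be tuned separately.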
Clustering mechanism

Predictive probability function, aka Pólya urn or Chinese restaurant process

• Consider species data X1, ..., Xn | P iid~ P, with P ~ DP(α, G0)
• Unique values X*1, ..., X*kn with frequencies n1, ..., nkn
• Then Xn+1 | X1, ..., Xn is either
  • an existing observation X*j, with probability ∝ nj
  • or a new draw from G0, with probability ∝ α
• This gravitational effect is valuable for clustering: the rich get richer

    P[Xn+1 ∈ · | X1, ..., Xn] = α/(α + n) G0(·) + 1/(α + n) Σ_{j=1}^{kn} nj δ_{X*j}(·)
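The predictive rule above translates into a few lines of simulation. This is a minimal sketch of the Chinese restaurant process (my own, with illustrative `alpha` and `n`), tracking only table occupancies:

```python
import random

def chinese_restaurant_process(n, alpha, rng):
    """Seat n customers: join table j with prob ∝ n_j, open a new table with prob ∝ alpha."""
    tables = []  # tables[j] = number of customers at table j
    for i in range(n):
        r = rng.uniform(0.0, alpha + i)  # total unnormalized mass: alpha + (customers seated)
        if r < alpha:
            tables.append(1)             # new table, i.e. a fresh draw "from G0"
        else:
            r -= alpha
            for j, nj in enumerate(tables):
                if r < nj:
                    tables[j] += 1       # rich-get-richer: join an occupied table
                    break
                r -= nj
    return tables

rng = random.Random(1)
tables = chinese_restaurant_process(50, 1.0, rng)
print(sorted(tables, reverse=True), len(tables))
```

The sorted occupancies typically show a few large tables and many singletons, which is the gravitational effect in action.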
Beyond the DP from the predictive function viewpoint

A discrete random probability measure P can be classified into 3 main categories according to P[Xn+1 is "new" | X(n)]

1) P[Xn+1 is "new" | X(n)] = f(n, model parameters)
   ⇐⇒ depends on n but not on kn and (n1, ..., nkn)
   ⇐⇒ Dirichlet process (Ferguson, 1973)

2) P[Xn+1 is "new" | X(n)] = f(n, kn, model parameters)
   ⇐⇒ depends on n and kn but not on (n1, ..., nkn)
   ⇐⇒ Gibbs-type prior (Pitman, 2003)

3) P[Xn+1 is "new" | X(n)] = f(n, kn, (n1, ..., nkn), model parameters)
   ⇐⇒ depends on n, kn and (n1, ..., nkn)
   ⇐⇒ tractability issues
Tree of discrete random probability measures

[Figure: taxonomy tree of discrete random probability measures. Completely random measures (CRM) branch into the stable-beta process (SBP, including the beta process BP) and normalized random measures with independent increments (NRMI), non-homogeneous or homogeneous; species sampling models (SSM) branch into stick-breaking (SB) priors and Gibbs-type priors with σ ≥ 0 or σ < 0. Leaves include the normalized generalized gamma (NGG), normalized inverse Gaussian (NIG), σ-stable, Dirichlet (DP, the σ = 0 case), Pitman–Yor (PY) and Pólya tree (PT) processes.]
Species sampling models

Species sampling process

Very general form of random probability measure such that

    P = Σ_{j=1}^∞ pj δ_{θj} + (1 − Σ_{j=1}^∞ pj) G0

where θj iid~ G0, an atomless probability distribution, and (pj) is an independent subprobability vector. The SSP is called proper if Σ_{j=1}^∞ pj = 1.
Species sampling models: example 1

Stick-breaking processes

Special form of the weights, with the same construction as for the Dirichlet process

    pj = πj Π_{l<j} (1 − πl) with πj ind~ Beta(aj, bj), aj, bj > 0

Includes the Dirichlet process (aj = 1, bj = α) and the Pitman–Yor process (aj = 1 − σ, bj = θ + jσ).
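The DP sampler generalizes by letting the Beta parameters vary with j. The Pitman–Yor choice aj = 1 − σ, bj = θ + jσ in the sketch below is a standard instance; σ, θ and the truncation level are illustrative values of mine:

```python
import random

def stick_breaking(a, b, truncation, rng):
    """General stick-breaking: p_j = pi_j * prod_{l<j}(1 - pi_l), pi_j ~ Beta(a(j), b(j))."""
    weights, stick = [], 1.0
    for j in range(1, truncation + 1):
        pi = rng.betavariate(a(j), b(j))
        weights.append(stick * pi)
        stick *= 1.0 - pi
    return weights

rng = random.Random(2)
sigma, theta = 0.5, 1.0
# Pitman-Yor weights: pi_j ~ Beta(1 - sigma, theta + j*sigma)
w = stick_breaking(lambda j: 1.0 - sigma, lambda j: theta + j * sigma, 2000, rng)
print(round(sum(w), 4))  # close to 1 for a deep enough truncation
```

Compared with the DP, the j-dependent Beta(1 − σ, θ + jσ) sticks decay polynomially rather than geometrically, which is what gives the PY its heavier-tailed weight sequence.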
Species sampling models: example 2

Gibbs-type processes

Characterized by

• a choice of G0, a parameter σ < 1, and a set of weights Vn,j which satisfy the recursion

    Vn,j = (n − jσ) Vn+1,j + Vn+1,j+1

• a predictive probability function

    P[Xn+1 ∈ · | X(n)] = (V_{n+1,kn+1} / V_{n,kn}) G0(·)
                         + (1 − V_{n+1,kn+1} / V_{n,kn}) · Σ_{i=1}^{kn} (Ni − σ) δ_{X*i}(·) / (n − σ kn)

  where the first coefficient is P["new" | X(n)] multiplying the prior guess G0, and the second is P["old" | X(n)] multiplying a weighted empirical measure

• Crucially, P[Xn+1 is "new" | X(n)] now depends on both the sample size n and the number of distinct values kn
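The recursion can be checked numerically in a special case: for the Dirichlet process, a Gibbs-type prior with σ = 0, the weights are Vn,j = α^j / (α)_n, with (α)_n the rising factorial. This verification, with an illustrative α, is my addition, not from the slides:

```python
def rising_factorial(a, n):
    """(a)_n = a * (a+1) * ... * (a+n-1)."""
    out = 1.0
    for i in range(n):
        out *= a + i
    return out

def V_dp(n, j, alpha):
    """Gibbs-type weights of the Dirichlet process (sigma = 0): V_{n,j} = alpha^j / (alpha)_n."""
    return alpha**j / rising_factorial(alpha, n)

alpha, sigma = 2.5, 0.0
ok = True
for n in range(1, 15):
    for j in range(1, n + 1):
        lhs = V_dp(n, j, alpha)
        rhs = (n - j * sigma) * V_dp(n + 1, j, alpha) + V_dp(n + 1, j + 1, alpha)
        ok = ok and abs(lhs - rhs) < 1e-12 * abs(lhs)  # relative tolerance
print(ok)
```

The identity reduces to n α^j/(α)_{n+1} + α^{j+1}/(α)_{n+1} = α^j (n + α)/(α)_{n+1} = α^j/(α)_n, which the loop confirms to machine precision.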
Species sampling models: examples 3 & 4

• Pitman–Yor (PY) process
  aka two-parameter Poisson–Dirichlet process [Pitman & Yor, 1997], obtained for 0 ≤ σ < 1 and θ > −σ, or σ < 0 and θ = r|σ| with r ∈ N,

      Vn,k = Π_{i=1}^{k−1} (θ + iσ) / (θ + 1)_{n−1}

  which yields

      P[Xn+1 ∈ · | X(n)] = (θ + σ kn)/(θ + n) P*(·)
                           + (n − σ kn)/(θ + n) · Σ_{i=1}^{kn} (Ni − σ) δ_{X*i}(·) / (n − σ kn)

  ⇒ if σ = 0, the PY reduces to the Dirichlet process and (θ + kn σ)/(θ + n) to θ/(θ + n)

• Normalized generalized gamma process (NGG)

      Vn,j = (e^β σ^{j−1} / Γ(n)) Σ_{i=0}^{n−1} C(n−1, i) (−1)^i β^{i/σ} Γ(j − i/σ; β)

  where β > 0, σ ∈ (0, 1) and Γ(a; x) denotes the incomplete gamma function
  ⇒ if σ = 1/2 it reduces to the normalized inverse Gaussian process
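A quick consistency check, my own with illustrative σ and θ: plugging the PY weights Vn,k into the Gibbs-type predictive gives P["new" | X(n)] = V_{n+1,kn+1}/V_{n,kn} = (θ + σ kn)/(θ + n), the coefficient in the displayed predictive.

```python
def rising_factorial(a, n):
    out = 1.0
    for i in range(n):
        out *= a + i
    return out

def V_py(n, k, sigma, theta):
    """Pitman-Yor Gibbs weights: V_{n,k} = prod_{i=1}^{k-1}(theta + i*sigma) / (theta + 1)_{n-1}."""
    num = 1.0
    for i in range(1, k):
        num *= theta + i * sigma
    return num / rising_factorial(theta + 1.0, n - 1)

sigma, theta = 0.4, 2.0
for n in range(2, 10):
    for k in range(1, n + 1):
        ratio = V_py(n + 1, k + 1, sigma, theta) / V_py(n, k, sigma, theta)
        closed_form = (theta + sigma * k) / (theta + n)
        assert abs(ratio - closed_form) < 1e-12
print("V_{n+1,k+1}/V_{n,k} matches (theta + sigma*k)/(theta + n)")
```

Algebraically, the ratio leaves only the extra factor (θ + kσ) in the numerator and (θ + n) in the rising factorial, which is exactly the closed form.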
Number of clusters generated by the PY process

Prior distribution of the number of clusters kn

• θ controls the location (as for the DP)
• σ controls the flatness (or variability)

Example with n = 50, θ = 1 and σ = 0.2, 0.3, ..., 0.8

[Figure: prior distributions of kn for n = 50, θ = 1 and σ ranging over 0.2, 0.3, ..., 0.8, plotted over 0 to 50 clusters.]
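The prior distribution of kn shown above can be reproduced by Monte Carlo using the PY predictive: a draw is "new" with probability (θ + σ kn)/(θ + n). A minimal sketch, with the replication count my own choice:

```python
import random

def py_num_clusters(n, sigma, theta, rng):
    """Number of distinct values among n draws, via the PY probability of a new value."""
    k = 0
    for i in range(n):
        if rng.random() < (theta + sigma * k) / (theta + i):
            k += 1
    return k

rng = random.Random(3)
n, theta, reps = 50, 1.0, 5000
means = {}
for sigma in (0.2, 0.5, 0.8):
    ks = [py_num_clusters(n, sigma, theta, rng) for _ in range(reps)]
    means[sigma] = sum(ks) / reps
    print(sigma, means[sigma])  # mean number of clusters grows with sigma
```

For fixed θ, larger σ pushes the prior mass of kn towards more clusters, matching the role of σ described above.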
Completely random measures

Definition [Kingman, 1967]

Random measure G such that for all disjoint sets A1, ..., Ad, the random variables G(A1), ..., G(Ad) are mutually independent

• aka independent increment processes or Lévy processes
• Building block for prior distributions in popular models in biology, computer science, ...
• Main advantage: moments are known
Properties

Discreteness
CRMs are almost surely discrete measures [Kingman, 1992]. Under the assumption of no fixed atoms,

    G = Σ_{j≥1} pj δ_{θj}

where the jumps (pj)_{j≥1} and the jump points (θj)_{j≥1} are independent

Laplace transform
There exists a measure ν(dv, dy) = ρ_y(dv) a G0(dy) on R+ × Y, called the Lévy intensity, such that

    E[ exp(−∫_Y f(y) G(dy)) ] = exp( −∫_{R+×Y} [1 − exp(−s f(y))] ν(ds, dy) )

The CRM is called homogeneous if ρ_y = ρ, non-homogeneous otherwise
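For a homogeneous CRM the Laplace exponent reduces to a one-dimensional integral in the jump sizes. For the gamma process, with ρ(dv) = v^{−1} e^{−v} dv, the exponent ∫ (1 − e^{−uv}) ρ(dv) equals log(1 + u), which a crude quadrature confirms. The gamma intensity and the quadrature grid are my illustrative choices:

```python
import math

def gamma_laplace_exponent(u, upper=60.0, steps=300000):
    """Trapezoidal approximation of int_0^upper (1 - exp(-u*v)) * exp(-v) / v dv."""
    def f(v):
        if v == 0.0:
            return u  # limit of (1 - e^{-u v})/v as v -> 0
        return (1.0 - math.exp(-u * v)) * math.exp(-v) / v
    h = upper / steps
    total = 0.5 * (f(0.0) + f(upper))
    for i in range(1, steps):
        total += f(i * h)
    return total * h

for u in (0.5, 1.0, 3.0):
    print(u, round(gamma_laplace_exponent(u), 6), round(math.log(1.0 + u), 6))
```

Note that ρ integrates to infinity near zero (infinite activity, hence infinitely many small jumps), while the tempering 1 − e^{−uv} ~ uv makes the Laplace exponent finite.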
Examples of CRMs defined by ρ(v)

Generalized gamma process
by Brix [1999], γ ∈ [0, 1), θ ≥ 0

    ρ(v) = e^{−θv} / (Γ(1 − γ) v^{1+γ})

includes the gamma, inverse-Gaussian and stable processes

Stable-beta process
by Teh and Gorur [2009], σ ∈ [0, 1), c > −σ

    ρ(v) = Γ(c + 1) / (Γ(1 − σ) Γ(c + σ)) · v^{−σ−1} (1 − v)^{c+σ−1}

includes the beta process and a stable process
A brief look at moments

Moments of the (random) mass µ(A)

Defined by m_n(A) = E[µ^n(A)]. They can be obtained by Faà di Bruno's formula

    m_n(A) = E[µ^n(A)] = Σ_(*) ( n! / (k1! ··· kn!) ) Π_{i=1}^n ( κ_i(A) / i! )^{k_i}

where the sum (*) runs over all nonnegative integers (k1, ..., kn) such that k1 + 2k2 + ··· + n kn = n, and the i-th cumulant is given by

    κ_i(A) = a G0(A) ∫_0^∞ v^i ρ(dv)
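Equivalently to the partition sum above, moments follow from cumulants by the standard recursion m_n = Σ_{k=1}^{n} C(n−1, k−1) κ_k m_{n−k}. For the gamma process with ρ(dv) = v^{−1} e^{−v} dv and total mass c = a G0(A), the integral gives κ_i(A) = c (i−1)! and µ(A) ~ Gamma(c, 1), whose moments are known in closed form. The specific CRM and the value of c are my illustrative choices:

```python
from math import comb

def moments_from_cumulants(kappa, n_max):
    """Moment-cumulant recursion: m_n = sum_{k=1}^n C(n-1, k-1) * kappa[k] * m[n-k], m_0 = 1."""
    m = [1.0]
    for n in range(1, n_max + 1):
        m.append(sum(comb(n - 1, k - 1) * kappa[k] * m[n - k] for k in range(1, n + 1)))
    return m

c, n_max = 1.7, 6          # c = a * G0(A), the mass parameter on the set A
kappa, fact = {}, 1
for i in range(1, n_max + 1):
    kappa[i] = c * fact    # gamma-process cumulants: kappa_i = c * (i-1)!
    fact *= i

m = moments_from_cumulants(kappa, n_max)
closed = [1.0]
for n in range(1, n_max + 1):
    closed.append(closed[-1] * (c + n - 1))  # E[mu(A)^n] = c(c+1)...(c+n-1) for Gamma(c, 1)
print([round(x, 4) for x in m[1:]])
print([round(x, 4) for x in closed[1:]])
```

The two lists agree, illustrating the slide's claim that CRM moments are available in closed form from the Lévy intensity.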
Normalized random measures with independent increments

Normalizing a CRM: NRMI

A normalized random measure with independent increments (NRMI) is a random probability measure obtained by normalizing a CRM G with (a.s.) positive and finite total mass G(X):

    P(·) = G(·) / G(X)

• Includes the Dirichlet process, normalized generalized gamma, normalized inverse Gaussian, normalized stable, ...
• Connection between NRMIs and SSMs? Homogeneous NRMIs
• Connection between NRMIs and Gibbs-type priors? The normalized generalized gamma
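The canonical example is normalizing a gamma CRM, which yields the Dirichlet process: over any finite partition (A1, ..., Ad), the vector (P(A1), ..., P(Ad)) is Dirichlet(a G0(A1), ..., a G0(Ad)). A small simulation of the normalization step, with the partition masses my illustrative choices:

```python
import random

def normalized_gamma_crm(masses, rng):
    """Independent gamma masses G(A_i) ~ Gamma(a*G0(A_i), 1), normalized: P(A_i) = G(A_i)/G(X)."""
    g = [rng.gammavariate(m, 1.0) for m in masses]
    total = sum(g)
    return [x / total for x in g]

rng = random.Random(4)
a_g0 = [0.4, 0.6, 1.0]  # a * G0(A_i) over a 3-set partition of X; total mass a = 2
samples = [normalized_gamma_crm(a_g0, rng) for _ in range(20000)]

# E[P(A_i)] should equal G0(A_i) = a_g0[i] / a
a = sum(a_g0)
means = [sum(s[i] for s in samples) / len(samples) for i in range(3)]
print([round(x, 3) for x in means], [round(x / a, 3) for x in a_g0])
```

Independence of the unnormalized masses is exactly the CRM property; it is lost after dividing by the common total G(X), which is why P itself is not completely random.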
Toy mixture example

A popular use of discrete P is within hierarchical mixture models for density estimation and clustering. The latter is carried out at the latent level and makes use of the discrete nature of P.

• n = 50 observations are drawn from a uniform mixture of two well-separated Gaussian distributions, N(1, 0.2) and N(10, 0.2)
• Nonparametric mixture model

    (Yi | mi, vi) ind~ N(mi, vi), i = 1, ..., n
    (mi, vi | P) iid~ P, i = 1, ..., n
    P ~ Q

  with Q a discrete nonparametric prior
• The distribution of Kn represents the prior distribution of the number of mixture components; a summary statistic of the posterior distribution of (Kn | Y(n)) is then used as an estimate of the number of mixture components
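The synthetic data can be generated as below. The slides write N(1, 0.2) and N(10, 0.2); this sketch reads 0.2 as the standard deviation, an assumption on my part:

```python
import random

def toy_mixture_data(n, rng):
    """Uniform mixture of two well-separated Gaussians centered at 1 and 10."""
    data = []
    for _ in range(n):
        mean = 1.0 if rng.random() < 0.5 else 10.0
        data.append(rng.gauss(mean, 0.2))  # 0.2 taken as the standard deviation
    return data

rng = random.Random(5)
y = toy_mixture_data(50, rng)
print(min(y), max(y))  # two tight clumps, one around 1 and one around 10
```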
• We select priors Q with misspecified parameters: in particular, they are chosen so that E[K50] = 25, a prior opinion on K50 remarkably far from the true number of components, namely 2
• 5 different models:
  • Dirichlet process with θ = 19.233
  • PY processes with (σ, θ) = (0.73001, 1) and (σ, θ) = (0.25, 12.2157)
  • NGG processes with (σ, β) = (0.7353, 1) and (σ, β) = (0.25, 48.4185)
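The calibration E[K50] = 25 can be verified for the DP and PY models: since a draw is new with probability (θ + σ kn)/(θ + n), taking expectations gives the exact recursion E[K_{i+1}] = E[K_i] + (θ + σ E[K_i])/(θ + i), with σ = 0 recovering the DP. The recursion is standard; running it here on the listed parameters is my addition:

```python
def expected_clusters_py(n, sigma, theta):
    """Exact prior mean of K_n under PY(sigma, theta); sigma = 0 gives the DP.
    Uses E[K_{i+1}] = E[K_i] + (theta + sigma * E[K_i]) / (theta + i), E[K_0] = 0."""
    ek = 0.0
    for i in range(n):
        ek += (theta + sigma * ek) / (theta + i)
    return ek

print(round(expected_clusters_py(50, 0.0, 19.233), 3))   # DP with theta = 19.233
print(round(expected_clusters_py(50, 0.73001, 1.0), 3))  # PY with (sigma, theta) = (0.73001, 1)
```

Both values come out at approximately 25, confirming how the slide's parameter values were tuned.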
Are the models flexible enough to shift a posteriori towards the correct number of components?

⇒ The larger σ, the better the posterior estimate of Kn.
⇒ In terms of density estimation the difference is negligible; this is because one can always fit a mixture density with more components than needed.
Posterior distribution of the number of mixture components

[Figure: posterior distributions of the number of groups for the 5 mixture models: DP(θ = 19.233), NGG(σ, β) = (0.25, 48.4185), PY(σ, θ) = (0.25, 12.2157), NGG(σ, β) = (0.7353, 1) and PY(σ, θ) = (0.73001, 1).]

Posterior distributions of the number of groups corresponding to the 5 mixture models.
Posterior density estimates

[Figure: density estimates for the 5 mixture models, DP(θ = 19.233), NGG(σ, β) = (0.25, 48.4185), PY(σ, θ) = (0.25, 12.2157), NGG(σ, β) = (0.7353, 1) and PY(σ, θ) = (0.73001, 1), together with the true density.]

Density estimates corresponding to the 5 mixture models.
Opening

Next talks of the workshop

• Matteo Ruggiero: Dependent processes in Bayesian nonparametrics ⇒ covariate-dependent extensions of the Dirichlet process, aka diffusive Dirichlet mixtures
• Pierpaolo De Blasi: Asymptotics for discrete random measures ⇒ validation of the posterior distribution in the large-n regime, for the Dirichlet process and Pitman–Yor
• Antonio Canale: Applications to ecology and marketing ⇒ mainly uses of the Dirichlet process
• JA: Discovery probabilities ⇒ species sampling models, Pitman–Yor and normalized generalized gamma
References

Thank you for your attention!
Brix, A. (1999). Generalized gamma measures and shot-noise Cox processes. Advances in Applied Probability, pages 929–953.

Kingman, J. (1967). Completely random measures. Pacific Journal of Mathematics, 21(1):59–78.

Kingman, J. F. C. (1992). Poisson Processes, volume 3. Oxford University Press.

Teh, Y. W. and Gorur, D. (2009). Indian buffet processes with power-law behavior. In Bengio, Y., Schuurmans, D., Lafferty, J., Williams, C., and Culotta, A., editors, Advances in Neural Information Processing Systems 22, pages 1838–1846. Curran Associates, Inc.