Posterior Consistency of Species Sampling Priors
Jaeyong Lee
Seoul National University
jointly with
Gunho Jang and Sangyeol Lee
Species sampling priors, December 11, 2010 – p. 1/38
Species Sampling
• Imagine that we land on a planet where "no one has gone before". As we explore the planet, we encounter new species unknown to us.
• We record the names of the species we encounter. If a species is new, we name it by picking an element from $\mathcal{X}$.
Species Sampling
• Suppose $(X_1, X_2, \ldots)$ is the infinite sequence of records.
• $X_i$: the species of the $i$th individual sampled.
• $\tilde X_j$: the $j$th distinct species to appear.
• $k = k_n$: the number of distinct species appearing in $(X_1, \ldots, X_n)$.
• $n_j = n_{jn}$: the number of times the $j$th species $\tilde X_j$ appears in $(X_1, \ldots, X_n)$.
• $\mathbf{n} = (n_{1n}, n_{2n}, \ldots)$ or $(n_{1n}, n_{2n}, \ldots, n_{kn})$.
• $\mathbf{n}$ is an element of
$$\mathbb{N}^* = \bigcup_{k=1}^{\infty} \mathbb{N}^k,$$
where $\mathbb{N}$ is the set of positive integers.
Species Sampling Sequence
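We call an exchangeable sequence $(X_1, X_2, \ldots)$ a species sampling sequence if
$$X_1 \sim \nu, \qquad X_{n+1} \mid X_1, \ldots, X_n \sim \sum_{j=1}^{k} p_j(\mathbf{n}_n)\,\delta_{\tilde X_j} + p_{k+1}(\mathbf{n}_n)\,\nu,$$
where $\nu$ is a diffuse probability measure on $\mathcal{X}$, i.e., $\nu(\{x\}) = 0$ for all $x \in \mathcal{X}$.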
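The definition translates directly into a sequential sampler. The Python sketch below is illustrative only: the names `sample_species_sequence`, `ppf` and `nu_sampler` are hypothetical, and in the usage line $\nu$ is assumed to be Uniform(0, 1), which is diffuse, so fresh draws are distinct with probability one.

```python
import random
from typing import Callable, List, Sequence

def sample_species_sequence(
    n: int,
    ppf: Callable[[Sequence[int]], List[float]],
    nu_sampler: Callable[[random.Random], float],
    rng: random.Random,
) -> List[float]:
    """Draw X_1, ..., X_n from a species sampling sequence.

    ppf(counts) returns k + 1 probabilities: p_1, ..., p_k for the species
    seen so far (in order of appearance) and p_{k+1} for a fresh draw from nu.
    """
    xs: List[float] = []
    distinct: List[float] = []   # tilde X_1, ..., tilde X_k
    counts: List[int] = []       # n_1, ..., n_k
    for _ in range(n):
        probs = ppf(counts) if distinct else [1.0]   # X_1 ~ nu
        j = rng.choices(range(len(probs)), weights=probs)[0]
        if j == len(distinct):   # new species drawn from the diffuse nu
            x = nu_sampler(rng)
            distinct.append(x)
            counts.append(1)
        else:                    # another individual of species tilde X_j
            x = distinct[j]
            counts[j] += 1
        xs.append(x)
    return xs

# Usage: Pólya urn prediction probabilities with nu = Uniform(0, 1)
theta = 2.0
polya = lambda c: [m / (sum(c) + theta) for m in c] + [theta / (sum(c) + theta)]
xs = sample_species_sequence(100, polya, lambda r: r.random(), random.Random(0))
print(len(set(xs)))   # k_n, the number of distinct species among 100 draws
```

Passing the prediction probability functions as a callable keeps the sampler generic: any of the priors discussed in this talk only changes `ppf`.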
Example: Pólya Urn Sequence
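Suppose $P \sim DP(\theta\nu)$, where $\theta > 0$ and $\nu$ is a diffuse probability measure, and $X_1, X_2, \ldots \mid P \overset{iid}{\sim} P$. Then, marginally, $X_1, X_2, \ldots$ is a Pólya urn sequence, which satisfies
$$X_1 \sim \nu, \qquad X_{n+1} \mid X_1, \ldots, X_n \sim \sum_{j=1}^{k} \frac{n_{jn}}{n+\theta}\,\delta_{\tilde X_j} + \frac{\theta}{n+\theta}\,\nu.$$
Thus, the Pólya urn sequence is a species sampling sequence.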
Prediction Probability Function
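• A sequence of functions $(p_j,\ j = 1, 2, \ldots)$ with $p_j : \mathbb{N}^* \to \mathbb{R}$ is called a sequence of prediction probability functions if, for all $\mathbf{n} \in \mathbb{N}^*$,
$$p_j(\mathbf{n}) \ge 0, \qquad \sum_{j=1}^{k(\mathbf{n})+1} p_j(\mathbf{n}) = 1.$$
• For a species sampling sequence $(X_n)$, the corresponding prediction probability functions are defined as
$$p_j(\mathbf{n}) = \mathbb{P}(X_{n+1} = \tilde X_j \mid X_1, \ldots, X_n), \quad j = 1, \ldots, k_n,$$
$$p_{k_n+1}(\mathbf{n}) = \mathbb{P}(X_{n+1} \notin \{X_1, \ldots, X_n\} \mid X_1, \ldots, X_n).$$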
Example: Pólya Urn Sequence
$$p_j(n_1, \ldots, n_k) = \frac{n_j}{n+\theta}\, I(1 \le j \le k) + \frac{\theta}{n+\theta}\, I(j = k+1),$$
where $n = \sum_{i=1}^{k} n_i$.
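The Pólya urn prediction probability functions can be coded directly and checked against the two defining conditions (nonnegativity and summing to one over $k+1$ entries). This is a minimal sketch; `polya_ppf` and `is_prediction_probability` are hypothetical helper names.

```python
from typing import List, Sequence

def polya_ppf(counts: Sequence[int], theta: float) -> List[float]:
    """Pólya urn prediction probabilities (p_1, ..., p_k, p_{k+1})."""
    n = sum(counts)
    return [nj / (n + theta) for nj in counts] + [theta / (n + theta)]

def is_prediction_probability(probs: Sequence[float], k: int, tol: float = 1e-12) -> bool:
    """Check the definition: k + 1 entries, all nonnegative, summing to 1."""
    return (len(probs) == k + 1
            and all(p >= 0 for p in probs)
            and abs(sum(probs) - 1.0) <= tol)

counts = [3, 1, 2]                       # n = 6 observations of k = 3 species
probs = polya_ppf(counts, theta=1.5)
print([round(p, 4) for p in probs])      # [0.4, 0.1333, 0.2667, 0.2]
print(is_prediction_probability(probs, k=len(counts)))  # True
```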
Species Sampling Model
• A sequence of random variables $(X_n)$ is a species sampling sequence if and only if $X_1, X_2, \ldots \mid P$ is a random sample from $P$, where
$$P = \sum_{i=1}^{\infty} P_i\,\delta_{\tilde X_i} + R\,\nu \qquad (1)$$
for some sequence of nonnegative random variables $(P_i)$ and $R$ such that $1 - R = \sum_{i=1}^{\infty} P_i \le 1$, $(\tilde X_i)$ is a random sample from $\nu$, and $(P_i)$ and $(\tilde X_i)$ are independent.
• We call the directing random probability measure $P$ in equation (1) the species sampling model (or prior) of the species sampling sequence $(X_i)$.
Exchangeable Partition Probability Function (EPPF)
• Let $[n] = \{1, 2, \ldots, n\}$.
• An exchangeable sequence $(X_n)$ defines a random partition $\Pi_n = \{A_1, \ldots, A_k\}$ of $[n]$, where
$$A_i = \{j \in [n] : X_j = \tilde X_i\}.$$
Define a function $p : \mathbb{N}^* \to [0, 1]$ by
$$p(\#A_1, \ldots, \#A_k) = \mathbb{P}(\Pi_n = \{A_1, \ldots, A_k\}).$$
• The function $p$ is called the exchangeable partition probability function (EPPF) derived from the exchangeable sequence $(X_n)$.
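• An EPPF $p$ derived from an exchangeable sequence $(X_n)$ satisfies
$$p(1) = 1, \qquad p(\mathbf{n}) = \sum_{j=1}^{k(\mathbf{n})+1} p(\mathbf{n}^{j+}) \quad \forall\, \mathbf{n} \in \mathbb{N}^*,$$
where $\mathbf{n}^{j+}$ is the same as $\mathbf{n}$ except that the $j$th element is increased by 1 (for $j = k+1$, a new entry equal to 1 is appended).
• Conversely, every symmetric $p : \mathbb{N}^* \to [0, 1]$ satisfying these two conditions is the EPPF of some exchangeable sequence.
• An EPPF $p$ and a diffuse probability measure $\nu$ uniquely define the distribution of a species sampling sequence.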
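As a concrete check of these conditions, the sketch below uses the EPPF of $DP(\theta\nu)$ (the Ewens sampling formula, $p(\mathbf{n}) = \theta^k \Gamma(\theta)/\Gamma(\theta+n) \prod_j (n_j - 1)!$). It verifies $p(1) = 1$ and the addition rule, and shows that the ratios $p(\mathbf{n}^{j+})/p(\mathbf{n})$ reproduce the Pólya urn prediction probabilities. The helper names `ewens_eppf` and `children` are hypothetical.

```python
import math
from typing import List, Sequence

def ewens_eppf(counts: Sequence[int], theta: float) -> float:
    """EPPF of DP(theta * nu): theta^k Gamma(theta) / Gamma(theta + n) prod (n_j - 1)!"""
    n, k = sum(counts), len(counts)
    log_p = (k * math.log(theta) + math.lgamma(theta) - math.lgamma(theta + n)
             + sum(math.lgamma(nj) for nj in counts))   # lgamma(n_j) = log (n_j - 1)!
    return math.exp(log_p)

def children(counts: Sequence[int]) -> List[List[int]]:
    """All n^{j+}, j = 1, ..., k + 1: grow one block, or open a new block of size 1."""
    out = []
    for j in range(len(counts)):
        c = list(counts)
        c[j] += 1
        out.append(c)
    out.append(list(counts) + [1])
    return out

theta = 2.0
counts = [3, 1, 2]
p = ewens_eppf(counts, theta)
kids = children(counts)
print(math.isclose(p, sum(ewens_eppf(c, theta) for c in kids)))   # True: addition rule
ratios = [ewens_eppf(c, theta) / p for c in kids]
print([round(r, 4) for r in ratios])   # [0.375, 0.125, 0.25, 0.25] = n_j/(n+theta), theta/(n+theta)
```

Working on the log scale with `math.lgamma` avoids overflow of the factorials for larger $n$.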
Dirichlet Process (Sethuraman’s Representation)
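• Define
$$W_1, W_2, \ldots \overset{iid}{\sim} \mathrm{Beta}(1, \theta), \qquad \tilde X_1, \tilde X_2, \ldots \overset{iid}{\sim} \nu,$$
and $(W_j) \perp (\tilde X_j)$.
• Construct $P_1, P_2, \ldots$ from the $W_j$'s by the stick-breaking process
$$P_1 = W_1, \qquad P_j = (1 - W_1) \cdots (1 - W_{j-1})\, W_j, \quad j = 2, 3, \ldots.$$
Let
$$P = \sum_{j=1}^{\infty} P_j\,\delta_{\tilde X_j}.$$
Then $P \sim DP(\theta\nu)$.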
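A truncated version of this construction is easy to simulate. In the sketch below (hypothetical name `dp_stick_breaking`), sticks are broken until the unassigned mass $\prod_i (1 - W_i)$ falls below a tolerance; the leftover mass is discarded, so the result only approximates a draw from $DP(\theta\nu)$.

```python
import random
from typing import Callable, List, Tuple

def dp_stick_breaking(theta: float, nu_sampler: Callable[[random.Random], float],
                      rng: random.Random, tol: float = 1e-10) -> List[Tuple[float, float]]:
    """Truncated Sethuraman draw from DP(theta * nu) as (weight P_j, atom tilde X_j) pairs."""
    atoms = []
    remaining = 1.0                         # (1 - W_1) ... (1 - W_{j-1})
    while remaining > tol:
        w = rng.betavariate(1.0, theta)     # W_j ~ Beta(1, theta)
        atoms.append((remaining * w, nu_sampler(rng)))
        remaining *= 1.0 - w
    return atoms

atoms = dp_stick_breaking(theta=2.0, nu_sampler=lambda r: r.random(), rng=random.Random(0))
print(len(atoms), sum(w for w, _ in atoms))   # number of sticks used; total mass close to 1
```

Because $E(1 - W_j) = \theta/(1+\theta)$, the unassigned mass shrinks geometrically on average, so only a moderate number of sticks is needed for moderate $\theta$.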
Pitman-Yor Process
• Define
$$W_j \overset{ind}{\sim} \mathrm{Beta}(1 - a, b + ja), \quad j = 1, 2, \ldots, \qquad \tilde X_1, \tilde X_2, \ldots \overset{iid}{\sim} \nu,$$
and $(W_j) \perp (\tilde X_j)$.
• Construct $P_1, P_2, \ldots$ from the $W_j$'s by the stick-breaking process
$$P_1 = W_1, \qquad P_j = (1 - W_1) \cdots (1 - W_{j-1})\, W_j, \quad j = 2, 3, \ldots.$$
Let
$$P = \sum_{j=1}^{\infty} P_j\,\delta_{\tilde X_j}.$$
Then $P \sim PY(a, b, \nu)$, where $\nu$ is a diffuse probability measure and either $0 \le a < 1$ and $b > -a$, or $a < 0$ and $b = -ma$ for some $m = 1, 2, \ldots$. Note that $PY(0, \theta, \nu) = DP(\theta\nu)$.
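The same stick-breaking loop covers both parameter regimes. When $a < 0$ and $b = -ma$, the $m$th variable is $\mathrm{Beta}(1-a, 0)$, which is degenerate at 1, so the process has exactly $m$ atoms. The sketch below (hypothetical name `py_stick_weights`) returns the first `n_sticks` weights and validates the parameters as stated above. For $a > 0$ the unassigned mass decays much more slowly than in the Dirichlet case, which is why the truncation here is by number of sticks rather than by tolerance.

```python
import random
from typing import List

def py_stick_weights(a: float, b: float, n_sticks: int, rng: random.Random) -> List[float]:
    """First n_sticks stick-breaking weights of PY(a, b, nu), W_j ~ Beta(1 - a, b + j a)."""
    if a < 0:
        m = -b / a
        if abs(m - round(m)) > 1e-9 or round(m) < 1:
            raise ValueError("for a < 0 need b = -m*a with m a positive integer")
    elif not (a < 1 and b > -a):
        raise ValueError("need 0 <= a < 1 and b > -a")
    weights: List[float] = []
    remaining = 1.0
    for j in range(1, n_sticks + 1):
        beta_param = b + j * a
        if beta_param <= 1e-12:          # a < 0 and j = m: W_m = 1 takes all remaining mass
            weights.append(remaining)
            break
        w = rng.betavariate(1.0 - a, beta_param)
        weights.append(remaining * w)
        remaining *= 1.0 - w
    return weights

print(len(py_stick_weights(-0.5, 1.5, 100, random.Random(3))))   # 3, since m = 1.5 / 0.5
```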
Posterior of PY Process
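Suppose
$$X_1, \ldots, X_n \mid P \overset{iid}{\sim} P, \qquad P \sim PY(a, b, \nu).$$
Then
$$P \mid X_1, \ldots, X_n \overset{d}{=} \sum_{j=1}^{k} \tilde P_j\,\delta_{\tilde X_j} + \tilde R_k\,P_k,$$
where $(\tilde P_1, \ldots, \tilde P_k, \tilde R_k)$ is independent of $P_k$,
$$(\tilde P_1, \ldots, \tilde P_k, \tilde R_k) \sim \mathrm{Dir}(n_1 - a, \ldots, n_k - a,\ b + ka),$$
and
$$P_k \sim PY(a, b + ka, \nu).$$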
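Taking posterior means gives the Pitman-Yor prediction probabilities. The Dirichlet parameters sum to $\sum_j (n_j - a) + b + ka = b + n$, so $E\tilde P_j = (n_j - a)/(b+n)$ and $E\tilde R_k = (b+ka)/(b+n)$. The sketch below compares these closed forms with a Monte Carlo average of Dirichlet draws (generated as normalized Gamma variables). The function names and the example counts are illustrative assumptions.

```python
import random
from typing import List, Sequence

def py_posterior_mean_weights(counts: Sequence[int], a: float, b: float) -> List[float]:
    """Means of (P~_1, ..., P~_k, R~_k) ~ Dir(n_1 - a, ..., n_k - a, b + k a)."""
    n, k = sum(counts), len(counts)
    return [(nj - a) / (b + n) for nj in counts] + [(b + k * a) / (b + n)]

def sample_dirichlet(alphas: Sequence[float], rng: random.Random) -> List[float]:
    """One Dirichlet draw: independent Gamma(alpha_i, 1) variables, normalized."""
    g = [rng.gammavariate(al, 1.0) for al in alphas]
    s = sum(g)
    return [x / s for x in g]

counts, a, b = [5, 2, 1], 0.3, 1.0
exact = py_posterior_mean_weights(counts, a, b)
rng = random.Random(42)
alphas = [nj - a for nj in counts] + [b + len(counts) * a]
draws = [sample_dirichlet(alphas, rng) for _ in range(20_000)]
mc = [sum(d[i] for d in draws) / len(draws) for i in range(len(alphas))]
print([round(x, 3) for x in exact])   # [0.522, 0.189, 0.078, 0.211]
print([round(x, 3) for x in mc])      # Monte Carlo estimates of the same means
```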
Some References
• Probability side. Random combinations of Pitman, Gnedin, Kingman, Aldous, Yor, ...
• Statistics side. Random combinations of Prünster, Lijoi, Walker, Mena, James, Ishwaran, Müller, Quintana, ...
• And many more, with the list still growing.
The Goal of This Study
• The class of species sampling models is a huge class of nonparametric priors, with more flexibility than the Dirichlet process and potentially the same computational ease.
• But the asymptotic properties of species sampling models are not well understood.
• In the simplest possible nonparametric model, does the species sampling model pass the test of posterior consistency?
True Distribution
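We assume $X_1, X_2, \ldots \overset{iid}{\sim} P_0$, where
$$P_0 = \sum_j q_j\,\delta_{z_j} + \lambda\mu,$$
with $z_j \in \mathcal{X}$, $q_1 \ge q_2 \ge \cdots \ge 0$, $\lambda = 1 - \sum_j q_j \le 1$, and $\mu$ a diffuse probability measure.
Let $Z = \{z_1, z_2, \ldots\}$.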
Model
In this talk, we consider the following model:
$$X_1, \ldots, X_n \mid P \overset{iid}{\sim} P, \qquad P \sim \mathcal{P},$$
where $\mathcal{P}$ is a species sampling prior.
Consistency of PY Process
Theorem 1. When the prior is $PY(a, b, \nu)$, the posterior is weakly consistent at $P_0$ if and only if one of the following holds:
(i) $a = 0$, that is, the prior is a Dirichlet process;
(ii) $a > 0$, and $P_0$ is discrete or $\mu = \nu$;
(iii) $a < 0$, and $P_0$ is a mixture of at most $m = |b/a|$ degenerate measures.
Some Remarks
• If $P_0$ is discrete, all Pitman-Yor process priors with $0 \le a < 1$ yield consistent posteriors.
• If $P_0$ is continuous, the Dirichlet process is the only Pitman-Yor process prior that yields posterior consistency without further conditions; with $a > 0$, consistency additionally requires $\mu = \nu$.
• The second part of condition (ii) means that the diffuse probability measure $\nu$ must coincide with $\mu$, the normalized continuous part of the true probability measure $P_0$. Thus, to get consistency we would have to know the continuous part of the true measure a priori, which is unlikely in practical situations.
• The same result was obtained independently by James (2008).
Mixture Models
The story is different in mixture models. Consider the following normal mixture model:
$$X_i \mid \theta_i, h \overset{ind}{\sim} N(\theta_i, h^2), \quad i = 1, \ldots, n,$$
$$\theta_i \mid P \overset{iid}{\sim} P, \quad i = 1, \ldots, n,$$
$$P \sim \mathcal{P}, \qquad h^2 \sim \mu,$$
where $P$ and $h$ are independent a priori.
Under certain conditions, the posterior is weakly (and strongly) consistent.
Lemma 1
Under some general conditions, the following hold.
(a) The posterior is weakly consistent at $P_0$ if and only if, for every $P_0$-continuity set $U$ of $\mathcal{X}$,
  (i) $\lim_{n\to\infty} E(P(U) \mid X_1, \ldots, X_n) = P_0(U)$, $P_0^\infty$-a.s.,
  (ii) $\lim_{n\to\infty} \mathrm{Var}(P(U) \mid X_1, \ldots, X_n) = 0$, $P_0^\infty$-a.s.,
where $P_0^\infty$ is the infinite product of the true probability measure $P_0$, i.e., the probability measure of $(X_1, X_2, \ldots)$.
(b) If the posterior is weakly consistent at $P_0$, then for every open set $O$ and closed set $C$ of $\mathcal{X}$,
  (i) $\liminf_{n\to\infty} E(P(O) \mid X_1, \ldots, X_n) \ge P_0(O)$, $P_0^\infty$-a.s.,
  (ii) $\limsup_{n\to\infty} E(P(C) \mid X_1, \ldots, X_n) \le P_0(C)$, $P_0^\infty$-a.s.
Lemma 2
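Suppose $X_1, X_2, \ldots, X_n$ are sampled from $P_0$. Let $\tilde X_1, \ldots, \tilde X_{k_n}$ be the distinct values among $X_1, \ldots, X_n$, let $k_n^* = \sum_{j=1}^{k_n} I(\tilde X_j \notin Z)$, and let
$$G_{k_n} = \frac{1}{k_n} \sum_{j=1}^{k_n} \delta_{\tilde X_j}.$$
Then
(i) $k_n/n \to \lambda$ and $k_n^*/n \to \lambda$, $P_0^\infty$-a.s.;
(ii) $G_{k_n} \to \mu$, $P_0^\infty$-a.s., if $\lambda > 0$.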
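A small simulation makes Lemma 2 concrete. The sketch assumes a hypothetical $P_0$ with three atoms $Z = \{0, 1, 2\}$ carrying masses 0.4, 0.2, 0.1 (so $\lambda = 0.3$) and $\mu = $ Uniform(10, 11), chosen so that continuous draws never hit an atom. It reports $k_n/n$, $k_n^*/n$ and $G_{k_n}(B)$ for $B = [10, 10.5)$, where $\mu(B) = 0.5$.

```python
import random

def sample_p0(n, atoms, q, lam, rng):
    """Draw X_1, ..., X_n iid from P0 = sum_j q_j delta_{z_j} + lam * mu, mu = Uniform(10, 11)."""
    xs = []
    for _ in range(n):
        if rng.random() < lam:
            xs.append(rng.uniform(10.0, 11.0))            # continuous part mu
        else:
            xs.append(rng.choices(atoms, weights=q)[0])   # atom z_j w.p. q_j / (1 - lam)
    return xs

Z = [0, 1, 2]
q = [0.4, 0.2, 0.1]                  # sum q_j = 0.7, so lambda = 0.3
lam = 1.0 - sum(q)
n = 20_000
xs = sample_p0(n, Z, q, lam, random.Random(7))
distinct = set(xs)
k_n = len(distinct)
k_star = sum(1 for x in distinct if x not in Z)
G_B = sum(1 for x in distinct if 10.0 <= x < 10.5) / k_n   # G_{k_n}(B), B = [10, 10.5)
print(round(k_n / n, 3), round(k_star / n, 3), round(G_B, 3))   # close to 0.3, 0.3, 0.5
```

Each continuous draw contributes a new distinct value while the atoms contribute only a bounded number of them, which is the intuition behind $k_n/n \to \lambda$.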
Sketch of Proof
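$$
\begin{aligned}
E(P(B) \mid X_1, \ldots, X_n)
&= E\big[\mathbb{P}(X_{n+1} \in B \mid X_1, \ldots, X_n, \tilde P_1, \ldots, \tilde P_{k_n}, \tilde R_{k_n}) \,\big|\, X_1, \ldots, X_n\big] \\
&= E\Big[\sum_{j=1}^{k_n} \tilde P_j\, I(\tilde X_j \in B) + \tilde R_{k_n}\,\nu(B) \,\Big|\, X_1, \ldots, X_n\Big] \\
&= \sum_{j=1}^{k_n} \frac{n_{jn} - a}{b + n}\, I(\tilde X_j \in B) + \frac{b + a k_n}{b + n}\,\nu(B) \\
&= \frac{n}{b + n}\, F_n(B) - \frac{a k_n}{b + n}\, G_{k_n}(B) + \frac{b + a k_n}{b + n}\,\nu(B) \\
&\to P_0(B) - a\lambda\,\mu(B) + a\lambda\,\nu(B) \\
&= P_0(B) - a\lambda\,(\mu(B) - \nu(B)),
\end{aligned}
$$
where $F_n = \frac{1}{n}\sum_{j=1}^{n} \delta_{X_j}$ and $G_{k_n}$ is defined in Lemma 2. The limit uses $F_n(B) \to P_0(B)$ (strong law of large numbers) together with Lemma 2.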
In summary,
$$E(P(B) \mid X_1, \ldots, X_n) \to P_0(B) - a\lambda\,(\mu(B) - \nu(B)) \quad \text{a.s.}$$
Thus,
$$E(P(B) \mid X_1, \ldots, X_n) \to P_0(B) \quad \text{a.s.}$$
if and only if $a = 0$, $\lambda = 0$, or $\mu(B) = \nu(B)$.
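The limit can be checked numerically with the finite-$n$ formula from the sketch of proof. The example below assumes a hypothetical continuous truth $P_0 = \mu = $ Uniform(0, 1) (so $\lambda = 1$), a mismatched base measure $\nu = $ Uniform(0, 2), and $B = [0, 0.5]$, so that $\mu(B) = 0.5$ and $\nu(B) = 0.25$. With $a = 0$ the posterior mean approaches $P_0(B) = 0.5$; with $a = 0.5$ it approaches $0.5 - 0.5 \cdot (0.5 - 0.25) = 0.375$.

```python
import random

def py_posterior_mean(xs, a, b, in_B, nu_B):
    """E(P(B) | X_1..X_n) under PY(a, b, nu):
    sum_j (n_j - a)/(b + n) I(X~_j in B) + (b + a k)/(b + n) nu(B)."""
    counts = {}
    for x in xs:
        counts[x] = counts.get(x, 0) + 1
    n, k = len(xs), len(counts)
    atom_part = sum((nj - a) / (b + n) for x, nj in counts.items() if in_B(x))
    return atom_part + (b + a * k) / (b + n) * nu_B

rng = random.Random(11)
xs = [rng.random() for _ in range(20_000)]   # P0 = mu = Uniform(0, 1), lambda = 1
in_B = lambda x: x <= 0.5                    # B = [0, 0.5]
for a in (0.0, 0.5):
    est = py_posterior_mean(xs, a, b=1.0, in_B=in_B, nu_B=0.25)   # nu = Uniform(0, 2)
    limit = 0.5 - a * 1.0 * (0.5 - 0.25)                          # P0(B) - a lambda (mu(B) - nu(B))
    print(a, round(est, 3), limit)
```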
More Assumptions for the General Theorem
• (Smoothness condition for the predictive probability function) As $n \to \infty$,
$$S_n = S_n(\mathbf{n}) = \max_{1 \le i \le k} \sum_{j=1}^{k} \big| p_j(\mathbf{n}) - p_j(\mathbf{n}^{i+}) \big| \to 0, \quad P_0^\infty\text{-a.s.}$$
• (Separability condition for $Z$, the support of the discrete part of $P_0$) There exists $\epsilon > 0$ such that for all $i \ne j$,
$$d(z_i, z_j) > \epsilon,$$
where $d$ is the metric of $\mathcal{X}$.
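The smoothness statistic $S_n$ can be evaluated for any prediction probability function given as a callable. The sketch below (hypothetical names `smoothness_S` and `py_ppf`) uses the Pitman-Yor rule $p_j(\mathbf{n}) = (n_j - a)/(b+n)$ with the illustrative choice $a = 0.5$, $b = 1$. For count vectors $[3, 2, 1]$ and $[300, 200, 100]$ it shows $S_n$ shrinking as $n$ grows; for this rule one can check that $S_n \le 2/(b+n)$ when $a, b \ge 0$.

```python
from typing import Callable, List, Sequence

def smoothness_S(counts: Sequence[int], ppf: Callable[[Sequence[int]], List[float]]) -> float:
    """S_n(n) = max_i sum_{j<=k} |p_j(n) - p_j(n^{i+})|, i = 1, ..., k."""
    k = len(counts)
    base = ppf(counts)[:k]                 # only j = 1, ..., k enter the sum
    best = 0.0
    for i in range(k):
        bumped = list(counts)
        bumped[i] += 1                     # n^{i+}: one more individual of species i
        moved = ppf(bumped)[:k]
        best = max(best, sum(abs(x - y) for x, y in zip(base, moved)))
    return best

def py_ppf(counts: Sequence[int], a: float = 0.5, b: float = 1.0) -> List[float]:
    n, k = sum(counts), len(counts)
    return [(nj - a) / (b + n) for nj in counts] + [(b + k * a) / (b + n)]

print(round(smoothness_S([3, 2, 1], py_ppf), 6))   # 0.1875
print(smoothness_S([300, 200, 100], py_ppf))        # much smaller for n = 600
```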
General Theorem
Assume the separability condition and the smoothness condition. The posterior is weakly consistent at $P_0$ if and only if
$$\lim_{n\to\infty} \sum_{j=1}^{k} \Big| p_j(\mathbf{n}) - \frac{n_j}{n} \Big|\, I(\tilde X_j \in Z) = 0, \quad P_0^\infty\text{-a.s.} \qquad (2)$$
and one of the following holds:
(i) $p_{k+1}(\mathbf{n}) \to 0$ as $n \to \infty$, $P_0^\infty$-a.s.;
(ii) $P_0$ is a mixture of a discrete probability measure and the diffuse measure $\nu$.
Stronger Sufficient Condition
Assume the smoothness condition. The posterior is weakly consistent at $P_0$ if
$$C_n = C_n(\mathbf{n}) = \sum_{j=1}^{k} \Big| p_j(\mathbf{n}) - \frac{n_j}{n} \Big| \to 0, \quad P_0^\infty\text{-a.s. as } n \to \infty. \qquad (3)$$
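To see what condition (3) demands, the sketch below evaluates $C_n$ for the Pitman-Yor rule $p_j(\mathbf{n}) = (n_j - a)/(b+n)$ on two illustrative count vectors: two atoms each seen 5000 times (a discrete truth), and 10000 singletons (the all-distinct pattern produced by a continuous truth). With $a = 0.5$, $C_n$ is tiny in the first case but stays near $a$ in the second, so (3) fails there. The function name `C_n` is hypothetical.

```python
from typing import Sequence

def C_n(counts: Sequence[int], a: float, b: float) -> float:
    """C_n = sum_j |p_j(n) - n_j / n| for PY(a, b, nu), with p_j(n) = (n_j - a)/(b + n)."""
    n = sum(counts)
    return sum(abs((nj - a) / (b + n) - nj / n) for nj in counts)

a, b, n = 0.5, 1.0, 10_000
print(round(C_n([n // 2, n // 2], a, b), 5))   # 0.0002: discrete pattern, C_n -> 0
print(round(C_n([1] * n, a, b), 5))            # 0.50005: all distinct, C_n stays near a
```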
Remarks
• Condition (3) says, essentially, that the conditional distribution of $X_{n+1}$ given $X_1, \ldots, X_n$ behaves like the empirical distribution of $X_1, \ldots, X_n$.
• The smoothness condition for the predictive probability function $p_j(\mathbf{n})$ ensures that a small change in $\mathbf{n}$ does not change $p_j(\mathbf{n})$ much.
• The smoothness condition is satisfied by all the examples considered here.
• The condition $p_{k+1}(\mathbf{n}) \to 0$ as $n \to \infty$ is natural in the following sense: since $p_{k+1}(\mathbf{n})$ is the predictive probability that $X_{n+1}$ is sampled from $\nu$, we expect $p_{k+1}(\mathbf{n}) \to 0$ as $n \to \infty$ if posterior consistency holds.
• Condition (ii) is satisfied by all discrete probability measures. Thus, under the smoothness and separability conditions, all species sampling priors satisfying (2) are weakly consistent at every discrete probability measure.
Pitman-Yor Process
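Suppose $P \sim PY(a, b, \nu)$ with $0 \le a < 1$ and $b > 0$. Since $p_j(\mathbf{n}) = (n_j - a)/(b + n)$ for $j = 1, \ldots, k$ and $p_{k+1}(\mathbf{n}) = (b + ka)/(b + n)$, and since $\sum_{j:\,\tilde X_j \in Z} n_j = n - k^*$ (each observation outside $Z$ is a.s. a singleton),
$$\sum_{j:\,\tilde X_j \in Z} \Big| p_j(\mathbf{n}) - \frac{n_j}{n} \Big| = \sum_{j:\,\tilde X_j \in Z} \Big| \frac{n_j - a}{b + n} - \frac{n_j}{n} \Big| = \frac{an(k - k^*) + b(n - k^*)}{n(b + n)} \to 0 \quad \text{a.s.},$$
where the limit follows from Lemma 2. Note that $p_{k+1}(\mathbf{n}) \to a\lambda$, $P_0^\infty$-a.s., so condition (i) is equivalent to $a = 0$ or $\lambda = 0$. Thus, the general theorem agrees with the theorem for the Pitman-Yor process.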
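The closed form above can be checked against a direct evaluation of the sum on simulated data. The sketch assumes a hypothetical $P_0$ with atoms $0, 1, 2$ of masses 0.3, 0.2, 0.1 (so $\lambda = 0.4$) and continuous part Uniform(10, 11), with $a = 0.4$ and $b = 2$. It also shows $p_{k+1}(\mathbf{n})$ sitting near $a\lambda = 0.16$.

```python
import math
import random

rng = random.Random(3)
Z = {0, 1, 2}
a, b, n = 0.4, 2.0, 5_000
# Hypothetical P0: atoms 0, 1, 2 (masses 0.3, 0.2, 0.1) plus lambda = 0.4 of Uniform(10, 11)
xs = [rng.choices([0, 1, 2], weights=[0.3, 0.2, 0.1])[0] if rng.random() < 0.6
      else rng.uniform(10.0, 11.0) for _ in range(n)]
counts = {}
for x in xs:
    counts[x] = counts.get(x, 0) + 1
k = len(counts)
k_star = sum(1 for x in counts if x not in Z)          # distinct values outside Z
direct = sum(abs((nj - a) / (b + n) - nj / n) for x, nj in counts.items() if x in Z)
closed = (a * n * (k - k_star) + b * (n - k_star)) / (n * (b + n))
p_new = (b + k * a) / (b + n)                          # p_{k+1}(n)
print(math.isclose(direct, closed))    # True
print(round(p_new, 3))                 # close to a * lambda = 0.16
```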
Normalized Inverse-Gaussian (N-IG) Process
Lijoi, Mena and Prünster (2005) defined the N-IG process $P$ by specifying, for each partition $B_1, \ldots, B_k$ of $\mathcal{X}$, the distribution of $(P(B_1), \ldots, P(B_k))$ as the distribution of
$$(V_1, \ldots, V_k)/V,$$
where $V = V_1 + \cdots + V_k$ and
$$V_i \overset{ind}{\sim} IG(\theta\nu(B_i), 1), \quad i = 1, \ldots, k.$$
Here $IG(a, b)$ denotes the inverse-Gaussian distribution with parameters $a \ge 0$ and $b > 0$, whose density is
$$\frac{a}{\sqrt{2\pi x^3}} \exp\!\Big(-\frac{1}{2}\Big(\frac{a^2}{x} + b^2 x\Big) + ab\Big), \quad x > 0.$$
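One can show that the N-IG process is a species sampling prior with predictive distribution
$$\mathbb{P}(X_{n+1} \in B \mid X_1, \ldots, X_n) = w_{1,n} \sum_{j=1}^{k} (n_j - 1/2)\,\delta_{\tilde X_j}(B) + w_{0,n}\,\nu(B),$$
where
$$w_{0,n} = \frac{\theta \int_1^\infty (1 - y^{-2})^n\, y^k e^{-\theta y}\, dy}{2n \int_1^\infty (1 - y^{-2})^{n-1}\, y^{k-1} e^{-\theta y}\, dy}, \qquad w_{1,n} = \frac{\int_1^\infty (1 - y^{-2})^n\, y^{k-1} e^{-\theta y}\, dy}{n \int_1^\infty (1 - y^{-2})^{n-1}\, y^{k-1} e^{-\theta y}\, dy}.$$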
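The weights $w_{0,n}$ and $w_{1,n}$ have no elementary closed form, but they are easy to evaluate numerically. An integration by parts shows that $w_{0,n} + (n - k/2)\,w_{1,n} = 1$, i.e., the predictive probabilities sum to one. The sketch below checks this with a composite Simpson rule on $[1, 80]$; the truncation point, the step count and the helper names are assumptions suited to moderate $\theta$, $n$ and $k$.

```python
import math

def integrate_from_1(f, upper=80.0, steps=40_000):
    """Composite Simpson rule for int_1^upper f(y) dy (tail beyond `upper` neglected)."""
    h = (upper - 1.0) / steps
    s = f(1.0) + f(upper)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f(1.0 + i * h)
    return s * h / 3.0

def nig_weights(n, k, theta):
    """w_{0,n} and w_{1,n} of the N-IG predictive distribution, by quadrature."""
    def g(p, q):   # integrand (1 - y^-2)^p y^q e^{-theta y}
        return lambda y: (1.0 - y ** -2) ** p * y ** q * math.exp(-theta * y)
    denom = integrate_from_1(g(n - 1, k - 1))
    w0 = theta * integrate_from_1(g(n, k)) / (2 * n * denom)
    w1 = integrate_from_1(g(n, k - 1)) / (n * denom)
    return w0, w1

counts, theta = [4, 2, 1], 1.0
n, k = sum(counts), len(counts)
w0, w1 = nig_weights(n, k, theta)
probs = [w1 * (nj - 0.5) for nj in counts] + [w0]   # P(X_{n+1} = X~_j), P(new species)
print(round(sum(probs), 6))   # 1.0
```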
Consistency of N-IG Process
• The N-IG process prior is consistent at all discrete distributions, but inconsistent at all continuous distributions except $\nu$.
Poisson-Kingman Partition $PK(\rho)$
• Let $\Lambda$, with density $\rho$, be the intensity measure of a Poisson process with
$$\int_0^1 x\, d\Lambda(x) < \infty \quad \text{and} \quad \int_1^\infty d\Lambda(x) < \infty.$$
• Let $J_1, J_2, \ldots$ be the jump sizes of the Poisson point process with intensity $\Lambda$.
• The normalized jumps $J_i/T$, where $T = \sum_i J_i$, play the role of the $P_i$'s in the species sampling prior.
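• The EPPF of $PK(\rho)$ is given by
$$p(n_1, \ldots, n_k) = \frac{(-1)^{n-k}}{\Gamma(n)} \int_0^\infty u^{n-1} e^{-\psi(u)} \prod_{j=1}^{k} \psi_{n_j}(u)\, du,$$
where $\psi(u) = \int_0^\infty (1 - e^{-ux})\,\rho(x)\, dx$ and
$$\psi_m(u) = \frac{d^m \psi}{du^m}(u) = (-1)^{m-1} \int_0^\infty x^m e^{-ux} \rho(x)\, dx, \quad m = 1, 2, \ldots.$$
• The prediction probability function $p_j(\mathbf{n})$ of $PK(\rho)$ is, for $j = 1, \ldots, k$,
$$p_j(\mathbf{n}) = -\frac{1}{n}\, \frac{\int_0^\infty u\, \frac{\psi_{n_j+1}(u)}{\psi_{n_j}(u)}\, u^{n-1} e^{-\psi(u)} \prod_{i=1}^{k} \psi_{n_i}(u)\, du}{\int_0^\infty u^{n-1} e^{-\psi(u)} \prod_{i=1}^{k} \psi_{n_i}(u)\, du}.$$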
Consistency of $PK(\rho)$
• In this example, we consider
$$\rho_{a,b,c}(x) = c\, x^{-a-1} e^{-bx},$$
where $0 \le a < 1$, $b \ge 0$ and $c > 0$.
• $DP(\theta\nu)$ is equivalent to $PK(\rho_{0,1,\theta}, \nu)$.
• $PK(\rho_{a,b,c})$ is consistent at all discrete distributions, but inconsistent at all continuous distributions unless $a = 0$.
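For $\rho_{a,b,c}$ the functions in the EPPF have closed forms, obtained by direct integration (our calculation, not taken from the slides): $\psi(u) = c\log(1 + u/b)$ for $a = 0$, $\psi(u) = c\,\Gamma(1-a)\,((u+b)^a - b^a)/a$ for $0 < a < 1$, and $|\psi_m(u)| = c\,\Gamma(m-a)/(u+b)^{m-a}$. The sign $(-1)^{n-k}$ cancels the signs of the $\psi_{n_j}$, so the integrand can be taken positive. The sketch below (assuming $b > 0$) evaluates the EPPF by midpoint quadrature after the substitution $u = t/(1-t)$, and checks the equivalence $DP(\theta\nu) = PK(\rho_{0,1,\theta}, \nu)$ against the Ewens formula. The function names and the step count are assumptions.

```python
import math
from typing import Sequence

def pk_eppf(counts: Sequence[int], a: float, b: float, c: float, steps: int = 20_000) -> float:
    """EPPF of PK(rho_{a,b,c}) by midpoint quadrature; assumes 0 <= a < 1 and b > 0."""
    n = sum(counts)

    def psi(u):
        if a == 0:
            return c * math.log1p(u / b)
        return c * math.gamma(1 - a) / a * ((u + b) ** a - b ** a)

    def integrand(u):   # u^{n-1} e^{-psi(u)} prod_j |psi_{n_j}(u)|, on the log scale
        log_f = (n - 1) * math.log(u) - psi(u)
        for m in counts:
            log_f += math.log(c) + math.lgamma(m - a) - (m - a) * math.log(u + b)
        return math.exp(log_f)

    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += integrand(t / (1.0 - t)) / (1.0 - t) ** 2   # du = dt / (1 - t)^2
    return total * h / math.gamma(n)

def ewens_eppf(counts: Sequence[int], theta: float) -> float:
    n, k = sum(counts), len(counts)
    return math.exp(k * math.log(theta) + math.lgamma(theta) - math.lgamma(theta + n)
                    + sum(math.lgamma(m) for m in counts))

counts, theta = [2, 1, 1], 2.0
print(round(pk_eppf(counts, a=0.0, b=1.0, c=theta), 6), round(ewens_eppf(counts, theta), 6))
# 0.066667 0.066667
```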
Gibbs Partition
• An EPPF $p$ is said to be of Gibbs form if
$$p(n_1, \ldots, n_k) = V_{n,k} \prod_{j=1}^{k} W_{n_j}$$
for some nonnegative weights $W = (W_j)$ and $V = (V_{n,k})$.
• Assume $W_1 = V_{1,1} = 1$. Then every Gibbs partition is represented by $W_j$'s and $V_{n,k}$'s satisfying
$$W_j = \begin{cases} 1, & j = 1, \\ \prod_{i=0}^{j-2} (b - a + bi), & j = 2, 3, \ldots, \end{cases} \qquad V_{n,k} = (bn - ak)\,V_{n+1,k} + V_{n+1,k+1}$$
for some $b > 0$ and $a < b$.
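• The predictive probability functions are, for $j = 1, \ldots, k$,
$$p_j(\mathbf{n}) = \frac{p(\mathbf{n}^{j+})}{p(\mathbf{n})} = \frac{V_{n+1,k}}{V_{n,k}}\, \frac{W_{n_j+1}}{W_{n_j}} = \frac{nb\,V_{n+1,k}}{V_{n,k}}\, \frac{n_j - a/b}{n}.$$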
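A familiar instance is the Pitman-Yor partition with discount $\alpha$ and strength $\theta$, which (to our understanding) is of Gibbs form with $b = 1$, $a = \alpha$, $W_j = (1-\alpha)_{j-1}$ and $V_{n,k} = \prod_{i=1}^{k-1}(\theta + i\alpha) / (\theta+1)_{n-1}$, where $(x)_m$ is the rising factorial. The sketch below checks the backward recursion for $V_{n,k}$ numerically and shows that the ratio $nb\,V_{n+1,k}/V_{n,k}$ equals $n/(\theta+n)$, so the predictive probabilities reduce to $(n_j - \alpha)/(\theta+n)$. The parameter values are illustrative.

```python
import math

def py_gibbs_V(n: int, k: int, alpha: float, theta: float) -> float:
    """V_{n,k} = prod_{i=1}^{k-1} (theta + i alpha) / (theta + 1)_{n-1}."""
    num = math.prod(theta + i * alpha for i in range(1, k))
    den = math.prod(theta + 1 + i for i in range(n - 1))
    return num / den

def gibbs_W(j: int, a: float, b: float) -> float:
    """W_j = prod_{i=0}^{j-2} (b - a + b i); the empty product gives W_1 = 1."""
    return math.prod(b - a + b * i for i in range(j - 1))

alpha, theta, b = 0.3, 1.5, 1.0
ok = all(math.isclose(py_gibbs_V(n, k, alpha, theta),
                      (b * n - alpha * k) * py_gibbs_V(n + 1, k, alpha, theta)
                      + py_gibbs_V(n + 1, k + 1, alpha, theta))
         for n in range(1, 15) for k in range(1, n + 1))
print(ok)   # True: V_{n,k} = (bn - ak) V_{n+1,k} + V_{n+1,k+1}

counts = [4, 2, 1]
n, k = sum(counts), len(counts)
ratio = n * b * py_gibbs_V(n + 1, k, alpha, theta) / py_gibbs_V(n, k, alpha, theta)
p = [ratio * (nj - alpha / b) / n for nj in counts]
print([round(x, 4) for x in p])   # [0.4353, 0.2, 0.0824], i.e. (n_j - alpha) / (theta + n)
```

Since $n/(\theta+n) \to 1$, this example also satisfies the ratio condition used on the next slide. The products are kept small here; for large $n$ a log-scale version would be needed to avoid overflow.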
Consistency of Gibbs Form
• The species sampling prior generated by a Gibbs partition is consistent at all discrete probability measures if
$$\frac{nb\,V_{n+1,k}}{V_{n,k}} \to 1.$$
• Under the assumption $nb\,V_{n+1,k}/V_{n,k} \to 1$, the posterior is consistent at all continuous distributions if and only if $a = 0$, as for the Dirichlet process.