
Posterior Consistency of Species Sampling Priors

Jaeyong Lee


Seoul National University

jointly with

Gunho Jang and Sangyeol Lee

Species sampling priors, December 11, 2010 – p. 1/38

Species Sampling

• Imagine that we land on a planet where "no one has gone before". As we explore the planet, we encounter new species unknown to us.

• We record the names of the species we encounter. If a species is new, we name it by picking an element from X.


Species Sampling

• Suppose (X1, X2, . . .) is an infinite sequence of the record.

• Xi : the species of the i th individual sampled.

• X̃j : the jth distinct species to appear

• k = kn : the number of distinct species appearing in (X1, . . . , Xn)

• nj = njn : the number of times the jth species X̃j appears in (X1, . . . , Xn)

• $\mathbf{n} = (n_{1n}, n_{2n}, \ldots)$ or $(n_{1n}, n_{2n}, \ldots, n_{kn})$ : the vector of species counts

• $\mathbf{n}$ is an element of

$$\mathbb{N}^* = \bigcup_{k=1}^{\infty} \mathbb{N}^k,$$

where $\mathbb{N}$ is the set of positive integers.


Species Sampling Sequence

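We call an exchangeable sequence (X1, X2, . . .) a species sampling sequence if

$$X_1 \sim \nu, \qquad X_{n+1} \mid X_1, \ldots, X_n \sim \sum_{j=1}^{k} p_j(\mathbf{n}_n)\,\delta_{\tilde{X}_j} + p_{k+1}(\mathbf{n}_n)\,\nu,$$

where $\mathbf{n}_n = (n_{1n}, \ldots, n_{kn})$ is the vector of species counts after n observations and ν is a diffuse probability measure on X, i.e. ν({x}) = 0 ∀x ∈ X.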


Example : Pólya Urn Sequence

Suppose P ∼ DP (θν), where θ > 0 and ν is a probability measure, and X1, X2, . . . |P ∼ P . Then, marginally, X1, X2, . . . is a Pólya urn sequence, which satisfies

$$X_1 \sim \nu, \qquad X_{n+1} \mid X_1, \ldots, X_n \sim \sum_{j=1}^{k} \frac{n_j}{n+\theta}\,\delta_{\tilde{X}_j} + \frac{\theta}{n+\theta}\,\nu.$$

Thus, the Pólya urn sequence is a species sampling sequence.
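The urn scheme can be simulated directly from its predictive rule. The following is a minimal sketch, not code from the talk: the function name `polya_urn`, the choice of Uniform(0, 1) as the diffuse measure ν, and the value θ = 2 are all assumptions made for illustration.

```python
import random

def polya_urn(n, theta, base_draw, rng):
    """Draw X_1..X_n from the Polya urn with concentration theta.

    base_draw(rng) returns one draw from the diffuse measure nu
    (a hypothetical callable supplied by the user).
    """
    xs = []
    for i in range(n):
        # With probability theta / (i + theta) draw a new species from nu;
        # otherwise copy a uniformly chosen earlier observation, which picks
        # species j with probability n_j / (i + theta).
        if rng.random() < theta / (i + theta):
            xs.append(base_draw(rng))
        else:
            xs.append(rng.choice(xs))
    return xs

rng = random.Random(0)
sample = polya_urn(1000, theta=2.0, base_draw=lambda r: r.random(), rng=rng)
k = len(set(sample))
print(k, "distinct species among", len(sample), "draws")
```

Copying a uniformly chosen earlier observation is a convenient way to realize the weights n_j/(n + θ) without tracking the counts explicitly.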


Prediction Probability Function

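• A sequence of functions (pj , j = 1, 2, . . .), each mapping N∗ → R, is called a sequence of prediction probability functions if

$$p_j(\mathbf{n}) \ge 0 \quad\text{and}\quad \sum_{j=1}^{k(\mathbf{n})+1} p_j(\mathbf{n}) = 1 \quad \text{for all } \mathbf{n} \in \mathbb{N}^*,$$

where $k(\mathbf{n})$ is the length of $\mathbf{n}$.

• For a species sampling sequence (Xn), the corresponding prediction probability functions are defined as

$$p_j(\mathbf{n}) = P(X_{n+1} = \tilde{X}_j \mid X_1, \ldots, X_n), \quad j = 1, \ldots, k_n,$$

$$p_{k_n+1}(\mathbf{n}) = P(X_{n+1} \notin \{X_1, \ldots, X_n\} \mid X_1, \ldots, X_n).$$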


Example : Pólya Urn Sequence

$$p_j(n_1, \ldots, n_k) = \frac{n_j}{n+\theta}\, I(1 \le j \le k) + \frac{\theta}{n+\theta}\, I(j = k+1), \qquad \text{where } n = \sum_{i=1}^{k} n_i.$$


Species Sampling Model

• A sequence of random variables (Xn) is a species sampling sequence if and only if X1, X2, . . . |P is a random sample from P, where

$$P = \sum_{i=1}^{\infty} P_i\,\delta_{\tilde{X}_i} + R\,\nu \tag{1}$$

for some sequence of positive random variables (Pi) and R such that $1 - R = \sum_{i=1}^{\infty} P_i \le 1$, (X̃i) is a random sample from ν, and (Pi) and (X̃i) are independent.

• We call the directing random probability measure P in equation (1) the species sampling model (or prior) of the species sampling sequence (Xi).


Exchangeable Partition Probability Function (EPPF)

• Let [n] = {1, 2, . . . , n}.

• An exchangeable sequence (Xn) defines a random partition Πn = {A1, . . . , Ak} of [n], where

Ai = {j ∈ [n] : Xj = X̃i}.

Define a function p : N∗ −→ [0, 1] by

p(#A1, . . . ,#Ak) = P(Πn = {A1, . . . , Ak}).

• The function p is called the exchangeable partition probability function (EPPF) derived from the exchangeable sequence (Xn).


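• An EPPF p derived from an exchangeable sequence (Xn) satisfies

$$p(1) = 1, \qquad p(\mathbf{n}) = \sum_{j=1}^{k(\mathbf{n})+1} p(\mathbf{n}^{j+}) \quad \forall\, \mathbf{n} \in \mathbb{N}^*,$$

where $\mathbf{n}^{j+}$ is the same as $\mathbf{n}$ except that the jth element is increased by 1 (for $j = k(\mathbf{n})+1$, a new element equal to 1 is appended).

• Conversely, every symmetric p : N∗ −→ [0, 1] satisfying these two conditions is an EPPF of some exchangeable sequence.

• An EPPF p and a diffuse probability measure ν uniquely define the distribution of a species sampling sequence.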
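The addition rule for an EPPF can be checked exactly on a concrete case. The sketch below uses the EPPF of the Dirichlet process in its standard Ewens form, $p(n_1, \ldots, n_k) = \theta^k \prod_j (n_j - 1)! / \prod_{i=0}^{n-1} (\theta + i)$; this closed form is not stated on the slides and is brought in here as a known result. The helper names `dp_eppf` and `successors` are hypothetical.

```python
from fractions import Fraction
from math import factorial

def dp_eppf(counts, theta):
    """EPPF of the Dirichlet process (Ewens form), computed exactly."""
    n = sum(counts)
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i          # theta (theta + 1) ... (theta + n - 1)
    num = Fraction(theta) ** len(counts)
    for nj in counts:
        num *= factorial(nj - 1)
    return num / rising

def successors(counts):
    """All n^{j+}: add one to an existing block, or open a new block of size 1."""
    out = [counts[:j] + (counts[j] + 1,) + counts[j + 1:] for j in range(len(counts))]
    out.append(counts + (1,))
    return out

theta = Fraction(3, 2)
counts = (3, 1, 2)
lhs = dp_eppf(counts, theta)
rhs = sum(dp_eppf(c, theta) for c in successors(counts))
print(lhs == rhs)
```

Exact rational arithmetic is used so that the identity is tested as an equality rather than up to rounding error.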


Dirichlet Process (Sethuraman’s Representation)

• Define

W1,W2, . . . ∼ i.i.d Beta(1, θ)

X̃1, X̃2, . . . ∼ i.i.d ν

and (Wj) ⊥ (X̃j).

• Construct P1, P2, . . . from the Wj by the stick-breaking process

P1 = W1

Pj = (1−W1) . . . (1−Wj−1) ·Wj , j = 2, 3, . . . .

Let

$$P = \sum_{j=1}^{\infty} P_j\,\delta_{\tilde{X}_j}.$$

Then, P ∼ DP (θν).


Pitman-Yor Process

• Define

Wj ∼ ind Beta(1− a, b+ ja), j = 1, 2, . . .

X̃1, X̃2, . . . ∼ i.i.d ν

and (Wj) ⊥ (X̃j).

• Construct P1, P2, . . . from the Wj by the stick-breaking process

P1 = W1

Pj = (1−W1) . . . (1−Wj−1) ·Wj , j = 2, 3, . . . .

Let

$$P = \sum_{j=1}^{\infty} P_j\,\delta_{\tilde{X}_j}.$$

Then, P ∼ PY (a, b, ν), where ν is a diffuse probability measure and either 0 ≤ a < 1 and b > −a, or a < 0 and b = −ma for some m = 1, 2, . . .

Note PY (0, θ, ν) = DP (θν).
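The stick-breaking construction translates into a short sampler once the infinite sum is truncated. This is a minimal sketch under stated assumptions: the function name `py_stick_breaking`, the truncation level of 50, and the parameter values are illustrative only, and the code covers just the case 0 ≤ a < 1, b > −a (setting a = 0 gives the Dirichlet process weights).

```python
import random

def py_stick_breaking(a, b, truncation, rng):
    """First `truncation` stick-breaking weights of PY(a, b, nu).

    Returns (weights, remainder); remainder is the unassigned stick length.
    Assumes 0 <= a < 1 and b > -a, so every Beta parameter is positive.
    """
    weights, remaining = [], 1.0
    for j in range(1, truncation + 1):
        w = rng.betavariate(1.0 - a, b + j * a)   # W_j ~ Beta(1 - a, b + j a)
        weights.append(remaining * w)             # P_j = W_j * prod_{i<j} (1 - W_i)
        remaining *= 1.0 - w
    return weights, remaining

rng = random.Random(1)
dp_weights, dp_rest = py_stick_breaking(a=0.0, b=2.0, truncation=50, rng=rng)
py_weights, py_rest = py_stick_breaking(a=0.5, b=2.0, truncation=50, rng=rng)
print(sum(dp_weights) + dp_rest, sum(py_weights) + py_rest)
```

Pairing each weight with an independent draw from ν would give a truncated approximation of P; the remainder shows how much mass the truncation leaves out.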


Posterior of PY Process

Suppose

X1, . . . , Xn|P ∼ P,

P ∼ PY (a, b, ν).

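Then,

$$P \mid X_1, \ldots, X_n = \sum_{j=1}^{k} \tilde{P}_j\,\delta_{\tilde{X}_j} + \tilde{R}_k\,P_k,$$

where $(\tilde{P}_1, \ldots, \tilde{P}_k, \tilde{R}_k)$ is independent of $P_k$ and

$$(\tilde{P}_1, \ldots, \tilde{P}_k, \tilde{R}_k) \sim \mathrm{Dir}(n_1 - a, \ldots, n_k - a,\; b + ka),$$

$$P_k \sim PY(a,\; b + ka,\; \nu).$$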
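Since the Dirichlet parameters sum to b + n, the posterior means of the weights are (n_j − a)/(b + n) for the observed species and (b + ka)/(b + n) for the remainder. The sketch below computes these exactly; the function name `py_posterior_mean_weights` and the counts (5, 3, 2) with a = 1/4, b = 1 are assumptions chosen for illustration.

```python
from fractions import Fraction

def py_posterior_mean_weights(counts, a, b):
    """Means of (P~_1, ..., P~_k, R~_k) under Dir(n_1 - a, ..., n_k - a, b + k a)."""
    k = len(counts)
    params = [Fraction(nj) - a for nj in counts] + [b + k * a]
    total = sum(params)               # equals b + n
    return [p / total for p in params], total

a, b = Fraction(1, 4), Fraction(1)
means, total = py_posterior_mean_weights([5, 3, 2], a, b)
print(means, total)
```

The last entry of `means` is the posterior mean of the mass assigned to species not yet observed.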


Some References

• Probability side. Random combinations of Pitman, Gnedin, Kingman, Aldous, Yor, ...

• Statistics side. Random combinations of Prünster, Lijoi, Walker, Mena, James, Ishwaran, Müller, Quintana, ...

• And many more, and the list is growing.


The Goal of This Study

• The class of species sampling models is a huge class of nonparametric priors with more flexibility than the Dirichlet process and potentially the same computational ease.

• But the asymptotic properties of species sampling models are not well understood.

• In the simplest possible nonparametric model, does the species sampling model pass the test of posterior consistency?


True Distribution

We assume

$$X_1, X_2, \ldots \overset{iid}{\sim} P_0, \qquad P_0 = \sum_j q_j\,\delta_{z_j} + \lambda\,\mu,$$

where $z_j \in \mathcal{X}$, $q_1 \ge q_2 \ge \cdots \ge 0$, $\lambda = 1 - \sum_j q_j \le 1$ and µ is a diffuse probability measure.

Let Z = {z1, z2, . . .}.


Model

In this talk, we consider the following model:

$$X_1, \ldots, X_n \mid P \sim P, \qquad P \sim \mathcal{P},$$

where $\mathcal{P}$ is a species sampling prior.


Consistency of PY Process

Theorem 1. When the prior is PY (a, b, ν), the posterior is weakly consistent at P0 if and only if one of the following holds:

(i) a = 0, that is, the prior is a Dirichlet process;

(ii) a > 0, and P0 is discrete or µ = ν;

(iii) a < 0 and P0 is a mixture of at most m = |b/a| degenerate (point-mass) measures.


Some Remarks

• If P0 is discrete, all the Pitman-Yor process priors with 0 ≤ a < 1 yield consistent posteriors.

• If P0 is continuous, the Dirichlet process is the only prior among the Pitman-Yor process priors that yields posterior consistency, unless ν happens to equal P0.

• The second part of condition (ii) means that the diffuse probability measure ν should be proportional to the continuous part λµ of the true probability measure P0. Thus, in order to get consistency we would have to know the continuous part of the true measure a priori, which is unlikely in practical situations.

• The same result has been obtained independently by James (2008).


Mixture Models

The story is different in mixture models. Consider the following normal mixture model:

$$X_i \mid \theta_i, h \overset{ind}{\sim} N(\theta_i, h^2), \quad i = 1, \ldots, n,$$

$$\theta_i \mid P \overset{iid}{\sim} P, \quad i = 1, \ldots, n,$$

$$P \sim \mathcal{P}, \qquad h^2 \sim \mu,$$

where P and h are independent a priori.

Under certain conditions, the posterior is weakly (and strongly) consistent.


Lemma 1

Under some general conditions, the following hold.

(a) The posterior is weakly consistent at P0 if and only if the following hold for every P0-continuity set U of X:

(i) $\lim_{n\to\infty} E(P(U) \mid X_1, \ldots, X_n) = P_0(U)$, $P_0^\infty$-a.s.,

(ii) $\lim_{n\to\infty} \mathrm{Var}(P(U) \mid X_1, \ldots, X_n) = 0$, $P_0^\infty$-a.s.,

where $P_0^\infty$ is the infinite product of the true probability measure P0, representing the probability measure of (X1, X2, . . .).

(b) If the posterior is weakly consistent at P0, then for every open set O and closed set C of X:

(i) $\liminf_{n\to\infty} E(P(O) \mid X_1, \ldots, X_n) \ge P_0(O)$, $P_0^\infty$-a.s.,

(ii) $\limsup_{n\to\infty} E(P(C) \mid X_1, \ldots, X_n) \le P_0(C)$, $P_0^\infty$-a.s.


Lemma 2

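Suppose X1, X2, . . . , Xn are sampled from P0. Let X̃1, . . . , X̃kn be the distinct values among X1, . . . , Xn, let $k_n^* = \sum_{j=1}^{k_n} I(\tilde{X}_j \notin Z)$ and

$$G_{k_n} = \frac{1}{k_n} \sum_{j=1}^{k_n} \delta_{\tilde{X}_j}.$$

Then,

(i) $k_n/n \to \lambda$ and $k_n^*/n \to \lambda$, $P_0^\infty$-a.s.,

(ii) $G_{k_n} \to \mu$, $P_0^\infty$-a.s., if λ > 0.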
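Part (i) of the lemma is easy to see in a simulation. The sketch below is an illustration under assumptions that are not in the talk: P0 has three atoms at 2, 3, 4 with masses 0.4, 0.2, 0.1 and a diffuse part µ = Uniform(0, 1) with λ = 0.3; the function name `sample_p0` is hypothetical.

```python
import random

def sample_p0(n, atoms, probs, rng):
    """Draw n i.i.d. points from P0 = sum_j q_j delta_{z_j} + lambda * Uniform(0, 1)."""
    xs = []
    for _ in range(n):
        u, acc, x = rng.random(), 0.0, None
        for z, q in zip(atoms, probs):
            acc += q
            if u < acc:
                x = z
                break
        # Leftover mass lambda = 1 - sum(probs) goes to the diffuse part mu.
        xs.append(rng.random() if x is None else x)
    return xs

rng = random.Random(0)
atoms, probs = [2.0, 3.0, 4.0], [0.4, 0.2, 0.1]   # lambda = 0.3
xs = sample_p0(20000, atoms, probs, rng)
distinct = set(xs)
k_n = len(distinct)                      # number of distinct values
k_star = len(distinct - set(atoms))      # distinct values outside Z
print(k_n / len(xs), k_star / len(xs))
```

Both ratios should be close to λ, because draws from the diffuse part are almost surely all distinct while the atoms contribute only finitely many distinct values.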


Sketch of Proof

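$$\begin{aligned}
E(P(B) \mid X_1, \ldots, X_n)
&= E\big[ P(X_{n+1} \in B \mid X_1, \ldots, X_n, \tilde{P}_1, \ldots, \tilde{P}_{k_n}, \tilde{R}_{k_n}) \,\big|\, X_1, \ldots, X_n \big] \\
&= E\Big[ \sum_{j=1}^{k_n} \tilde{P}_j\, I(\tilde{X}_j \in B) + \tilde{R}_{k_n}\,\nu(B) \,\Big|\, X_1, \ldots, X_n \Big] \\
&= \sum_{j=1}^{k_n} \frac{n_j - a}{b+n}\, I(\tilde{X}_j \in B) + \frac{b + a k_n}{b+n}\,\nu(B) \\
&= \frac{n}{b+n}\, F_n(B) - \frac{a k_n}{b+n}\, G_{k_n}(B) + \frac{b + a k_n}{b+n}\,\nu(B) \\
&\to P_0(B) - a\lambda\,\mu(B) + a\lambda\,\nu(B) \\
&= P_0(B) - a\lambda\,(\mu(B) - \nu(B)),
\end{aligned}$$

where $F_n = \frac{1}{n}\sum_{j=1}^{n} \delta_{X_j}$ is the empirical distribution and $G_{k_n} = \frac{1}{k_n}\sum_{j=1}^{k_n} \delta_{\tilde{X}_j}$ is the empirical distribution of the distinct values.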


In summary,

$$E(P(B) \mid X_1, \ldots, X_n) \to P_0(B) - a\lambda\,(\mu(B) - \nu(B)) \quad \text{a.s.}$$

Thus, $E(P(B) \mid X_1, \ldots, X_n) \to P_0(B)$ a.s. if and only if

a = 0,

λ = 0,

or µ(B) = ν(B).
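The size of the asymptotic bias can be seen numerically. The sketch below evaluates the posterior mean of P(B) under a PY(a, b, ν) prior using the predictive weights (n_j − a)/(b + n) and (b + a k_n)/(b + n). Everything concrete in it is an assumption for illustration: P0 = µ = Uniform(0, 1) so that λ = 1, B = [0, 0.5], a base measure with ν(B) = 0.25, b = 1, and an evenly spaced grid used as a deterministic stand-in for a random sample.

```python
from collections import Counter

def py_posterior_mean(xs, a, b, in_B, nu_B):
    """E(P(B) | X_1..X_n) under a PY(a, b, nu) prior, via the predictive weights."""
    counts = Counter(xs)
    n, k = len(xs), len(counts)
    atoms = sum(nj - a for x, nj in counts.items() if in_B(x))
    return atoms / (b + n) + (b + a * k) / (b + n) * nu_B

n = 100000
xs = [(i + 0.5) / n for i in range(n)]   # stand-in for a Uniform(0, 1) sample
in_B = lambda x: x <= 0.5                # B = [0, 0.5], so mu(B) = 0.5
for a in (0.0, 0.5):
    post = py_posterior_mean(xs, a, b=1.0, in_B=in_B, nu_B=0.25)
    limit = 0.5 - a * 1.0 * (0.5 - 0.25)   # P0(B) - a * lambda * (mu(B) - nu(B))
    print(a, post, limit)
```

With a = 0 the posterior mean approaches P0(B) = 0.5, while with a = 0.5 it approaches 0.375, matching the limit formula.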


More Assumptions for General Theorem

• (Smoothness condition for predictive probability function) As n → ∞,

$$S_n = S_n(\mathbf{n}) = \max_{1 \le i \le k} \sum_{j=1}^{k} \big| p_j(\mathbf{n}) - p_j(\mathbf{n}^{i+}) \big| \to 0, \quad P_0^\infty\text{-a.s.}$$

• (Separability condition for Z, the support of the discrete part of P0) There exists ε > 0 such that for all i ≠ j

d(zi, zj) > ε,

where d is the metric of X.


General Theorem

Assume the separability condition and the smoothness condition. The posterior is weakly consistent at P0 if and only if

$$\lim_{n\to\infty} \sum_{j=1}^{k} \Big| p_j(\mathbf{n}) - \frac{n_j}{n} \Big|\, I(\tilde{X}_j \in Z) = 0, \quad P_0^\infty\text{-a.s.} \tag{2}$$

and one of the following holds:

(i) $p_{k+1}(\mathbf{n}) \to 0$ as n → ∞, $P_0^\infty$-a.s.

(ii) P0 is a mixture of a discrete probability measure and the diffuse measure ν.


Stronger Sufficient Condition

Assume the smoothness condition. The posterior is weakly consistent at P0 if

$$C_n = C_n(\mathbf{n}) = \sum_{j=1}^{k} \Big| p_j(\mathbf{n}) - \frac{n_j}{n} \Big| \to 0, \quad P_0^\infty\text{-a.s. as } n \to \infty. \tag{3}$$


Remarks

• Condition (3) says essentially that the conditional distribution of Xn+1 given X1, . . . , Xn behaves like the empirical distribution of X1, . . . , Xn.

• The smoothness condition for the predictive probability function pj(n) ensures that a small change in n does not change pj(n) much.

• The smoothness condition is satisfied by all the examples considered here.

• The condition pk+1(n) → 0 as n → ∞ is natural in the following sense. Since pk+1(n) is the predictive probability that Xn+1 is sampled from ν, we expect that pk+1(n) → 0 as n → ∞ if posterior consistency holds.

• Condition (ii) of the general theorem is satisfied by all discrete probability measures. Thus, all species sampling priors satisfying (2) are weakly consistent at every discrete probability measure.


Pitman-Yor Process

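Suppose P ∼ PY (a, b, ν) with 0 ≤ a < 1 and b > 0. Since $p_j(\mathbf{n}) = \frac{n_j - a}{b+n}$ for j = 1, . . . , k, and $p_{k+1}(\mathbf{n}) = (b + ka)/(b+n)$,

$$\sum_{j:\tilde{X}_j \in Z} \Big| p_j(\mathbf{n}) - \frac{n_j}{n} \Big| = \sum_{j:\tilde{X}_j \in Z} \Big| \frac{n_j - a}{b+n} - \frac{n_j}{n} \Big| = \frac{a n (k - k^*) + b (n - k^*)}{n (b+n)} \to 0, \quad \text{a.s.},$$

because k/n and k∗/n both tend to λ.

Note $p_{k+1}(\mathbf{n}) \to a\lambda$, $P_0^\infty$-a.s. Condition (i) is equivalent to a = 0 or λ = 0. Thus, the general theorem agrees with the theorem for the Pitman-Yor process.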
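The closed form for the sum over observed atoms can be verified exactly on a small configuration. In the sketch below the counts, the values a = 1/3 and b = 2, and the function names `discrete_part_gap` and `closed_form` are assumptions for illustration; the check relies on the fact that draws from the diffuse part of P0 each appear exactly once, so there are k∗ such observations.

```python
from fractions import Fraction

def discrete_part_gap(counts_in_Z, counts_outside_Z, a, b):
    """Sum over species in Z of |p_j(n) - n_j/n| for the PY(a, b, nu) predictive."""
    n = sum(counts_in_Z) + sum(counts_outside_Z)
    return sum(abs((Fraction(nj) - a) / (b + n) - Fraction(nj, n)) for nj in counts_in_Z)

def closed_form(n, k, k_star, a, b):
    """[a n (k - k*) + b (n - k*)] / (n (b + n))."""
    return (a * n * (k - k_star) + b * (n - k_star)) / (n * (b + n))

a, b = Fraction(1, 3), Fraction(2)
in_Z = [7, 4, 1]          # multiplicities of the observed atoms of P0
outside_Z = [1, 1, 1, 1]  # draws from the diffuse part appear once each
n = sum(in_Z) + sum(outside_Z)
k, k_star = len(in_Z) + len(outside_Z), len(outside_Z)
gap = discrete_part_gap(in_Z, outside_Z, a, b)
print(gap == closed_form(n, k, k_star, a, b))
```

Each term equals (a n + b n_j)/(n(b + n)), and summing over the k − k∗ observed atoms, whose counts total n − k∗, gives the stated numerator.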


Normalized Inverse-Gaussian (N-IG) Process

Lijoi, Mena and Prünster (2005) defined the N-IG process P by specifying the distribution of (P (B1), . . . , P (Bk)) for a partition B1, . . . , Bk of X as the distribution of

(V1, . . . , Vk)/V,

where V = V1 + · · ·+ Vk and

$$V_i \overset{ind}{\sim} IG(\theta\nu(B_i), 1), \quad i = 1, \ldots, k.$$

Here IG(a, b) denotes the inverse-Gaussian distribution with parameters a ≥ 0 and b > 0, whose density is

$$a\,(2\pi x^3)^{-1/2} \exp\!\big( -(a^2/x + b^2 x)/2 + ab \big), \quad x > 0.$$


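One can show that the N-IG process is a species sampling prior with predictive distribution

$$P(X_{n+1} \in B \mid X_1, \ldots, X_n) = w_{1,n} \sum_{j=1}^{k} (n_j - 1/2)\,\delta_{\tilde{X}_j}(B) + w_{0,n}\,\nu(B),$$

where

$$w_{0,n} = \frac{\theta \int_1^\infty (1 - y^{-2})^{n}\, y^{k}\, e^{-\theta y}\,dy}{2n \int_1^\infty (1 - y^{-2})^{n-1}\, y^{k-1}\, e^{-\theta y}\,dy}, \qquad
w_{1,n} = \frac{\int_1^\infty (1 - y^{-2})^{n}\, y^{k-1}\, e^{-\theta y}\,dy}{n \int_1^\infty (1 - y^{-2})^{n-1}\, y^{k-1}\, e^{-\theta y}\,dy}.$$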


Consistency of N-IG Process

• The N-IG process prior is consistent at all discrete distributions, but inconsistent at all continuous distributions except ν.


Poisson-Kingman Partition PK(ρ)

• Let Λ, with density ρ, be the intensity measure of a Poisson process with

$$\int_0^1 x\,d\Lambda(x) < \infty \quad\text{and}\quad \int_1^\infty d\Lambda(x) < \infty.$$

• Let J1, J2, . . . be the jump sizes of the Poisson point process with intensity Λ.

• The normalized jumps Ji/T, where T = Σi Ji is the total of the jumps, play the role of the Pi in the species sampling prior.


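• The EPPF of PK(ρ) is given by

$$p(n_1, \ldots, n_k) = \frac{(-1)^{n-k}}{\Gamma(n)} \int_0^\infty u^{n-1} e^{-\psi(u)} \prod_{j=1}^{k} \psi_{n_j}(u)\,du,$$

where $\psi(u) = \int_0^\infty (1 - e^{-ux})\,\rho(x)\,dx$ and

$$\psi_m(u) = \frac{d^m \psi}{du^m}(u) = (-1)^{m-1} \int_0^\infty x^m e^{-ux}\,\rho(x)\,dx \quad \text{for } m = 1, 2, \ldots.$$

• The prediction probability function $p_j(\mathbf{n})$ of PK(ρ) is, for j = 1, . . . , k,

$$p_j(\mathbf{n}) = -\frac{1}{n}\, \frac{\int_0^\infty u\, \frac{\psi_{n_j+1}(u)}{\psi_{n_j}(u)}\, u^{n-1} e^{-\psi(u)} \prod_{i=1}^{k} \psi_{n_i}(u)\,du}{\int_0^\infty u^{n-1} e^{-\psi(u)} \prod_{i=1}^{k} \psi_{n_i}(u)\,du}.$$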


Consistency of PK(ρ)

• In this example, we consider

$$\rho_{a,b,c}(x) = c\,x^{-a-1} e^{-bx},$$

where 0 ≤ a < 1, b ≥ 0 and c > 0.

• DP (θν) is equivalent to $PK(\rho_{0,1,\theta}, \nu)$.

• $PK(\rho_{a,b,c})$ is consistent at all discrete distributions, but inconsistent at all continuous distributions unless a = 0.


Gibbs Partition

• An EPPF p is said to be of Gibbs form if

$$p(n_1, \ldots, n_k) = V_{n,k} \prod_{j=1}^{k} W_{n_j},$$

for some nonnegative weights W = (Wj) and V = (Vn,k).

• Assume W1 = V1,1 = 1. Then every Gibbs partition is represented by Wj's and Vn,k's satisfying

$$W_j = \begin{cases} 1 & \text{if } j = 1, \\ \prod_{i=0}^{j-2} (b - a + bi) & \text{if } j = 2, 3, \ldots \end{cases}
\qquad\text{and}\qquad V_{n,k} = (bn - ak)\,V_{n+1,k} + V_{n+1,k+1}$$

for some b > 0 and a < b.


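• The predictive probability functions are, for j = 1, . . . , k,

$$p_j(\mathbf{n}) = \frac{p(\mathbf{n}^{j+})}{p(\mathbf{n})} = \frac{V_{n+1,k}}{V_{n,k}}\, \frac{W_{n_j+1}}{W_{n_j}} = \frac{n b\, V_{n+1,k}}{V_{n,k}} \cdot \frac{n_j - a/b}{n}.$$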
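The last equality rests on the ratio W_{m+1}/W_m = b − a + b(m − 1) = b(m − a/b), which follows from the product form of the weights. The sketch below checks this exactly for a few values of m; the function name `gibbs_W` and the values a = 1/2, b = 3 are assumptions for illustration.

```python
from fractions import Fraction

def gibbs_W(j, a, b):
    """W_1 = 1 and W_j = prod_{i=0}^{j-2} (b - a + b i) for j >= 2."""
    w = Fraction(1)
    for i in range(j - 1):
        w *= b - a + b * i
    return w

a, b = Fraction(1, 2), Fraction(3)   # any b > 0 and a < b
ratios_match = all(
    gibbs_W(m + 1, a, b) / gibbs_W(m, a, b) == b * (m - a / b)
    for m in range(1, 10)
)
print(ratios_match)
```

Substituting m = n_j into this ratio and multiplying and dividing by n gives the final expression for the predictive probability.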


Consistency of Gibbs Form

• The species sampling prior generated by a Gibbs partition is consistent at all discrete probability measures if

$$n b\, V_{n+1,k} / V_{n,k} \to 1.$$

• Under the assumption $n b\, V_{n+1,k} / V_{n,k} \to 1$, the posterior is consistent at all continuous distributions if and only if a = 0, that is, the prior is the Dirichlet process.

