A note on density estimation for Poisson mixtures


ELSEVIER Statistics & Probability Letters 27 (1996) 255-258

Ian McKay¹

John Dedman Bldg., Australian National University, ACT 0200, Australia

Received January 1995; revised March 1995

Abstract

Let X be a random variable that has the distribution of a Poisson mixture with an absolutely continuous mixing distribution. An elementary proof is given that any estimator of the mixing density must converge slower than any power of the sample size.

Keywords: Density estimation; Poisson mixtures; Species abundance

1. Introduction

Suppose that $f : \mathbb{R} \to \mathbb{R}$ is a density function supported on the positive half-line $(0, \infty)$ and let $X$ be a discrete random variable with

$$\Pr\{X = k\} = \int_0^\infty e^{-t}\, \frac{t^k}{k!}\, f(t)\, dt, \tag{1}$$

so that $X$ has the distribution of a Poisson mixture. If $f$ is unknown, how well can it be estimated from $n$ independent copies of $X$? In this note we adapt an argument of Farrell (1972) to show that, on a sufficiently large class of densities, any estimator must converge more slowly than $n^{-\alpha}$ for any positive $\alpha$.
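As a rough numerical illustration of (1), the following Python sketch approximates the mixture probabilities by composite Simpson quadrature. The function names, the truncation point `upper`, and the choice of an exponential mixing density with rate `theta = 0.5` are assumptions made for this example only; for that density the integral has the closed form $\theta(1+\theta)^{-(k+1)}$, which gives a check on the quadrature.

```python
import math

def poisson_pmf(k, t):
    """Poisson probability e^{-t} t^k / k!, evaluated in log space."""
    if t == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-t + k * math.log(t) - math.lgamma(k + 1))

def mixture_pmf(k, density, upper=60.0, steps=6000):
    """Approximate Pr{X = k} in (1) by composite Simpson quadrature on [0, upper].

    `upper` truncates the integral; this assumes the integrand is negligible
    beyond it. `steps` must be even.
    """
    width = upper / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * width
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * poisson_pmf(k, t) * density(t)
    return total * width / 3.0

theta = 0.5  # hypothetical rate of an exponential mixing density

def exp_density(t):
    return theta * math.exp(-theta * t)

# Mixture probabilities for k = 0, ..., 39 and their closed-form counterparts.
pmf = [mixture_pmf(k, exp_density) for k in range(40)]
closed_form = [theta * (1.0 + theta) ** -(k + 1) for k in range(40)]
```

Any other density on the positive half-line can be passed as `density`, provided `upper` is large enough for the truncation to be harmless.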

Corresponding results were obtained by Hall and Smith (1988) in the context of stereology, while for deconvolution problems Carroll and Hall (1988) used a slightly different method. Their bounds are sharp, while ours is not. Nevertheless, the proof of our result requires only elementary analysis and clearly demonstrates the loss of information in the available data.

Some comments on the form of (1) may be in order. Similar models have been proposed in the study of relative abundance of animal species, and in word frequency analysis. For an infinitesimal sampling of the literature see Corbet et al. (1943), Good (1953), Efron and Thisted (1976) and Ord and Whitmore (1986). (Although f is usually assumed to belong to some parametric family in these applications, Good's paper is a noteworthy exception.) In such problems, a population consisting of many distinct classes is sampled and the number of individuals in each class is recorded; our model, or a zero-truncated version, typically arises when sampling is a Poisson process and the population frequencies are considered to be generated from f in the obvious way.

¹ Research partially supported by CONACYT grant no. 1858E9219.

0167-7152/96/$12.00 © 1996 Elsevier Science B.V. All rights reserved SSDI 0167-7152(95)00073-9

2. A minimax bound

For each $B > 0$ denote by $C_k(B)$ the set of functions supported in $(0, \infty)$ and bounded by $B$, with $k$ continuous derivatives also bounded by $B$.

Let $y_0$ be any point in $(0, \infty)$. Suppose that we observe $X_1, \ldots, X_n$ and that $\hat f(y_0) = \hat f(y_0; X_1, \ldots, X_n)$ is any estimator of $f(y_0)$.

Theorem 1. If the sequence $a_n$, $n = 1, 2, \ldots$, satisfies

$$\liminf_{n \to \infty}\ \inf_{f \in C_k(B)} P_f\bigl(|\hat f(y_0) - f(y_0)| < a_n\bigr) = 1, \tag{2}$$

then for any $\alpha > 0$

$$n^{\alpha} a_n \to \infty. \tag{3}$$

Proof. For convenience we let $h(x \mid y) = e^{-y} y^{x}/x!$ denote the Poisson probability function. Let $f_0$ be the exponential density $f_0(y) = \theta e^{-\theta y}$, where $\theta$ is chosen so that $f_0 \in C_k(B/2) \subset C_k(B)$. The strategy of the proof is to define a sequence of densities $f_n$ such that $f_n(y_0) \to f_0(y_0)$ slowly, but which are almost indistinguishable to the estimator $\hat f(y_0)$.

Let $\alpha > 0$ and $j$ be a positive integer such that $\alpha > k/(2k + 2j + 2)$. Choose a $k$-times continuously differentiable function $H(y)$ with support in $[-1, 1]$ with $H(0) \ne 0$, and such that $\int y^{i} H(y)\, dy = 0$ for $i = 0, 1, \ldots, j - 1$, and $c = \int |y^{j} H(y)|\, dy < \infty$. Let $\varepsilon_n > 0$ be a sequence converging to zero and define

$$f_n(y) = f_0(y) + \varepsilon_n^{k} H\{(y - y_0)/\varepsilon_n\},$$

$$p_0(x) = \int h(x \mid y) f_0(y)\, dy = \theta (1 + \theta)^{-(x+1)},$$

$$p_n(x) = \int h(x \mid y) f_n(y)\, dy.$$

Then for all $n$ sufficiently large, $f_n$ is a probability density in $C_k(B)$. Furthermore, the probabilities $p_0(x)$ and $p_n(x)$ are very close: for each $n$ and each $x$ a Taylor expansion of $h(x \mid y)$ in powers of $(y - y_0)$ yields

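$$|p_n(x) - p_0(x)| = \varepsilon_n^{k} \left| \int h(x \mid y)\, H\{(y - y_0)/\varepsilon_n\}\, dy \right|$$

$$\le (\varepsilon_n^{k}/j!)\, |h^{(j)}(x \mid \tilde y)| \int |(y - y_0)^{j} H\{(y - y_0)/\varepsilon_n\}|\, dy$$

$$\le 2c\, |h^{(j)}(x \mid \tilde y)|\, \varepsilon_n^{k+j+1}$$

for some $\tilde y$ satisfying $|\tilde y - y_0| \le \varepsilon_n$. Now we set $\varepsilon_n = O(n^{-1/(2k+2j+2)})$ and show that $p_n$ is close to $p_0$ in the even stronger sense that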

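$$E_{f_0}\left[\left(\frac{p_n(X)}{p_0(X)}\right)^{2}\right] = 1 + E_{f_0}\left[\left(\frac{p_n(X) - p_0(X)}{p_0(X)}\right)^{2}\right] = 1 + O(n^{-1}). \tag{4}$$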
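The construction can be checked numerically. The Python sketch below is illustrative only and fixes several things the proof leaves open: it takes k = 2 and j = 2, uses the hypothetical bump H(u) = (1 - u^2)^3 (1 - 9u^2) on [-1, 1] (twice continuously differentiable, H(0) = 1, zeroth and first moments zero), sets y_0 = 1 and theta = 1, and evaluates the integrals by composite Simpson quadrature. Under these assumptions, halving epsilon should shrink the largest gap |p_n(x) - p_0(x)| by about 2^(k+j+1) = 32 and the chi-square term in (4) by about 2^(2(k+j+1)) = 1024, while the densities themselves differ at y_0 by epsilon^k.

```python
import math

def poisson_pmf(x, y):
    """h(x|y) = e^{-y} y^x / x! for y > 0."""
    return math.exp(-y + x * math.log(y) - math.lgamma(x + 1))

def H(u):
    """Hypothetical perturbation shape: C^2 on the line, supported in [-1, 1]."""
    if abs(u) >= 1.0:
        return 0.0
    return (1.0 - u * u) ** 3 * (1.0 - 9.0 * u * u)

def simpson(func, a, b, steps=2000):
    """Composite Simpson rule with an even number of steps."""
    width = (b - a) / steps
    total = func(a) + func(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(a + i * width)
    return total * width / 3.0

Y0, K, THETA = 1.0, 2, 1.0  # illustrative choices, not values from the paper

def pmf_gaps(eps, x_max=60):
    """p_n(x) - p_0(x) = eps^K * int h(x|y) H((y - Y0)/eps) dy for x < x_max."""
    gaps = []
    for x in range(x_max):
        integrand = lambda y, x=x: poisson_pmf(x, y) * H((y - Y0) / eps)
        gaps.append(eps ** K * simpson(integrand, Y0 - eps, Y0 + eps))
    return gaps

def chi_square(gaps):
    """Sum over x of (p_n(x) - p_0(x))^2 / p_0(x), with p_0 geometric."""
    return sum(g * g * (1.0 + THETA) ** (x + 1) / THETA for x, g in enumerate(gaps))

gaps_coarse = pmf_gaps(0.2)
gaps_fine = pmf_gaps(0.1)
max_ratio = max(map(abs, gaps_coarse)) / max(map(abs, gaps_fine))
chi_ratio = chi_square(gaps_coarse) / chi_square(gaps_fine)
density_shift_fine = 0.1 ** K * abs(H(0.0))  # |f_n(y0) - f_0(y0)| at eps = 0.1
```

The comparison of `max(map(abs, gaps_fine))` with `density_shift_fine` shows the point of the proof in miniature: the change in the density at y_0 is several orders of magnitude larger than the change in any observable probability.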


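For any $\tilde y \in [y_0 - \varepsilon, y_0 + \varepsilon]$ and all $x \ge j$ we have

$$|h^{(j)}(x \mid \tilde y)| \le e^{-(y_0 - \varepsilon)}\, \frac{(y_0 - \varepsilon)^{x-j}}{(x - j)!}\, \bigl(1 + O(x^{-1})\bigr).$$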

Noting that $\tilde y$ may vary with $x$, we conclude that

$$\sum_{x=0}^{\infty} (1 + \theta)^{1+x} \bigl(h^{(j)}(x \mid \tilde y)\bigr)^{2} < \infty. \tag{5}$$
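The derivatives of the Poisson probability function can be written down explicitly, since d/dy h(x|y) = h(x-1|y) - h(x|y), with h(-1|y) = 0. Iterating gives the j-th derivative as an alternating binomial sum of j + 1 Poisson probabilities. The Python sketch below uses that identity to evaluate partial sums of the series in (5). It is a simplification: the function names are hypothetical, and the intermediate point is held fixed at a single value of y rather than allowed to vary with x as in the proof.

```python
import math

def poisson_pmf(x, y):
    """h(x|y) = e^{-y} y^x / x!, with the convention h(x|y) = 0 for x < 0."""
    if x < 0:
        return 0.0
    return math.exp(-y + x * math.log(y) - math.lgamma(x + 1))

def poisson_pmf_derivative(x, y, j):
    """j-th derivative of h(x|y) in y, from d/dy h(x|y) = h(x-1|y) - h(x|y)."""
    return sum(
        (-1) ** (j - i) * math.comb(j, i) * poisson_pmf(x - i, y)
        for i in range(j + 1)
    )

def weighted_sum(y, j, theta, terms):
    """Partial sum of (1 + theta)^(1 + x) * h^(j)(x|y)^2 over x < terms."""
    return sum(
        (1.0 + theta) ** (1 + x) * poisson_pmf_derivative(x, y, j) ** 2
        for x in range(terms)
    )

# Partial sums with a fixed y stabilise quickly because of the factorial decay.
partial_50 = weighted_sum(1.0, 2, 0.5, 50)
partial_100 = weighted_sum(1.0, 2, 0.5, 100)
```

The geometric weight (1 + theta)^(1 + x) grows far more slowly than the squared factorial in the denominator decays, which is why the two partial sums agree to machine precision.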

Therefore,

$$E_{f_0}\bigl\{(p_n(X) - p_0(X))/p_0(X)\bigr\}^{2} = O\bigl(\varepsilon_n^{2(k+j+1)}\bigr),$$

which establishes (4). Applying the Cauchy-Schwarz inequality together with (4), we obtain

$$\bigl\{P_{f_n}\bigl(|\hat f(y_0) - f_n(y_0)| < a_n\bigr)\bigr\}^{2} \le P_{f_0}\bigl(|\hat f(y_0) - f_n(y_0)| < a_n\bigr) \cdot \bigl(1 + O(n^{-1})\bigr)^{n}.$$
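The Cauchy-Schwarz step can be spelled out as follows. This is a sketch of the standard change-of-measure argument; the symbols $A_n$ and $L_n$ are introduced here for convenience and do not appear in the original text.

```latex
% A_n is the event that the estimator is within a_n of f_n(y_0);
% L_n is the likelihood ratio of the sample under f_n relative to f_0.
\[
A_n = \bigl\{\, |\hat f(y_0) - f_n(y_0)| < a_n \,\bigr\},
\qquad
L_n = \prod_{i=1}^{n} \frac{p_n(X_i)}{p_0(X_i)} .
\]
% Since p_0(x) > 0 for every x, probabilities under f_n are expectations under f_0:
\[
P_{f_n}(A_n) = E_{f_0}\bigl[\mathbf{1}_{A_n} L_n\bigr]
\le \bigl\{P_{f_0}(A_n)\bigr\}^{1/2} \bigl\{E_{f_0}[L_n^{2}]\bigr\}^{1/2} .
\]
% Independence of X_1, ..., X_n and (4) give the second factor:
\[
E_{f_0}[L_n^{2}]
= \Bigl( E_{f_0}\Bigl[\bigl(p_n(X)/p_0(X)\bigr)^{2}\Bigr] \Bigr)^{n}
= \bigl(1 + O(n^{-1})\bigr)^{n} = O(1) .
\]
```

Squaring the first inequality gives the displayed bound, and the boundedness of $(1 + O(n^{-1}))^{n}$ is what keeps the probability under $f_0$ away from zero in the next step.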

By assumption, as $n \to \infty$ the left side approaches 1, from which we obtain

$$\liminf_{n \to \infty} P_{f_0}\bigl(|\hat f(y_0) - f_n(y_0)| < a_n\bigr) > 0.$$

However, also by assumption,

$$P_{f_0}\bigl(|\hat f(y_0) - f_0(y_0)| < a_n\bigr) \to 1,$$

and therefore, for sufficiently large $n$ the two events

$$\{|\hat f(y_0) - f_0(y_0)| < a_n\} \quad \text{and} \quad \{|\hat f(y_0) - f_n(y_0)| < a_n\}$$

cannot be disjoint. Then $|f_n(y_0) - f_0(y_0)| < 2a_n$. On the other hand, our construction gives $|f_n(y_0) - f_0(y_0)| = \varepsilon_n^{k} |H(0)|$, from which $2 a_n n^{k/2(k+j+1)} \ge |H(0)|$. The conclusion of the theorem follows immediately. □
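For completeness, the final step can be written out as below. The sketch assumes the choice $\varepsilon_n = n^{-1/(2k+2j+2)}$ exactly, and writes $\alpha$ for the exponent appearing in (3).

```latex
% With eps_n = n^{-1/(2k+2j+2)}, the shift at y_0 is eps_n^k |H(0)|:
\[
2 a_n > |f_n(y_0) - f_0(y_0)| = \varepsilon_n^{k}\,|H(0)|
      = n^{-k/(2k+2j+2)}\,|H(0)| .
\]
% Multiply by n^alpha and use alpha > k/(2k+2j+2):
\[
n^{\alpha} a_n > \tfrac{1}{2}\,|H(0)|\; n^{\alpha - k/(2k+2j+2)}
\longrightarrow \infty \qquad (n \to \infty) .
\]
% Because k/(2k+2j+2) -> 0 as j -> infinity, a suitable j exists for every alpha > 0.
```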

3. Discussion

The proof of this result shows that no estimator can have very much power to resolve fine details in the density f. This is not surprising, because the data available to us can only take integer values. The proof also suggests, however, that for densities like the exponential, which are very smooth, we may be able to do reasonably well. The author has had some success along these lines using a penalized likelihood estimator similar to suggestions of Nychka (1990) and Eggermont and LaRiccia (1995).

References

Carroll, R.J. and P. Hall (1988), Optimal rates of convergence for deconvolving a density, JASA 83, 1184-1186.

Corbet, A.S., R.A. Fisher and C.B. Williams (1943), The relationship between the number of species and the number of individuals in a random sample of an animal population, J. Animal Ecology 12, 42-58.

Efron, B. and R. Thisted (1976), Estimating the number of unseen species: how many words did Shakespeare know? Biometrika 63, 435-447.

Eggermont, P.P.B. and V.N. LaRiccia (1995), Maximum smoothed likelihood density estimation for inverse problems, Ann. Statist. 23, 199-220.

Farrell, R.H. (1972), On the best obtainable asymptotic rates of convergence in estimation of a density function at a point, Ann. Math. Statist. 43, 170-180.

Good, I.J. (1953), The population frequencies of species and the estimation of population parameters, Biometrika 40, 237-264.

Hall, P. and R.L. Smith (1988), The kernel method for unfolding sphere size distributions, J. Comput. Phys. 74, 409-421.

Nychka, D. (1990), Some properties of adding a smoothing step to the EM algorithm, Statist. Probab. Lett. 9, 187-193.

Ord, J.K. and G.A. Whitmore (1986), The Poisson-inverse Gaussian distribution as a model for species abundance, Commun. Statist. Theor. Methods 15, 853-871.