Cramér-Rao bounds for posterior variances


Transcript of Cramér-Rao bounds for posterior variances

Page 1: Cramér-Rao bounds for posterior variances

Statistics & Probability Letters 17 (1993) 173-178

North-Holland

18 June 1993

Cramér-Rao bounds for posterior variances

Malay Ghosh, University of Florida, Gainesville, FL, USA

Received September 1992

Revised October 1992

Abstract: The paper obtains Cramér-Rao type bounds for posterior variances. Conditions are provided for attainability of these bounds. The present bounds are usually more widely applicable, and are often sharper, than the ones reported in Schutzenberger (1957). Also, our results extend immediately to find lower bounds of posterior risks, and subsequently of Bayes risks, of parameters of interest. Such results can be contrasted with the findings of Gart (1959), who obtained lower bounds for Bayes risks of estimators from which posterior risks are not easy to retrieve. We have also provided a multiparameter generalization of our results. Finally, certain Bhattacharyya type bounds for posterior variances are obtained. In contrast to the usual technique of differentiation under the integral sign for finding Cramér-Rao bounds, our method of proof involves integration by parts.

Keywords: Cramér-Rao; bounds; posterior variances; posterior risks; Bayes risks; nuisance parameters; Bhattacharyya type bounds.

1. Introduction

Cramér-Rao (C-R) lower bounds for variances of estimators, as well as many of their variants, have been available in the statistics literature for nearly five decades. Over the years, substantial research has been done in this area, and the original inequality has been generalized in many directions, both in the fixed-sample and in the sequential framework. However, most generalizations of this celebrated inequality have taken place within the frequentist setup.

Bayesian analogs of the C-R inequality are rarer to come across. Schutzenberger (1957) reported such an inequality for posterior variances. Gart (1959) obtained lower bounds for Bayes risks of estimators. More recently, refined versions of Gart’s inequality have been obtained by Borovkov and Sakhanienko (1980) and Brown and Gajek (1990).

The present note finds a C-R lower bound for posterior variances different from the one given in Schutzenberger (1957). While Schutzenberger’s result requires differentiability of the estimate δ(x) of a parameter, say γ(θ), with respect to x, our result does not require such a differentiability condition. In consequence, Schutzenberger’s result does not apply to discrete distributions, including some important ones like the binomial, Poisson, and negative binomial, but our results find applications in such cases as well.

We also show that for the one-parameter exponential family with conjugate priors, the C-R lower bound for posterior variances is attained if and only if the parameter of interest is a linear function of the population mean. On the other hand, with the exception of the normal mean, even when the distribution belongs to the continuous one-parameter exponential family and the prior is conjugate, Schutzenberger’s

Correspondence to: Malay Ghosh, Department of Statistics, University of Florida, Gainesville, FL 32611, USA.

Research partially supported by NSF Grant Number SES-9201210.

0167-7152/93/$06.00 © 1993 Elsevier Science Publishers B.V. All rights reserved

Page 2: Cramér-Rao bounds for posterior variances

Volume 17, Number 3, Statistics & Probability Letters, 18 June 1993

bounds for posterior variances are not attained for estimating the population mean. Thus, in comparable circumstances, our bounds are often sharper than those of Schutzenberger.

We shall also find in Section 2 that, from the C-R lower bounds for posterior variances, it is immediate to obtain lower bounds for posterior risks of arbitrary estimators of the parameters of interest. This also leads to lower bounds on Bayes risks of such estimators. This is in contrast to the direct lower bounds on Bayes risks obtained by Gart (1959), from which lower bounds on posterior risks are not easy to retrieve.

The outline of the remaining sections is as follows. In Section 2, the C-R lower bounds for posterior variances are derived. Conditions under which such bounds are attained are also discussed. Also, the present bounds are compared and contrasted with those of Schutzenberger (1957) and Gart (1959).

In Section 3, we have provided C-R lower bounds for posterior variances when there are several parameters. Once again, conditions under which equality is attained are discussed. As a special case, C-R lower bounds for posterior variances of the parameters of interest in the presence of nuisance parameters are provided.

Finally, in Section 4, Bhattacharyya type lower bounds for posterior variances are obtained, and conditions for attainment of such bounds are also discussed.

2. Bounds for posterior variances

Let X be a random variable (real- or vector-valued) having a distribution absolutely continuous with respect to some σ-finite measure μ with p.d.f. f_θ(x), where θ ∈ (a, b), some open interval in the real line. It is possible that a = −∞, b = +∞, or both. Consider a prior distribution absolutely continuous with respect to Lebesgue measure having p.d.f. π(θ). Denote by π(θ | x) the posterior p.d.f. of θ given X = x. It is assumed that

(A) π(θ | x) → 0 as θ → a and θ → b;

(B) E[(∂ log π(θ | x)/∂θ)² | x] > 0.

The following theorem provides lower bounds for posterior variances of parametric functions γ(θ).

Theorem 1. Assume (A) and (B). Suppose γ(θ) is a parametric function differentiable in θ which satisfies

(C) E[γ²(θ) | x] < ∞;

(D) q I y’(e) I I -q < m;

(E) γ(θ)π(θ | x) → 0 as θ → a and θ → b.

Then,

V(γ(θ) | x) ≥ [E(γ′(θ) | x)]² / E[(∂ log π(θ | x)/∂θ)² | x].  (1)

Proof. If E[(∂ log π(θ | x)/∂θ)² | x] = +∞, then the result is trivially true. So, assume that E[(∂ log π(θ | x)/∂θ)² | x] < ∞. Now, integrating by parts, and using (E), one gets

E[γ′(θ) | x] = ∫_a^b γ′(θ)π(θ | x) dθ = −∫_a^b γ(θ)(∂ log π(θ | x)/∂θ)π(θ | x) dθ.  (2)

Next, using (A),

E[(∂ log π(θ | x)/∂θ) | x] = ∫_a^b (∂π(θ | x)/∂θ) dθ = 0.  (3)


Page 3: Cramér-Rao bounds for posterior variances


Hence, from (2) and (3),

E[γ′(θ) | x] = −Cov(γ(θ), ∂ log π(θ | x)/∂θ | x).  (4)

Squaring both sides of (4), using the Schwarz inequality and (3), one gets (1). □
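As a numerical sanity check of Theorem 1 (not part of the original paper), the following Python sketch evaluates both sides of (1) by quadrature for a Beta(3, 4) posterior with γ(θ) = θ on (a, b) = (0, 1); the posterior and the grid size are illustrative choices, and conditions (A)-(E) are easily verified here.

```python
import numpy as np

# Check inequality (1) for an illustrative Beta(3, 4) posterior with
# gamma(theta) = theta; the density and theta*density vanish at 0 and 1.
a, b = 3.0, 4.0
n = 400_000
edges = np.linspace(0.0, 1.0, n + 1)
theta = 0.5 * (edges[:-1] + edges[1:])            # midpoint grid on (0, 1)
dx = 1.0 / n

dens = theta**(a - 1) * (1 - theta)**(b - 1)
dens /= dens.sum() * dx                           # normalized posterior density

score = (a - 1) / theta - (b - 1) / (1 - theta)   # d log pi(theta|x)/d theta
mean = (theta * dens).sum() * dx
post_var = ((theta - mean)**2 * dens).sum() * dx  # V(theta | x)
denom = (score**2 * dens).sum() * dx              # E[(d log pi/d theta)^2 | x]

cr_bound = 1.0 / denom                            # E[gamma'(theta) | x] = 1 here
print(post_var, cr_bound)
```

The posterior variance (12/392 ≈ 0.0306) strictly exceeds the bound (1/45 ≈ 0.0222), as expected, since γ(θ) = θ is not linear in the score for this posterior.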

Remark 1. If one assumes that

(F) (a log ~(6’lx)/a0)~(/3lx) -+O as f3-+a and 0-b;

then integration by parts gives

E[(a i0g a(elx)/ae)21x] =_!+a2 i0g ~(elx)/ae~)lx]. (5)

Thus, under the additional assumption (F), the denominator of the C-R lower bound for posterior variances can be replaced by the right hand side of (5). This is comparable to the standard frequentist result that ,!?[(a log fJx)/?@)’ I 01 = E[( - a2 log fe(x>/aO*) I 01 which holds under certain regularity conditions.

Remark 2. It is clear from the proof of Theorem 1 that equality holds in (1) if and only if

γ(θ) = c(x)(∂ log π(θ | x)/∂θ) + d(x),  (6)

where c(x) and d(x) are functions of x not depending on θ. In the special case of a regular exponential family, f_θ(x) ∝ exp(θx − ψ(θ)), so that

∂ log π(θ | x)/∂θ = x − ψ′(θ) + π′(θ)/π(θ).

Hence, C-R lower bounds for posterior variances are attained for functions γ(θ) which are linear in π′(θ)/π(θ) − ψ′(θ). More generally, C-R lower bounds for posterior variances are attained for functions h(θ; x) = k(x)[π′(θ)/π(θ) − ψ′(θ)] + m(x). In particular, if π(θ) is the conjugate prior given by π(θ) ∝ exp(αθ − νψ(θ)), then π′(θ)/π(θ) = α − νψ′(θ). In this case, only posterior variances of functions of the form c(x)ψ′(θ) + d(x) attain the C-R lower bound.

Remark 3. Schutzenberger’s result (see also Noorbaloochi and Meeden, 1983), on the other hand, proceeds as follows. If δ(x) is the posterior mean of γ(θ), then δ(x) = ∫γ(θ)π(θ | x) dθ. If δ(x) and π(θ | x) are both differentiable in x, then, assuming the validity of the interchange of differentiation and integration, one gets

δ′(x) = ∫γ(θ)(∂ log π(θ | x)/∂x)π(θ | x) dθ = Cov(γ(θ), ∂ log π(θ | x)/∂x | x).  (7)

Now an application of the Schwarz inequality gives

[δ′(x)]² ≤ V(γ(θ) | x)E[(∂ log π(θ | x)/∂x)² | x],

which leads to the inequality

V(γ(θ) | x) ≥ [δ′(x)]² / E[(∂ log π(θ | x)/∂x)² | x].  (8)

We shall now compare the inequalities given in (1) and (8) for the special case of a regular one-parameter exponential family. In this case, since π(θ | x) ∝ exp[(α + x)θ − (ν + 1)ψ(θ)], one gets


Page 4: Cramér-Rao bounds for posterior variances


∂ log π(θ | x)/∂θ = x + α − (ν + 1)ψ′(θ). Using E[∂ log π(θ | x)/∂θ | x] = 0 and E[(∂ log π(θ | x)/∂θ)² | x] = E[−∂² log π(θ | x)/∂θ² | x], one gets

δ(x) = E[ψ′(θ) | x] = (x + α)/(ν + 1);  (9)

(ν + 1)²V[ψ′(θ) | x] = E[(∂ log π(θ | x)/∂θ)² | x] = (ν + 1)E[ψ″(θ) | x].  (10)

From (10) or Remark 2, it is evident that V[ψ′(θ) | x] equals its C-R lower bound. However, from (9), δ′(x) = (ν + 1)⁻¹, and ∂ log π(θ | x)/∂x = θ − E(θ | x). Thus, using (10), the C-R lower bound given in (8) equals V(ψ′(θ) | x)/[(ν + 1)E(ψ″(θ) | x)V(θ | x)]. Also, using Theorem 1, V(θ | x) ≥ 1/[(ν + 1)E(ψ″(θ) | x)]. Thus, for the one-parameter exponential family of distributions with conjugate priors, our bounds are usually sharper than Schutzenberger’s bound, except in the normal case when the two are identical.

Remark 4. Using Theorem 1, for an arbitrary estimator h(X) of γ(θ), the lower bound for the posterior risk is given by

E[(h(X) − γ(θ))² | x] ≥ [h(x) − E(γ(θ) | x)]² + [E(γ′(θ) | x)]² / E[(∂ log π(θ | x)/∂θ)² | x].

From the above inequality, it is straightforward to find a lower bound for the Bayes risk of h(X) by averaging over the marginal distribution of X. The resulting bound is different from the one given in Gart (1959), and neither bound is sharper than the other. However, we may remark that it is not possible to retrieve any lower bound for posterior risks from Gart’s result, in contrast to the present approach.
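The decomposition behind this inequality, E[(h(X) − γ(θ))² | x] = [h(x) − E(γ(θ) | x)]² + V(γ(θ) | x), is easily checked numerically; any lower bound on the posterior variance therefore lower-bounds the posterior risk. The Gamma posterior and the estimate h below are toy choices, not from the paper.

```python
import numpy as np

# Posterior-risk decomposition on a toy Gamma(7, rate 4) posterior for
# gamma(theta) = lambda, with an arbitrary (non-Bayes) estimate h = 2.
rng = np.random.default_rng(4)
lam = rng.gamma(7.0, 1.0 / 4.0, 1_000_000)     # posterior draws
h = 2.0
risk = ((h - lam)**2).mean()                   # posterior risk of h
decomp = (h - lam.mean())**2 + lam.var()       # squared bias + posterior variance
print(risk, decomp)
```

The two quantities agree (up to rounding) even for the sample moments, since the identity is algebraic.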

3. Bounds in the presence of nuisance parameters

In this section, we extend the results of the previous section to vector-valued parameters. Let X have p.d.f. f_θ(x), where θ = (θ_1, …, θ_k)ᵀ ∈ Θ = (a_1, b_1) × ⋯ × (a_k, b_k). Consider a prior π(θ) for θ, and suppose that for the resulting posterior π(θ | x), all the partial derivatives ∂π(θ | x)/∂θ_j, j = 1, …, k, exist. Assume also that

(A) π(θ | x) → 0 as θ_j → a_j and θ_j → b_j, j = 1, …, k.

Under (A), it is immediate that

E[∂ log π(θ | x)/∂θ_j | x] = 0 for all j = 1, …, k.  (11)

If, in addition,

(B) π(θ | x)(∂ log π(θ | x)/∂θ_j) → 0 as θ_j → a_j and θ_j → b_j for all j = 1, …, k,

then it follows that

E[−(∂² log π(θ | x)/(∂θ_j ∂θ_m)) | x] = E[(∂ log π(θ | x)/∂θ_j)(∂ log π(θ | x)/∂θ_m) | x].  (12)

Consider now parametric functions γ(θ) of θ differentiable in each of its arguments. Assume that

(C) γ(θ)π(θ | x) → 0 as θ_j → a_j and θ_j → b_j, j = 1, …, k.

It follows now that

E[γ(θ)(∂ log π(θ | x)/∂θ_j) | x] = −E[(∂γ/∂θ_j) | x], j = 1, …, k.  (13)


Page 5: Cramér-Rao bounds for posterior variances


Using the fact that the square of the multiple correlation coefficient between γ(θ) and ∂ log π(θ | x)/∂θ_j, j = 1, …, k, cannot exceed 1, one gets

V[γ(θ) | x] ≥ gᵀ(x)Σ⁻¹(x)g(x),  (14)

where

g(x) = [E(∂γ/∂θ_1 | x), …, E(∂γ/∂θ_k | x)]ᵀ,

Σ(x) = ((E[(∂ log π(θ | x)/∂θ_j)(∂ log π(θ | x)/∂θ_m) | x])).

In particular, if γ(θ) = θ_1, one gets

V[γ(θ) | x] ≥ Σ¹¹(x),  (15)

where Σ¹¹(x) is the first diagonal element of Σ⁻¹(x). Note also that equality is attained in (14) if and only if γ(θ) is linearly related to the ∂ log π(θ | x)/∂θ_j’s with probability 1.

In the special case of a multiparameter exponential family, f_θ(x) ∝ exp(Σ_{j=1}^k θ_j x_j − ψ(θ)). Consider the conjugate prior π(θ) ∝ exp(Σ_{j=1}^k α_j θ_j − νψ(θ)). Then the posterior distribution of θ is given by

π(θ | x) ∝ exp[Σ_{j=1}^k (α_j + x_j)θ_j − (ν + 1)ψ(θ)].  (16)

Since (12) holds in this case, it follows from (16) that

Σ(x) = ((E[−(∂² log π(θ | x)/(∂θ_j ∂θ_m)) | x])).  (17)

Also, it is clear from (16) that in this case equality holds in (15) if and only if γ(θ) is a linear function of the ∂ψ(θ)/∂θ_j’s, i.e. a linear function of the vector of population means.
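A small numerical illustration of (14) and (15) (not from the paper): take a toy posterior under which θ_1 and θ_2 are independent normals, so that Σ(x) is diagonal; the bound is then attained for γ(θ) = θ_1 (linear in the scores) and strict for γ(θ) = θ_1². All numerical values below are hypothetical.

```python
import numpy as np

# Multiparameter bound (14)/(15) on a posterior with theta_1 ~ N(mu1, s1^2)
# and theta_2 ~ N(mu2, s2^2) independent, so score_j = -(theta_j - mu_j)/s_j^2.
rng = np.random.default_rng(2)
mu1, s1, mu2, s2 = 1.0, 0.5, -2.0, 2.0
n = 1_000_000
t1 = rng.normal(mu1, s1, n)

Sigma = np.diag([1 / s1**2, 1 / s2**2])      # E[score_j score_m | x]
Sigma_inv = np.linalg.inv(Sigma)

# gamma(theta) = theta_1: g = (1, 0)^T, bound attained
g = np.array([1.0, 0.0])
bound_lin = g @ Sigma_inv @ g                # = s1^2 = 0.25
var_lin = t1.var()

# gamma(theta) = theta_1^2: g = (2 E[theta_1|x], 0)^T, bound strict
g2 = np.array([2 * mu1, 0.0])
bound_quad = g2 @ Sigma_inv @ g2             # = 4 mu1^2 s1^2 = 1.0
var_quad = (t1**2).var()                     # = 4 mu1^2 s1^2 + 2 s1^4 = 1.125
print(bound_lin, var_lin, bound_quad, var_quad)
```

The quadratic case leaves a gap of 2s1⁴, consistent with equality in (14) holding only for functions linear in the scores.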

4. Bhattacharyya type bounds

Consider once again the situation when X has p.d.f. f_θ(x) with respect to some σ-finite measure μ, and θ has some prior p.d.f. π(θ) on (a, b), some open interval of the real line. Suppose that the posterior p.d.f. π(θ | x) admits r derivatives with respect to θ. Define c_j(θ; x) = π⁻¹(θ | x)(∂^j π(θ | x)/∂θ^j), j = 1, …, r. Also, assume that π(θ | x) and its derivatives up to order r − 1 with respect to θ all vanish as θ → a and θ → b. It follows now that

E[γ(θ)c_j(θ; x) | x] = −E[γ′(θ)c_{j−1}(θ; x) | x] = q_j(x) (say),  (18)

where c_0(θ; x) ≡ 1.

Also, let

Q(x) = ((E[π⁻²(θ | x)(∂^j π(θ | x)/∂θ^j)(∂^m π(θ | x)/∂θ^m) | x])).  (19)

Using the fact that the square of the multiple correlation coefficient between γ(θ) and c_j(θ; x), j = 1, …, r, cannot exceed 1, one gets

V[γ(θ) | x] ≥ qᵀ(x)Q⁻¹(x)q(x),  (20)

where q(x) = (q_1(x), …, q_r(x))ᵀ. Such a bound can be referred to as the rth Bhattacharyya type bound for posterior variances. In the special case of a regular one-parameter exponential family, this bound is attained if and only if γ(θ) is an rth degree polynomial in ψ′(θ).
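The attainment statement can be illustrated for a N(μ, σ²) posterior, which arises from the normal-mean exponential family with ψ′(θ) = θ: for γ(θ) = θ², a second-degree polynomial in ψ′(θ), the first-order (C-R) bound is strict while the r = 2 Bhattacharyya bound is attained. The values of μ and σ below are illustrative, and the moments are estimated by Monte Carlo.

```python
import numpy as np

# Bhattacharyya bound (20) with r = 2 for gamma(theta) = theta^2 under a
# N(mu, sig^2) posterior (mu, sig are hypothetical).
mu, sig = 1.5, 0.8
rng = np.random.default_rng(3)
th = rng.normal(mu, sig, 1_000_000)

c1 = -(th - mu) / sig**2                     # pi'(theta|x)/pi(theta|x)
c2 = ((th - mu)**2 - sig**2) / sig**4        # pi''(theta|x)/pi(theta|x)
q = np.array([(th**2 * c1).mean(), (th**2 * c2).mean()])   # q_j(x) of (18)
Q = np.array([[(c1 * c1).mean(), (c1 * c2).mean()],
              [(c1 * c2).mean(), (c2 * c2).mean()]])       # matrix (19)

bound_r1 = q[0]**2 / Q[0, 0]                 # first-order (C-R type) bound
bound_r2 = q @ np.linalg.solve(Q, q)         # r = 2 Bhattacharyya bound
var_g = (th**2).var()                        # V(theta^2 | x) = 4 mu^2 sig^2 + 2 sig^4
print(bound_r1, bound_r2, var_g)
```

In population terms, bound_r1 = 4μ²σ² while bound_r2 = 4μ²σ² + 2σ⁴ = V(θ² | x), so the second-order bound closes the gap entirely.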


Page 6: Cramér-Rao bounds for posterior variances


References

Borovkov and Sakhanienko (1980), Probab. Math. Statist. 1, 185-195. [In Russian.]

Brown, L.D. and L. Gajek (1990), Information inequalities for the Bayes risk, Ann. Statist. 18, 1578-1594.

Gart, J.J. (1959), An extension of the Cramér-Rao inequality, Ann. Math. Statist. 30, 367-380.

Noorbaloochi, S. and G. Meeden (1983), Unbiasedness as the dual of being Bayes, J. Amer. Statist. Assoc. 78, 619-623.

Schutzenberger, M.P. (1957), An extension of the Fréchet-Cramér inequality to the case of Bayes estimation (abstract), Bull. Amer. Math. Soc. 63, 142.
