[Lecture Notes in Statistics] Topics in Survey Sampling Volume 153 || Bayes and Empirical Bayes...

50
Chapter 3 Bayes and Empirical Bayes Prediction of a Finite Population Total 3.1 INTRODUCTION We shall consider in this chapter Bayes and Empirical Bayes (EB) predic- tion of a finite population total. Following Bolfarine (1989) we develop in section 3.2 theory for Bayes and minimax prediction of finite population parameters under a general set up. Section 3.3 considers Bayes and mini- max prediction of a finite population total for the normal regression models under squared error loss function. The next section considers Bayes pre- diction under the Linex loss function of Varian (1975) and Zellner (1986). Section 3.5 reviews James-Stein estimation and associted procedures. Sub- sequently, we consider empirical Bayes (EB) prediction under simple loca- tion model. In this model no covariate is used. Ghosh and Meeden (1986) considered EB-estimation of a finite population total under simple location model by using information from past surveys under a normal Bayesian set up. Mukhopadhyay (1998 c) developed an alternative EB-estimator in the balanced case. Ghosh and Lahiri (1987 a,b), Tiwari and Lahiri (1989) considered the simultaneous EB-estimation of strata population means and variances replacing the normality assumption by the assumption of poste- rior linearity. These results have been reviewed in Section 3.6. The next section considers EB-prediction of T under models using covariates. Lahiri and Peddada (1992) considered the normal theory Bayesian analysis un- 43 P. Mukhopadhyay, Topics in Survey Sampling © Springer-Verlag New York, Inc. 2001

Transcript of [Lecture Notes in Statistics] Topics in Survey Sampling Volume 153 || Bayes and Empirical Bayes...

Chapter 3

Bayes and EmpiricalBayes Prediction of aFinite Population Total

3.1 INTRODUCTION

We shall consider in this chapter Bayes and Empirical Bayes (EB) predic­tion of a finite population total. Following Bolfarine (1989) we develop insection 3.2 theory for Bayes and minimax prediction of finite populationparameters under a general set up. Section 3.3 considers Bayes and mini­max prediction of a finite population total for the normal regression modelsunder squared error loss function. The next section considers Bayes pre­diction under the Linex loss function of Varian (1975) and Zellner (1986).Section 3.5 reviews James-Stein estimation and associted procedures. Sub­sequently, we consider empirical Bayes (EB) prediction under simple loca­tion model. In this model no covariate is used. Ghosh and Meeden (1986)considered EB-estimation of a finite population total under simple locationmodel by using information from past surveys under a normal Bayesianset up. Mukhopadhyay (1998 c) developed an alternative EB-estimator inthe balanced case. Ghosh and Lahiri (1987 a,b), Tiwari and Lahiri (1989)considered the simultaneous EB-estimation of strata population means andvariances replacing the normality assumption by the assumption of poste­rior linearity. These results have been reviewed in Section 3.6. The nextsection considers EB-prediction of T under models using covariates. Lahiriand Peddada (1992) considered the normal theory Bayesian analysis un-

43P. Mukhopadhyay, Topics in Survey Sampling© Springer-Verlag New York, Inc. 2001

44 CHAPTER 3. BAYES PREDICTION

der multiple linear regression models and obtained EB-Ridge estimators.Ghosh et al (1998) used generalised linear models for simultaneous esti­mation of population means for several strata. These works have beenreviewed in this section. In the following section we review some Bayesianprediction procedures in small area problems. This section also considersmodification of James-Stein estimators due to Fay and Herriot (1979) andits applications in small area problems. Results obtained from a surveyconducted by the Indian Statistical Institute, Calcutta (1999) are discussedin this context. Finally, we consider Bayes prediction of several (infinite)population means and variances under random error variance model.

3.2 BAYES AND MINIMAX PREDICTION

OF FINITE POPULATION PARAMETERS

As in Chapter 2, we shall assume that the value Yi on unit i is a realisation ofa random variable Y; (i = 1, ... , N). Therefore, the value y = (Y1' ... , YNYof the survey population will be looked upon as a particular realisation ofa random vector Y = (Yi., ... ,YNY having a superpopulation distribution~(J E e (the parameter space). A sample s of size n is selected from Paccording to some sampling design p. Let s = r = P - s. After the samplehas been selected, we may re-order the elements of y so that we may writey = (y~, y~Y where Ys denotes the set of observations in the sample. Clearly,Ys is a realised value of the random vector~. As before, we shall use thesymbol Yi to denote the population value as well as the random variableY; of which it is a realisation, the actual meaning being clear from thecontext. Let g(y) = g(y) be the finite population quantity to be predicted.Examples of g(y) are population total, T = 2:;:1 Yi, population variance,

s; = 2:;:1 (Yi - Y? I(N - 1). Let fJ(ys) be a predictor of g(y). The lossinvoled in predicting g(y) by fJ(ys) is L(fJ(ys), g(y)) where L(.,.) is a suitableloss function. For the given data Ys and for a given value of the parametervector B average loss or risk associated with an estimator §(Ys) is

r(§(ys) IB) = JL(fJ(ys), g(y))dF(Yr IYs, B) (3.2.1)

where dF(Yr I Ys, B) denotes the conditional distribution of Yr given Ys> Band the integral is taken over all possible values of Yr'

Assume that B has prior distribution A over the space e. The average orexpected risk involved with the estimator §(Ys) is

rA(§(Ys)) = fa r(fJ(ys IB)A(B)dB

3.2. PREDICTION OF POPN. PARAMETERS

= JE[L(§(Ys),g(y)) I ys]dPy"(ys)

45

(3.2.2)

where Py,(.) denotes the predictive distribution of Ys'

When there is no reason for confusion, we shall, henceforth, use the sym­bols E, V to denote expectation, variance with respect to superpopulationmodel, prior distribution, posterior distribution, etc. When the samplingdesign is also involved, we shall use Ep , v;, or E, V to denote expectation,variance with respect to p and often E, V or £, V, to denote the same withrespect to model distribution, the actual meaning being clear from the con­text.

A Bayes predictor of g(y) in the class 9 of predictors with respect to priorA of fJ is given by §/I.(Ys) where

(3.2.3)

The Bayes predictor §/I.(Ys) has the minimum average risk among a class ofpredictors based on the prior A.

In the absence of knowledge about the prior distribution of fJ, a measure ofperformance of § is

p(§) = sup (JEST(§(Ys) I fJ) < 00 (3.2.4)

A predictor §M(Ys) is said to be minimax in the class 91 of predictors if

(3.2.5)

and if

inf yEg' sup (JEST(§(ys),g(y)) = sup (JE8 inf YEg'T(§(ys),g(y))

Considering squared error loss functions, L[§(ys), g(y)] = [§(Ys) - g(y)Fandprior density A(fJ) it follows that Bayes predictor of g(y) is given by

§/I.(Ys) Jg(y)[Je f(Yr IYs, fJ)dfJ]dYrJg(y)f*(Yr IYs)dYr

= E/I.[g(y) IYs)](3.2.6)

where f*(Yr I Ys) is the conditional marginal ( predictive) density of Yrgiven Ys and E/I.(. IYs) denotes conditional expectation of (.) given Ys withrespect to predictive distribution of (.), when A is the prior density of fJ.Throughout this section we shall consider the square error loss functiononly.

46 CHAPTER 3. BArnSPREDWTION

LEMMA 3.2.1 If fiA(ys) is the Bayes predictor of g(y) under squared errorloss function with respect to prior A, then the Bayes risk of 9A(Ys) is givenby

rA = JVar[g(y) I ys]dPv.(ys).

In particular, if Var[g(y) IYs] is independent of Ys, then

rA = Var[g(y) IYs]

Proof. We have

9A(Ys) - g(y) = E[g(y) IYs] - g(y)

and henceE{[g(y) - 9A(Ys)]21 Ys} = Var[g(y) I Ys]

when the result follows by (3.2.2).

A prior distribution A for () is said to be least favourable if r A 2: r A' for anyother distribution A'.

The following theorems on minimax prediction in finite population followfrom the corresponding theorems in minimax estimation theory (e.g., Zacks,1971, Lehmann, 1983).

THEOREM 3.2.1 If 9A(Ys) is a Bayes predictor and if r(9A(Ys) I (}) is inde­pendent of (), then 9A(Ys) is a minimax predictor and A is a least favourabledistribution.

Theorem 3.2.1 is inapplicable when a least favorable distribution does notexist. In this case one may consider the limiting Bayes risk method forfinding a minimax predictor.

Let {Aj } be a sequence of prior distributions for (};9A j (Ys), the Bayes pre­dictor of g(y) corresponding to Aj and

rj = JEe [9Aj (Ys) - g(y)j2dAj«(})= JE{[§Aj(Ys) - g(y)j2 I ys}dPv.(ys)

the corresponding Bayes risk.

THEOREM 3.2.2 Let {Aj,j = 1,2, ...} be a sequence of prior densities overe and let {9Aj(ys),j = 1,2, ... } and {rAj(9Aj(Ys)),j = 1,2, ... ,} be thecorresponding Bayes predictor and Bayes risk. Suppose rj ---4 r as j ---4 00.

If 9(Ys) be a predictor such that

3.2. PREDICTION OF POPN. PARAMETERS 47

then g(ys) is a minimax predictor and the sequence {Aj } is least favorable.

THEOREM 3.2.3 Let y be jointly distributed according to a distribution F,belonging to a family of distributions Fl. Suppose that gM is a minimaxpredictor of g when F E Fo (E F1). If

where r(gM,F) = EF[L(gM,g(y»), expectation being taken with respect tojoint predictive distribution of y E F, then gM is minimax for Fl.

EXAMPLE 3.2.1

Suppose that Yk'S are independent with PO(Yk = 1) = 0 = 1 - PO(Yk = 0).We are interested in estimating the population total T = L~=l Yk. Considerprior distribution of 0 as Beta B(a, b). The posterior distribution of 0 givenYs is then B(a+ nys, a+b+n) where Ys = LkES Yk/n. The Bayes predictorofT is

r

= nys +L E[E(Yi IYs, 0) IYslr

= nys + LE(O IYs)r

_ a+ nys= nys + (N - n)---'--­

n+a+b

To find a minimax predictor using Theorem 3.2.1, consider the risk functionr(1'B I0) taking a squared error loss function.

A A 2= Varo(TB - T) + {Eo(TB - Tn

_ N _ n 2 a2 + 0 n + _1_ _ 2a(a + b)

-( ) {(n+a+b)2 [(n+a+b)2 N-n (n+a+b)2l

02 n 1 (a + b)2- [(a+b+n)2 + N-n - (n+a+b)2))

Hence, 1'B is a Bayes predictor with constant risk iff

n(N - n) + (n + a + b)2 - 2a(a + b)(N - n) = 0 (i)

48

and

CHAPTER 3. BAYESPREillCTION

-n(N - a) - (n + a + b)2 + (a + b) + (a + b)(N - n) = 0 (ii)

These give

vn{vn+Jn+N(N-n-l)} a+ba- ---- 2(N - n - 1) - 2

Hence, the minimax predictor of T is

{r;;:;- vn+/n+N(N-n-l)}

A _ ynyS+ 2(N-n-l)TM = nys + --=-:------F~~~=:_­

vn{(N-n)+yn+N(N-n-l)}N-n-l

with risk function

r(TM

I 0) = (N - n)2{vn + In + N(N - n - I)}4{vn(N - n) + In + N(N - n - I)}

Bolfarine (1987) compared the performance of the expansion estimatorTE = Nys with TM and found out range of values of () where TM is betterthan TE.

EXAMPLE 3.2.2

Consider the model

Yi = Xif3 + ei, ei C'oJ N(O, 0"2 Xi ), (0"2 known) (i)

and a prior f3 C'oJ N(1/, R). It follows that the posterior distribution of f3 is

where• nRys + 1/0"2

1/ =nxsR + 0"2

Hence, under squared error loss, Bayes predictor of T is

with Bayes risk

(ii)

3.3. PREDICTION UNDER REGRESSION MODEL 49

N A

where X = 'Ei=l Xi· Note that as R -> 00, r(TA) -> (J"2('Er xii 'Es Xi)X =E(Tn - T)2, where Tn is the ratio predictor, (Ys/xs)X and the expectationis taken with respect to model (i). Therefore, by theorem 3.2.2, Tn is aminimax predictor of T under (i). Also, an optimal sampling design tobase Tn is a purposive sampling design p' which selects sample s' withprobability one, where s' is such that

LXi = max sESn LXi,s' s

Sn = {s : n(s) = n},

n(s) denoting the effective size of the sample s.

It follows, therefore, that under simple location model (Xi = 1 'r:/ i), meanper unit estimator Nys is minimax for T.

In this section we have developed Bayes and minimax prediction of finitepopulation parameters under a general set up. In the next section we shallconsider Bayes and minimax prediction of T under regression model withnormality assumptions.

3.3 BAYES PREDICTION OF A FINITE POP­

ULATION TOTAL UNDER NORMAL RE­

GRESSION MODEL

Consider the modely = X{3 + e

e", N(O, V) (3.3.1)

denoted as "lj;({3,V), where X = «xkJ,k = 1, ... ,N;j = 1, ... ,p»,xkJ isthe value of the auxiliary variable xJ on unit k, e = (ell"" eN)', {3 =({31, ... , (3p)', a p x 1 vector of unknown regression coefficients and V is aN x N positive definite matrix of known elements. It is further assumedthat

(3 '" N(lJ, R) (3.3.2)

The model '!/J({3, V) together with the prior (3.3.2) of {3 will be denoted as'!/In. After the sample s is selected we have the partitions of y, X and V asfollows:

y = [Ys] , X = [Xs] ,V= [~ ~r]Yr Xr v,.s v,. (3.3.3)

50 CHAPTER 3. BAYES PREDICTION

We have the following theorems.

THEOREM 3.3.1 Under the Bayesian model 7/Jn, the Bayes predictive dis­tribution of Yr given Ys is multivariate normal with mean vector

(3.3.4)

and covariance matrix

Var1/Jn[Yr I YsJ = ~r (say)

= v;. - v;.s-v.-1Vsr + (Xr - v;.s-v.-1Xs)

(X'V-1X + R-1)-1(X - V. V-1X)'s s s r rs s s

where

Proof We have

[~: ] ~ N [[ i:]13, [is ~:]]Hence, conditional distribution of Yr given Ys is normal with

(3.3.5)

(3.3.6)

(3.3.7)

(3.3.8)E(Yr IYs) E f3 ly,(Yr IYs,l3)

E f3 ly,[Xrl3 + v;.s -v.-1(ys - X sl3)J= E f3ly,[(Xr - v;.s -v.-1X s)13 + Vrs -v.-1YsJ

To find the conditional expectation E(13 IYs) consider the joint distributionof 13 and Ys' It follows that

(3.3.9)

Hence,

(3.3.10)

Substituting this in (3.3.8) and on simplification (3.3.4) follows. Again,

V(Yr IYs) = E f3ly, [V(Yr IYs, mJ + Vf3ly,[E(Yr IYs, I3)J= v;. - v;.s -v.-1Ysr + (Xr - v;.sVs-l X s)

(R - RX~(XsRX~ + Ys)-l XsR)(Xr - Vrs -v.-1X s)' (3.3.1,1)

The result (3.3.5) follows on simplification (using Binomial Inversion theo­rem).

(3.3.12)

3.3. PREDICTION UNDER REGRESSION MODEL 51

The theorem 3.3.1 was considered by Bolfarine, Pereira and Rodrigues(1987), Lui and Cumberland (1989), Bolfarine and Zacks (1991), amongothers. Royall and Pfeffermann (1982) considered the special case of a non­informative prior which is obtained as the limit of N(v, R) when R ....... 00.

THEOREM 3.3.2 Consider the model 'l/JR with V = 0-2W where W is a knowndiagonal matrix with Wrs = 0, but 0-2 unknown. Consider non-informativeprior distribution for (f3,0-2) according to which the prior density is

2 1~(f3, 0- ) <X 2

0-

In this case, the posterior distribution of Yr given Ys is normal with

(3.3.13)

and

where(X'W-l X )-lW-ls s s s Ys

• 1 •(Ys - X sf3s)'Ws- (Ys - X sf3s)/¢

(3.3.14)

(3.3.15)

with 4>= n -po

Proof Replacing R-1 and v,.s by 0 in (3.3.4) we get (3.3.13) , since E[Yr IYs]is independent of 0-. Again,

Hence,

Vart/Jn[Yr I Ys] = Et/Jn[0-21 Ys][Wr +Xr(X~Ws-IXs)-lX:].

The result (3.3.14) follows observing that

We now consider prediction of linear quantities 9L(Y) = l'y where l = (l~,l~)'

is a vector of constants.

THEOREM 3.3.3 For any linear quantity 9L = l'y, the Bayes predictor underthe squared error loss and any 'l/JR model for which Vart/Jn[Yr I Ys] exists, is

(3.3.16)

52 CHAPTER 3. BAYES PREDICTION

The Bayes risk of this predictor is

(3.3.17)

Proof Follows easily from the definition of Bayes predictor and lemma3.2.1.

COROLLARY 3.3.1 The Bayes predictor of population total T(y) under thenormal regression model (3.3.1) - (3.3.2) is

A A 1 A

TB(Ys) = l~ys + l~[Xr,6n + Vrs~- (Ys - X s,6n)]

The Bayes prediction risk of TB is

Et/Jn[(TB(ys) - T(y)]2 = 1~(1Ir - ~s~-l17.r)lr

+ 1' (X - V. V-1X )(X'V-1X + R-1)-1r r rs s 5 S 5 5

(Xr - ~s~-lX s )'lr

(3.3.18)

(3.3.19)

THEOREM 3.3.4 Consider the normal superpopulation model 'l/Jn, with~s = O. The minimax predictor of T with repect to the squared error lossis

with prediction risk

Et/J[TM - T]2 = 1~ ~lr + l~Xr(X;~-lXs)-lX;lr

(3.3.20)

(3.3.21)

Proof Consider a sequence of prior distributions N(v, R k ) for ,6 such thatIIRkl1 = k, when the norm of the covariance matrix IIRII = trace R. Thecorresponding Bayes predictor converges (vide (3.3.6) and (3.3.16)), as k--+00, to the best linear unbiased predictor (BLUP) of Royall (1976)

,. , I aTBLUP = l sYs + 1rX r/--,s

Moreover, the Bayes prediction risk r(Tnk j v, Rk) converges, as k --+ 00, tothe prediction risk of TBWP , namely,

(3.3.22)

Since this prediction risk is independent of ,6, TBWP is, by theorem 3.2.2 aminimax predictor of T.

Note that when the values Yi are considered as fixed quantities (those be­longing to s as observed and belonging to r, unobserved but fixed), a statis­tic O(y) is considered as an estimator for 8(y) (a constant), while if the y;'s

3.4. ASYMMETRIC LOSS FUNCTION 53

are considered as random variables, the same is considered as a predictorfor the O(y) which itself is now a random variable.

So far we have considered squared error loss function only. The next sectionconsideres Bayes prediction of T under a asymmetric loss function.

3.4 BAYES PREDICTION UNDER AN ASYM­

METRIC Loss FUNCTION

In some estimation problems, use of symmetric loss functions may be inap­propriate. For example, in dam construction an underestimate of the peakwater level is usually much more serious than an overestimate, - see, forexample, Aitchison and Dunsmore (1975), Varian (1975), Berger (1980).Let D.. = ¢ - ¢ denote the scalar estimation error in using ¢ to estimate ¢.Varian (1975) introduced the following loss function

L(D..) = b[ea~ - aD.. - 1], a =I- 0, b > 0 (3.4.1)

where a and b are two parameters. For a = 1, the function is quite asym­metric with overestimation being more costly than underestimation. Whena < 0, L(D..) rises almost exponentially when D.. < 0 and almost linearlywhen D.. > O. For small values of I a I , the function is almost symmetricand not far from a squared error loss function.

Let p(¢ ID) be the posterior pdf for ¢, ¢ E <P, the parameter space, whereD denotes the sample and prior information. Let Eq, denote the posteriorexpectation.

The value of ¢ that minimises (3.4.2), denoted by ¢B, is

¢B = -(I/a) log (Eq,e-aq,),

(3.4.2)

(3.4.3)

provided, of course, E,pe-aq, exists and is finite. The risk function of ¢B, R(¢B)is defined as the prior expectation of L(D..B) and the Bayes risk as the poste­rior expectation of L(~B) where D..B = ¢B - ¢. Clearly, R(¢B) will dependon the parameter ¢ involved in the model.

When ¢ has a normal posterior pdf with mean m and variance v,

¢B = m- av/2 (3.4.4)

54 CHAPTER 3. BAYES PREDICTION

is the Bayesian estimate relative to the Linex loss function (3.4.1). WhenIa Iv/2 is small, ¢B :::: m, the posterior mean which is optimal relative tothe squared error loss.

LEMMA 3.4.1 Let 4> have a normal posterior pdf with mean m and variancev. Under the Linex loss function (3.4.1) the Bayes estimate ¢B has Bayesrisk

BR(¢B) = b[a2v/2]

The proof is straightforward.For further details the reader may refer to Zellner (1986).

3.4.1 BAYES PREDICTOR OF TFrom (3.4.3) the Bayes predictor ofT, with repect to the Linex loss function(3.4.1) is given by

~ 1 TTBL = -- log {E[e- a IYs]}

a(3.4.5)

After Ys ho.. been observed, we may write T = l~ys + I~Yr. Therefore,predictor TBL in (3.4.5) may be written as

~ 1 l'TBL = nys - - log {E(e-a ,Y, IYs)}a

We now consider the models (3.3.1), (3.3.2). Now,

E{e-aI~Y, IYs} = E{E[e-aI~Y, IYs,,6] IYs}

where

(3.4.6)

(3.4.7)

U = 1~[v;. - v;.s~-I~r + (Xr - v;.s~-IXs)

(X;~-IX s + R-I)-I(Xr - v;.s~-I X s)]lr

and ~ is the usual least square estimator of,6. Therefore, the Bayes pre­dictor of T under loss function (3.4.1) is (Bolfarine, 1989)

where

~ - I "- -1 "-TBL = nys + l r[Xr,6 + v;.s~ (Ys - Xs,B)] - (a/2)U (3.4.8)

(3.4.9)

The Bayes risk of TBL with respect to the Linex loss function (3.4.1) is

(3.4.10)

3.4. ASYMMETRIC LOSS FUNCTION 55

In the particular case, when v;.s = 0 and R is so large that R-1 ~ 0, theBayes predictor (3.4.8) reduces to

(3.4.11)

with Bayes risk (which is also the risk under squared error loss function),

(3.4.12)

It follows that the risk (with respect to the Linex loss) of Royall's (1970)optimal predictor

TBwp = nys + I~Xr,6 (3.4.13)

which is also the optimal predictor of T with respect to the squared errorloss and non-informative prior on (3 is given by

, A "RL(TBWP ) = bee -1) > RL(TBL )

It follows, therefore. that TBwp is inadmissible with respect to the Linexloss function. It follows similarly that T~L of (3.4.11) is inadmissible withrespect to the squared error loss function.

EXAMPLE 3.4.1

Consider a special case of the models (3.3.1), (3.3.2):

Yi = xi(3 + ei, i = 1, ... , N

Here,

(i)

(ii)

where

The Bayes risk is

Thus, as in the case of Royall's optimum strategy, a purposive sample s'will be an optimal sample to base TBL . If R is so large that R-1 ~ 0,predictor in (ii) reduces to

T' T' a N (N - n) XXr 2RL = R - - (Y

2 n X s(v)

56 CHAPTER 3. BAYES PREDICTION

where Tn = ~X and xr = N~n LrXi' It follows that

R (TA ) ba2 N(N - n) XX r 2

L nL = - -_-(]"2 n X s

Note that as R -4 00, RL(TBL) -4 RL(Tnd = sup (jRL(TnL)' Hence, it

follows from theorem 3.2.2 that the predictor TnL is a minimax predictorwith respect to the Linex loss under model (i).

Bolfarine (1989) also considered Bayes predictor ofT under Linex loss func­tion (3.4.1) in two-stage sampling.

In the next section we consider James-Stein estimator and its competitorsfor population parametrs. Application of these estimators in estimatingfinite population parameters will be considered in sections 3.6 and 3.8.

3.5 JAMES-STEIN ESTIMATOR

ASSOCIATED ESTIMATORS

AND

Suppose we have m independent samples, Yi ir;.dN(()i' B) where B is a

known constant. We wish to estimate () = (()1' ... , ()k)' using the SSEL

k

L(8(y), ()) = L(8i (y) - ()i)2

i=1

(3.5.1)

where 8(y) = (81(y), ... , 8m (y))' and 8i (y) is an estimate of ()i. The maxi­mum likelihood estimator (mle) of () is

and its risk is

y = (Y1' ... ,Ym)'

m

r(y I ()) = Eo L(Yi - ()Y = mBi=1

(3.5.2)

(3.5.3)

The estimator y has minimum variance among all unbiased estimators oramong all translation-invariant estimators (i.e. estimators 'ljJ(y) with theproperty 'ljJ(y + a) = 'ljJ(y) + a V y and all vectors of constants a).

Stein (1955) proposed the following non-linear and biased estimator of ()i,

CB8iS = (1- -)Yi

S(3.5.4)

3.5. JAMES-STEIN ESTIMATORS

where C is a positve constant and

k

S= Lyli=1

The optimum choice of C was found to be

C = (m - 2), for m > 2

and for this choice

r(bs I0) < r(y I 0) = mB V 0

57

(3.5.5)

(3.5.6)

(3.5.7)

The estimator bs = (blS,"" bmS)' , therefore, dominates y with respect tothe loss function in (3.5.1). The estimatorbs , in effect shrinks, the mle,y towards 0 i.e. each component of y is reduced by the same factor. Theamount of shrinkage depends on the relative closeness of y towards OJ forYi near 0 the shrinkage is substantial, while for Yi far away from 0, theshrinkage is small and biS is essentially equal to Yi.

The estimator bs can be interpreted in Bayesian perspective as follows.Suppose 0 has a prior, 0 rv N(O, AI). Then Bayes estimate of 0 is

BbB = (1- --)y

B+A(3.5.8)

Now, under predictive distribution of y with respect to prior distributionof 0 above,

E«m-2)B) =~S B+A

Thus the James-Stein estimator

b _( _(m-2)B).IS - 1 S y

(3.5.9)

(3.5.10)

may be said to be an Empirical Bayes (EB) estimator corresponding to theBayes estimator bB in (3.5.8).

Another extention of the problem arises when Yi i,,!-dN(Oi' B i ) where B i

is known. Assume that Oi i,,!-dN(O, ABi ). One may then calculate JS­

estimator of Oi by considering the transformed variables Y;/ J]"J; in place ofYi. This JS-estimator

58 CHAPTER 3. BAYES PREDICTION

dominates the mle y with respect to the risk function

m

reO' I B) = L E(o; - By / B i

i=l

(3.5.11)

vB. This estimator will be most suitable against a Bayes prior in which thevariance of the prior distribution is proportional to the sampling variance.

Stein (1960) conjectured that a 'positive-part' version of the estimator OiSin (3.5.4) would be superior to OiS. Baranchik (1970) showed that theconjecture is in fact true.. The positive-part Stein-rule estimator is otwhere

.c+ _ { OiSViS - 0

This estimator will be denoted as

if OiS > 0otherwise

(3.5.12)

[(m - 2)B

J+

1- S Yi

Lindley (1962) proposed the modified Stein-rule estimator

+ _ - [ (m - 3)B J+( - )OiL - Ys + 1 - "( . __ )2 Yi - Ys ,m > 3

LJ Y, Ys

(3.5.13)

(3.5.14)

(3.5.15)

k

where Ys = Ly;/m. This estimator shrinks the mle Yi towards the meani=l

Ys, rather than towards zero.

Stein-rule estimator is scale-invariant (i.e. multiplying Yi by a constant c,changes OiS.tO COiS) but not translation-invariant (i.e. if Y: = Yi + d, thenOiS # OiS + d). Lindley-Stein estimator 01£ is equivariant with respect to achange of origin and scale.

Assuming a multiple regression model, the Stein-rule estimator can be de­rived as follows. Consider the linear model

y = X{3+u

u "'-' N(O, 0-21)

when y is a n x 1 vector of random observations, X is a n x p matrix ofknown values Xjk(j = 1, ... ,p; k = 1, ... ,n) of auxiliary variables Xj, {3 avector of p unknown regression coefficients and u a (n xl) vector of randomerror components. The Stein-rule estimator of {3 is

~ g0-2

{3s = [1 - b'XIXb Jb

3.5. JAMES-STEIN ESTIMATORS

where g(> 0) is a suitable constant and

is the least squares estimate of {3. The James-Stein estimator of {3 is

59

~ gu'u{3.1S = [1 - (n _ p)lJX'Xb]b,

where (J"2 has been estimated by

82 = il'uj(n - p) = (y - Xb)'(y - Xb)j(n - p)

(3.5.16)

An immediate generalisation of (3.5.10) follows in the following HierarchicalBayes (HB) set up. Suppose that

(3.5.17)

(3 rv uniform(improper) overRP

where B is a known constant, x; = (Xil'" ., Xip) is a set of p known realnumbers, {3 = ({3l, ... ,(3p)' a set of p unknown regression coefficients. TheUMVU-estimator of Yi in the frequentist set up is

* - '(X'X)-lX'Yi - xi Y (3.5.18)

where X = (Xl, ... , X n )' and (X'X) is of full rank. The Bayes estimate of()i is

8iB = yi + (1 - B~F)(Yi - y;)B * + FB+FYi B+FYi

In this case, the J-S estimator of ()i is, for p < m - 2,

(3.5.19)

where

8i.1S * + [1 _ (m- p-2)B]( . _ *)Y, s' Y, Y,(m-p-2)B * + [1 _ (m-p-2)D] .s, Y, s. Y,

(3.5.20)

(3.5.21)

Efron and Morris (1971,1972) pointed out that both Bayes estimators 8'8 =(8iB,···,8;"B)' and EB estimators 8is = (8i.1s,"" 8;".1s)' might do poorlyfor estimating individual ();'s with unusually large and small values (videexercise 2 of chapter 4). To overcome this problem they suggested 'limlitedtranslation' estimators discussed in section 4.4.

60 CHAPTER 3. BAYES PREDICTION

In practical situations, often the sampling variance of Yi about ()i is not pro­portional to prior variance of ()i about 0 (as has been assumed in deriving theestimator (3.5.8)). An approach is to develop an estimator that closely re­sembles the Bayes estimator for the prior distribution ()i ir::.dN(O, A). Efron

and Morris (1973) first proposed an extention of (3.5.4) in this direction(see exercise 3).

Another suggestion is due to Carter and Ralph (1974). They showed thatfor the model

Yiir::.dN(()i,Di), ()i ir::.dN(v, A), i=l, ... ,k

with Di known but ()i, v unknown, the weighted sample mean

* " Yi /" 1V =6A+Di 6A+Di

I I

has the property, assuming A to be known,

E(L(Yi - v*? /(A + Di)) = k - 1,

(3.5.22)

(3.5.23)

(3.5.24)

where expectation is taken over joint distribution of Y and (). They sug­gested estimating A as the unique solution A * to the equation

" (Yi - v*? = k _ 16 A*+D

i I

(3.5.25)

subject to the condition A* > O. They considered A* = 0 if no positivesolution of (3.5.23) and (3.5.24) exists. An estimate of ()i is then given by

A* D.8'cn - y' + v* (3 5 26)• - A*+Di I A*+D ..

We shall consider the applications of these procedures in estimating a finitepopulation total in the next section and section 3.8.

3.6 EMPIRICAL BAYES PREDICTOR OF

POPULATION TOTAL UNDER SIMPLE

LOCATION MODEL

In this section we do not use any covariate in the model and attempt tofind EB-estimator of population total. Consider the simple location model

Yi = () + Ei, i = 1, ... , N (3.6.1)

3.6. SIMPLE LOCATION MODEL

where () and E; are independently distributed with

61

(3.6.2)

and the phrase NID denotes independently and normally distributed. Thejoint prior (unconditional) distribution of y = (YI' ... ,YN) is

(3.6.3)

where 1q is a q x 1 vector of l's , Iq is a unit vector of order q and Jq = 1q1~.

The joint conditional distribution of Yr given Ys is , therefore, (N - n) ­variate normal with mean vector

and dispersion matrix

where M = (52 j 7 2 . The Bayes predictor of population mean y undersquared error loss function, is by formula (3.3.16),

YB = E[y I (8, Ys)] = N-I[nys + (N - n)(Bp, + (1 - B)ys)] (3.6.4)

where B = Mj(M + n) (Ericson, 1969 a). The parameters p, and B (andhence M) are generally unknown and' require to be estimated.

Ghosh and Meeden (1986) used the estimators from the analysis of varianceto estimate the parameter B. They assumed that the data are availablefrom (m-l)- previous surveys from the same (similar type of) population.It is assumed that in the j-th survey (j = 1, ... , m, including the present

or mth survey) the survey character was y(j), taking values y;0) on the ithunit in the population P j of size N j , the sample being 8j of size nj' Therelevant superpopulation model is assumed to be

Y0) - (}0) + c 0) ~ -1 N··]· -1 mi - '-i , (,0- , ... , Jl - "." , (3.6.5)

(}0), f~j) being independently distributed, (}0) rv N(p" 7 2), fy) rv N I D(D, (52).

Let_ ( 0) (j»)'YS j - YI , ... , Ynj

where, without loss of generality, we kake 8j = (l, ... ,nj),(j = 1, ... ,m).Then Ys I , ••• , YSm are marginally independently distributed with

(3.6.6)

62

Let

Define

CHAPTER 3. BAYES PREDICTION

m ~

nT = L nj, Y. = L njy(j) I L nj, y(j) = L yf) Injj=l j j ;=1

m

BMS( Between Mean Square) = L nj(y(j) - y.?I(m - 1)j=l

m nj

WMS( Within Mean Square) = L L(y;(j) - y(j»)2/(nT - m) (3.6.7)j=l ;=1

Lemma 3.6.1

where

and

E(WMS) = 0-2, V(WMS) = 20-4 /(nT - m)

E(BMS) = 0-2+ gT2/(m - 1)

m

9 = (g(nlJ .. ·, nm )) = nT - (L n;)lnTj=l

(3.6.8)

where2 2 2 (. 1 )Tj = 0- + T nj J = ,... ,m . (3.6.10)

Proof The result (3.6.8) follows from the fact that (nT - m)WMSl0-2rv

XZnT- m )' Observe that .;nJ(y(j) - p,) rv N ID(O, TJ). Hence, (m -l)BMS rv

Z'AZ where Z rv N(O,Im) and A = D - uu' with D = Diag (Tf, ... ,T;')and u = (.jn1 T1, ... I ..;n;;.Tm )'1..;nT· The expressions for expectation andvariance of BMS follows from the fact, E(Z'AZ) = tr (A) and V(Z'AZ) =2 tr (A2

), and on simplification, where tr(H) denotes the trace of the matirxH.

Consider the folowing assumptions, called the assumptions A .

• (l)nj ~ 2

• (2) SUpj=l •...•mnj = K < 00

3.6. SIMPLE LOCATION MODEL 63

it can be easily proved that under assumptions A, a consistent estimator ofM-1 = T 2 /(12 is

max{O, (BMSIWMS -1)(m - l)g-l} (3.6.11)

The authors modified the above estimator slightly so that the resultingestimator is identical with the James-Stein estimator in the balanced case(i.e. when n1 = '" = nm = n). Thus, they proposed the estimator

A -1 (m - I)BMS -1M = max{O, ((m _ 3)WMS -1)(m -1)g }, (m 2:: 4) (3.6.12)

The estimator M-1 is consistent for M-1 . It follows that Bj = l+n1M-l isJ

consistent for Bj = l+n:M l'

To estimate J.L we note that the mle of J.L for known M is obtained from thejoint pdf of (Ys l , ••• ,Ysm) as

m m

ji, = 2)1- Bj)yU)12:(1- Bj )j=l j=l

Consequently, an EB-estimator of It is

(3.6.13)

if M-1 =f:. °if M-1 = 0

(3.6.14)

An EB- predictor of population mean YEB is obtained by replacing J.L andBin (3.6.4) by P, and B respectively, where B = (1 +nM-1)-1. Therefore,

(3.6.15)

Alternative EB-estimators of J.L and M are available in the balanced case.In this case, the mle of J.L is P= y= I:~1 yU) 1m. Also, the mle of M-1 is

M-1= ax{O ((m -1)BMS _ ) -I}

m , mWMS In (3.6.16)

These can be taken up as EB-estimators and hence, an alternative EB­predictor of y can be obtained using (3.5.4). In the balanced case, g =

A -1

n(m-l) and M-1differs from M only in the coefficient of WMS (m beingreplaced by (m - 3)). Clearly, the asymptotic (as m -. 00) performance ofboth the estimators are similar.

64 CHAPTER 3. BAYES PREDICTION

Mukhopadhyay (1998 c) suggested an alternative EB -predictor of y in thebalanced case. Let

where

m

S)..).' = 2)Y~) - Y()..))(Y~) - Y()..I)) , A, A' = 1, ... , n,j=l

m

Y()..) = LY~")1mj=l

(3.6.17)

(3.6.18)

At any stage j, the observations {Y~), A = 1, ... ,n} can be labelled A =1, ... ,n, randomly. For every j, YS j has an exchangeable (n - 1)- variatenormal distribution with dispersion matrix .E = (J2 In + T

2 I n and any ran­dom permutation of observations at the jth stage can be considered whilecomputing S. Therefore, S follows Wishart distribution Wen, m - 1, .E).The mle of (1 +M)-l is then given by (vide Muirhead (1982), p. 114)

(1 + lvI)-l = L: L:~#)..1=;'1 S)..)..I(m - 2) L:)..=1 s)..)..

An EB-estimator of M-1 is, therefore,

M· -1 _ { L:~#)..1=1 S)..)..I O}- max ( ) "n "n ,m - 2 6)..=1 S)..).. - 6)..#)..'=1 S)..)..I

(3.6.19)

An EB-estimator of J.L, it is obtained by replacing M by if in (3.6.14). Thisgives another EB -predictor YES of Y from (3.6.15).

EXAMPLE 3.6.1

Consider the data on number of municiplal employees in 1984 (ME84) in48 municipalities in Sarndal et al (1992, p. 653-654). It is assumed thatm = 4 and the municipalities labelled 1, ... , 12 constitute the populationat the first stage, the next 12 municipalities, population at the secondstage, etc. Six random samples of size four each are selected from eachof the populations at the first three stages. For the present populationwhich consists of municipalities 37, ... ,48, two samples are drawn, s~l) =

(38,40,44,48) and S~2) = (37,41,43,45). Each of the samples from the

earlier stages is combined with sii)(i = 1,2) thus giving two sets of 216samples each and average bias (AB) and average mse (AM) ofthe estimators

3.6. SIMPLE LOCATION MODEL 65

tiEB and YEB are calculated for each of these sets of samples. It is observedthat

~(1)AB(YEB) = 62616.6

AM(y~1) = 2372605776.6

AB(y~1) = 22283.8

AM(y~1) = 1193933317.0

(~(2)

AB YEB) = 150360.8

AM(y~1) = 70514314056.7=.(2)

AB(YEB) = 26056.1

AM(y~1) = 1806187631.7

where y~1(k = 1,2) denotes the estimator based on the k-th set of samples

and similarly for Y~1. The estimator YEB was found to be consistentlybetter than YEB both in the sense of average bias and average mse.

To compare the performance of two predictors, say EB-predictor eEB andan arbitrary predictor e of a statistic A(y) vis-a-vis the Bayes predictor eBunder a prior ~, Effron and Morris (1973) (also, Morris, 1983) introducedthe concept of relative savings loss (RSL). Let r(~, e) denote the averagerisk of the predictor e under ~ with respect to squared error loss, as definedin (3.2.2),

r(~,e) = E(e _ A(y))2 (3.6.20)

where the expectation is taken with respect to prior distribution as well asover all y containing Ys.The RSL of eEB wrt an arbitrary predictor e underthe prior ~ is given by

(3.6.21)

RSL(~j eEB, e) measures the increase in average risk in using EB estimatoreEB instead of Bayes estimator eB with respect to the same in using anarbitrary estimator e. The ratio RSL < 1 means eEB is better than e in thesense of smaller average risk. The ratio RSL---+ 0 means EB estimator eEB isasymptotically equivalent to Bayes estimator eB i.e. eEB is asymptoticallyequivalent in the sense of Robbins (1955). It follows that

(3.6.22)

N

Using A(y) = Y = Ly;/N and e = YEB (given in (3.6.15)), Yo and Y1i=l

in succession, where Yo = y = L:;:1 y(j)1m and Y1 = L njy(j) IL njj j

and denoting the superpopulation model (3.6.1), (3.6.2) as c;, we have thefollowings theorems.

66 CHAPTER 3. BAYES PREDICTION

THEOREM 3.6.1 Under the prior~,

RSL(~;YEB'YO)= E[(Hm - Bm)(y(m) - J.l)

-Hm(P. - J.l)]2[B;'E(y(m) - J.l?t1 (3.6.23)

RSL(~; YEB' Y1) = (1 - 1m?E[(Hm - Bm)(tJ(m) - J.l)-

Hm(P. - J.lW[E(Y - h)2t1 (3.6.24)

where 1m = nm/Nm, p. is given in (3.6.14) and YB by (3.6.4).

THEOREM 3.6.2 Under assumption A,

(3.6.25)

(3.6.26)

It follows, therefore, that r(~;YEB) ---+ r(~'YB) as m ---+ 00, so that theproposed estimator YEB is asymptotically optmum in the sense of Robbins(1955). The property (3.6.25) readily extends to the estimator YEB'

In a similar. set up (as in (3.6.1) and (3.6.2)) Ghosh and Lahiri (1987 a)considered the simultaneous EB -prediction of strata means in stratifiedrandom sampling. Let Yhi be the value of Y on the ith unit in the hthstratum Ph of size Nh(h = 1, ... , L; L Nh = N). The problem is to find

N.

a EB-predictor of 'Y = ('Y1,"" 'YL)', where 'Yh = L Yh;!Nh, the popula-i=l

tion mean for the hth stratum, on the basis of a stratified random sampleS = Uf=l Sh, Sh, the sample from the stratum h being taken as (1, ... ,nh)without loss of generality (n = L nh), with the sum of squared error loss

h(SSEL) function

1 L

L(c, 'Y) = L L(Ch - 'Yh?h=l

where c = (C1' ... , CL)' is a predictor of 'Y. Let, as before, YSh = (Yh1' ... ,Yhnh)' ,nh

Yh = LYh;!nh. However, the normality assumption (3.6.2) is replaced by1

the following general assumptions:

(a) Conditional on 0 = (Ol, ... , 0d', Yh1,"" Yhnh are iid with distributiondepending only on Oh and E(Yhi I Oh) = Oh, V(Yhi I Oh) = J.l2(Oh) (h =1, ... ,L)

(b) Oh'S are iid with E(Oh) = J.l, V(Oh) = 7 2

3.6. SIMPLE LOCATION MODEL 67

(c) 0 < E[P2(Ol)] = U2 < 00

We call this set of assumptions as C. The following assumptions of poste­rior linearity is also made.

nhE(Oh IYSh) = L ahiYhi + bh (h = 1, ... , L)

i=l

(3.6.27)

(3.6.31)

where the ahi and bh are constants not depending on Ysh. Since Yh;'S areconditionally independently and identically distributed (iicl) given 0h, itfollows (Goldstein, 1975) that

E(Oh IYsh) = ahYh + bh (h = 1, ... ,L) (3.6.28)

where ah's are some constants. It follows from Ericson (1969 b) or Hartigan(1969) that

(3.6.29)

whereM = U2/72 and B h = M/(M + nh)'

Hence, following (3.6.4), Bayes estimator of'Y is 1'B = (1'~), ... , 'Y1L»)' where

1'~) = E('Yh IYSh) = Yh - fhBh(Yh - J1.) (3.6.30)

and fh = 1 - nh/Nh.

Considering EB estimation of 'Y, first assume that 7 and u 2 and hence Bhare known, but J1. is unknown. Since E(ysh) = J1.1 nh , the best linear unbiasedestimator (BLUE) of J1. is obtained by minimising

L

L(Ysh - J1.1nh)'(u2Inh + 72Jnh )-l(YSh - J1.1 nh )

h=l

with respect to J1.. If the underlyimg distributions are normal, the BLUE ofJ1. is identical with its mle, and is given by

L L

J1.* = L(l- Bh)Yh/ L(l- B h). (3.6.32)h=l h=l

The EB estimator of 'Yh, 1'1h) is obtained by replacing J1. by J1.* in (3.6.30).L m

In the balanced case (nh = n V h), J1.* reduces to L LYh;/mL = Ys' In1 1

this case EB estimator of 'Yh is

A (h) - f·B(- -) h 1 L'YEB = Yh - h Yh - Ys, = , ... , (3.6.33)

68 CHAPTER 3. BAYES PREDICTION

which is the finite population analogue of an estimator of Lindley and Smith(1972) obtained under normality assumtions and a hierarchical Bayes ap­proach.

In case both M and J.t are unknown, estimators similar to those in (3.6.11)-(3.6.15), (3.6.18), etc. can be obtained under the relevant assumptions.

Ghosh and Lahiri (1987 b) extended the study to simultaneous predictionNh

of variances of several strata 8 2 = (8i, ... ,8'iJ' where 8~ = L (Yhi -i=l

'Yh)2 j(Nh - 1), h = 1, ... ,L under the SSEL (3.6.26). Apart from the as­sumptions in C it is further assumed that J.t2(()j) is polynomial of (at most)order 2,

(3.6.34)

where at least one of di(i = 0,1,2) is non-zero. The Bayes posteriorlinearity (3.6.27) is also assumed. The Bayes estimator of 8 2 is §~ =

A

2A

2(81(B)' ... ,8£(B»)' where

nh(Nh(Nh- 1))-lE{L L (Yhi-Yhi,?j2

i;6i'=l

nh Nh+ L L (Yhi - Yhi,)2j2

i=l i'=nh+1

Nh

+ L L (Yhi - Yhi l )2j21 YsJi;6i'=nh+1

In the special case where d1 = d2 = 0 and do = 0"2,

nhwhere s~ = L(Yhi - Yh)2 j(nh - 1).

i=l

The EB-estimator of 8~, S~-EB is obtained by replacing J.t and B h by theirestimators. We shall consid~r prediction of finite population variance 8; indetails in chapter 5.

3.7. NORMAL MODEL USING COVARIATES 69

NOTE 3.6.1

The case d1 = d2 = 0, do = (J2(> 0) holds in the normal superpop­ulation model set up (Ericson, 1969 aj Ghosh and Meeden, 1986). IfYhi I (h i£dPoisson(Oh)(i = 1, ... , nh), do = 0, d1 = 1, ~ = O. Morris (1983)

characterised all possible conditional distributions Yhi I Oh belonging to nat­ural exponential family with quadratic variance functions (QVF).

Tiwari and Lahiri (1989) extended the results of Ghosh and Lahiri (1987a,b). They retained the posterior linearity property (3.5.20) of the posteriorexpected values of the stratum superpopulation mean in each stratum butallowed the prior mean and variance of each stratum superpopulation meanto vary from stratum to stratum. Thus, in their formulation, assumptions(a) of C holds but (b) and (c) are replaced respectively by

• (b)' Oh'S are independent with E(Oh) = ehJ.l and V(Oh) = fh72(h =1, ... ,L)

where the constants eh, fh, gh are assumed to be positive.

In this section we have not considered use of any covariate in the regressionequation. The next section considers EB-prediction of T under normalmodels using covariates.

3.7 EB-PREDICTION UNDER NORMAL

MODEL USING COVARIATES

Lahiri and Peddada (1992) considered Bayesian analysis in finite populationsampling under multiple regression model using assumptions of linearity.Consider the following assumptions:(i)

(ii)(3.7.1)

where Xi = (Xil' ... ,Xip)', H is a p x p positive-definite matrix and (J2, 7 2 areconstants. A sample of size n, say, (1, ... , n) (without loss of generality)is selected from the population following some non-informative samplingdesign p. Under squared error loss function, L(a,O) = (a - 0)2, Bayespredictor for y is

70 CHAPTER 3. BAYES PREDICTION

Under model (3.7.1),

E(y I s, Ys)n N

N-IE[LYi+ L E(Yi I s,Ys)]i=l i=n+l

N

= N-1[nys + L x~E({3 I s, Ys)]i=n+l

(3.7.2)

where

(3.7.3)

b = (X~Xs)-lX~Ys

Using (3.7.2) and (3.7.3), Bayes predictor is

Y~ = Ysl + (1- J)X~[(0-2)-lI+ (T2)-1(X~Xs)-lH-lrl

[(0-2)-lb + (T2)-1(X~Xs)-1H-11l]

where

N

Xs = (X15,"" xpsY, Xjs = L xij/(N - n), 1= n/Ni=n+l

(3.7.4)

Following Arnold (1981), Zellner (1986), Ghosh and Lahiri (1987), the au­thors considered the natural conjugate prior of {3 when

In this case,

where

Y~ = Iys + (1 - J)x~[lI + (1 - C)(b - 1I)]

(3.7.5)

(3.7.6)

When T 2 is very large compared to 0-2 , Y~ tends to the classical regresSionestimator

Yreg = Iys + (1 - J)b' Xs (3.7.7)

3.7. NORMAL MODEL USING COVARIATES 71

When 7 2 is very small compared to (J2, Y~ tends to

Y~ = Jys + (1 - I)x~lI (3.7.8)

We now assume that 1I is known but (J2, 7 2 are unknown. Assuming withoutloss of generality 1I = 0, we have

Y~ = Jys + (1 - J)(l - C)x~b

A ridge estimator of C is

K A2A (J

CK = b'X'X bs s

where0-2 = y~(In - Xs(X~Xs)-lX~)ys/(n - p)

and 7 2 + (J2 is estimated by

_l_b,X'X b2 s sp-

and K is a suitable constant. A EB-ridge type estimator of y is

YEB(K) = Jys + (1 - 1)(1 - OK )x~b

For K=O, one gets Yreg' For p ~ 3,

(3.7.9)

(3.7.10)

(3.7.11)

(3.7.12)

(3.7.13)

(3.7.14)

is the best scale-invariant estimator of (J2. Also, in this case, (3.7.12) is theuniformly minimum variance unbiased estimator. Therefore, for p ~ 3 theauthors proposed

K* = (n - p)(P - 2)n-p+2

Therefore, a EB-ridge estimator of y, using K* is

~B = Jys + (1 - 1)(1 - O)x~b

where

Again, since C ::; 1, the positive-part ridge estimator of C is

0+ = min (1,0)

(3.7.15)

(3.7.16)

(3.7.17)

72 CHAPTER 3. BAYES PREDICTION

which gives the positive-part EB-ridge estimator Yi~ by using C+ in placeof C in (3.7.16). For another reason for choice of optimum value K' of Kthe equation (3.7.20) may be seen.

The authors compare the EB -estimators ~B with the classical regressionestimator Yreg in terms of the RSL introduced by Efron and Morris (1973)under the model (3.7.1)( both (i) and (ii)) denoted as e. The RSL of~B

with respect to Yreg is

RSL(I:. -::* "- ) = r(e, ~B) - r(e, Y~)<"YEB,Yreg (1:")_ (I: ".)r <"Yreg r <',YB

(3.7.18)

where r(e, e) is defined in (3.6.20).

The authors obtained expressions for r(e, Y~) and r(e, YEB(K)) for p ~ 3. Itis shown that

( " ") n-p+2 2 /RSL e, YEB(K) , Yreg = ( ) (p ) K - 2K p + 1n-pp -2

which is minimised for fixed n, p for

K = ...:....(n_----'p:...;..)...=.(p_-_2....:...)n-p+2

(3.7.19)

(3.7.20)

the minimum value being p(n~;+2)' The estimator ~B is the best in the

class of all estimators of the form Y EB(K) for p ~ 3 for fixed n and fixedp(~ 3). Also

-::*" (p - 2)(n - p)RSL(e, YEB, Yreg) = 1 - p(n _ p + 2) (3.7.21)

<lVn>p ~3

For fixed p, RSL(e,~B' Yreg) is a decreasing function of n with

limn--->ooRSL(e; ~B' Yreg) = 2/p (3.7.22)

For fixed n(~ 4), RSL(e;~B' Yreg) is a decresing function of p provided3 :::; p :::; 1 + [n/2] and is increasing in p if p ~ 2 + [n/2]' where [x] is theinteger part of x.

Therefore, Y~B is always better than Yreg so long as n > p(~ 3). For fixedp, ~B has increasingly higher precision compared to Yreg as n increases.

For fixed n, ~B increasingly gains superiority over Yreg for some initialvalues of p after which EB estimator loses its superiority and even becomesworse than ~B when p exceeds n. Also, since 1C+ - c 1<1 C- C I

3.7. NORMAL MODEL USING COVARIATES 73

almost surely, RSL(eiYi~,Yreg) < RSL(eiY"eB,Yreg), meaning Yi;; is amore preceise predictor than Y~B'

The frequentist-risk performance of the estimators are then compared wherethe frequentist-risk is defined by

E' being the expectation with respect to model (i) of (3.7.1). It is shownthat for p > 2(1 + 0:) for some real 0: and f3 E B, for all K such that

40:0< K < ,

- - (n - p)(n - p + 2)

ReYEB) < R(Yreg)

whenB = {f3 E RP I f3' X X f3 < p - 2(1 + 0:)(P + 2) (1"2}

s s - 2n(I+0:)(p-2)

Thus, for f3 E B, Yreg is inadmissible. Also, if p ~ 3 and f3X~Xsf3 ~ (1"2, then

3.7.1

We review in this subsection an application of the generalised linear modelin estimation of strata means. First we recall the concept of univariategeneralised linear modelling.

Consider m+ 1 variables (y, x) where x = (Xl,"" x m ) is a vector of covari­ates or explainatory variables and y is the response or dependent variable.The data are given by (Yi, Xi), i = 1, ... , n where Xi = (Xil,"" Xim), Xij

being the value of Xj on unit i. The classical linear model is

(3.7.23)

where Zi, the design-vector is an appropriate function of the covariate vectorXi (often, Zi = (1 Xi)) and f3 is a vector of regression coefficients. We shalluse the symbol

E(Yi I Xi) = J.li

The quantity TJi = z;f3 is called the natural predictor of Yi' Thus, theclassical linear model is

J.li = TJi (3.7.24)

74 CHAPTER 3. BAYES PREDICTION

In the univariate generalised linear model, the assumptions of classical gen­erallinear model are relaxed as follows:

(i)It is assumed that given Xi, Yi are conditionally independently distributedwith a distribution belonging to a simple exponential family (given below)with mean E(Yi IXi) = Mi and possibly a common scale parameter cP.

(ii) The mean Mi is related to T/i by a response function

Mi = reT/i) (3.7.25)

instead of the relation (3.7.24). The corresponding link function is

(3.7.26)

where 9 is the link function and is the inverse of r.

Univariate Exponential Family

The random variable Y has a distribution belonging to a univariate expo­nential family if its density is

yOi - .,p(Oi)f(yIOi,cP,wi)=exp{ cP Wi+p(Oi,cP,Wi)} (3.7.27)

where Oi is called a natural parameter, cP is a scale parameter, .,p(.) and p(.)are specific functions corresponding to the type of exponential functions,Wi is a weight with Wi = 1 for ungrouped data (i = 1, ... , n) and Wi = nifor grouped data (i = 1, ... , h) if the average iii of the ith group of niobservations is considered as response (or Wi = 1/ni if the sum of theindividual responses, I:j~l Yij is considered as response).

The natural parameter 0i is a function of Mi, 0i = O(Mi)' The mean Mi isgiven by

I f).,p(Oi)Mi = .,p (Oi) = -­

f)Oi

Also, the variance is

where

For each exponential family there exists a natural or canonical link (alsoresponse) function. This is given by

3.7. NORMAL MODEL USING COVARIATES 75

i.e.z{3 = TJ = g(p) = (}(p)

The canonical link function links the natural parameter directly to thelinear predictor z'{3.

Bernoulli distribution, Binomial distribution, Poisson distribution, Normaldistribution, gamma distribution, inverse-Gaussian distributions are exam­ples of univariate exponential distributions.

EXAMPLE 3.7.1

Suppose Y '" N(p, (12). Here

The natural parameter () = Pj 'tjJ( (}) = 'tjJ(p) = p2/2, ¢ = (12. Also,

¢V(p) (12V(Yi IXi) = -- = -

Wi Wi

since, V(pi) = 'tjJ"(pi) = 1. Natural response function is

p = TJ = z'{3

Sometimes, a non-linear relationship, e.g.,

are more appropriate.

For further details, the reader may refer to Fahrmeir and 'lUtz (1994),among many others.

Ghosh et al (1998) used hierarchical Bayes estimators based on generalisedlinear models for simultaneous estimation of strata means. Suppose thereare m strata and a sample of size ni is drawn by srs from the ith stratum.Let Yik denote the minimal sufficient statistic (discrete or continuous) for thekth unit within the ith stratum. The Yik'S are assumed to be conditionallyindependent with

(3.7.28)

(k = l, ... ,ni;i = l, ... ,m). The density (3.7.28) is parametrised withrespect to the canonical parameters (}ik and the scale parameters ¢ik(> 0)

76 CHAPTER 3. BAYES PREDICTION

(supposed to be known) and 'l/J and p are functions specific for differentmodels.

The natural parameters ()ik are first modelled as

(3.7.29)

where h is a strictly increasing function, the Xik(P x 1) are known designvectors, f3(P x 1) is the unknown regression coefficient, Ui are random effectsdue to strata and the Eik are random errors. It is assumed that Ui and Eikare mutually independent with uiir::.dN(O,a~)and Eir::.dN(O,a2).

Let R u = a;;2,R = a-2,() = (()1l,()12,.",()mnm )',u = (Ul""'Um)', Thenthe hierarchical model is given by the following:

(I) Conditional on ((), f3, u, R u = ru , R = r), Yik are independent with densitygiven in (3.7.23).

(II) Conditional on (f3, u, R u = ru , R = r), h(()ik) rv NID(x:kf3 + Ui, r-1)

(III) Conditional on (f3, Ru = ru , R = r), Ui rv N ID(O, r;;l).

Ghosh et al assigned the following priors to f3, Ru , R.

(IV) f3, Ru and R are mutually independent with f3 rv Uniform (RP)(P <m),Ru rv gamma (a/2,b/2),R rv gamma (c/2,d/2). (The variable Z rv

gamma (a, (3) if f(z) <X exp(-az)zl3-1 1(0,00) (z)).

Our interest is in finding the joint posterior distribution of g(()ik) 's whereg is a strictly increasing function given the data y = (Yll, ... ,YmnJ', inparticular, in finding the posterior mean, variances, covariances of theseparameters. In typical applications g(()ik) = 'l/J'(()ik) = E(Yik I ()ik).

Direct evaluation of the joint posterior distribution of the g(()ik)'S giveny involve high-dimensional numerical integration and is not computation­ally tractable. The authors used Gibbs sampler techniques to evaluate theposterior distributions. The models in (3.7.28) were extended to the mul­ticategorical data situation and were analysed with reference to an healthhazards data set.

In the next section we review applications of some of the results in sections3.2 - 3.7 in small area estimation. It will be clear that many results devel­oped in all these sections can be applied to the problems of estimation insmall areas.

3.8. SMALL AREA ESTIMATION 77

3.8 ApPLICATIONS IN SMALL AREA Es­TIMATION

A sample survey is often designed to produce estimates for the whole geo­graphical area (country or state) as well as those for some broad well-definedgeographical areas (eg. districts). In some situations it is required to pro­duce statistics for local areas (eg. blocks of villages, groups of wards in acity). These are called small areas. The sample is usually selected from thewhole population or from different subpopulations (strata, which generallyconsist of a number of small areas) and as such sample size for a small areais a random variable. Clearly, sample sizes for small areas would be small,even zeros for some areas, because the overall sample size in a survey isusually determined to provide specific accuracy at a much higher level ofaggregration than that of small areas. Usually survey estimates based onsuch small areas would involve formidable standard errors leading to unac­ceptably wide confidence intervals. Special methods are, therefore, neededto provide reliable estimates for small areas.

Let the finite population P be divided into A non-overlapping and exhaus­tive small areas P a of sizes Na(a = 1, ... , Aj UaPa = Pj L:=l N a = N).Apart from small areas, we sh?-ll, consider G mutually exclusive and ex­haustive domains or strata Dg(g = 1, ... , G)j UgDg = P. The boundariesof the domain may cut across the small areas. In socio-economic surveys,the domains are, generally, socio-economic strata, such as age-sex-religion­occupation-classification such that the units within the same domain arehomogeneous in respect of the characteristic of interest and boundaries ofthese domains cut across the small areas. A domain is generally much largerin size than a small area. It is often assumed that the units belonging toa particular (small area, domain)-cell has the same character as the unitsbelonging to that particular domain, irrespective of any particular smallarea. Thus when an estimate is required for a small area we use data fromthe region outside this small area but belonging to different domains whichoverlap this small (local) area. This is known as "borrowing strength" fromsimilar regions and such estimators are called 'synthetic estimators'.

let Yagk be the value of the characteristic 'y' on the unit k belonging to thedomain g and area a, Le. to the cell (a, g), k = 1, ... ,Nag :

G A A G

LNag = N aO = N a; LNag = N Og , LLNag = Ng=l a=l a=l g=l

A sample S of size n is selected by a sampling design p(s) from P and letSag be the part of the sample of size nag that falls in the cell (a, g). Let

78 CHAPTER 3. BAYES PREDICTION

Sa = UgSag , the part of the sample of size n a that falls in area aj Sog = Uasagthe part of the sample of size n og that falls in domain g. Clearly, S = uasa =UgSog . Also, nag is a random variable, whose value may be zero for some(a, g)j na = naO = '2:g nagj nOg = '2:a nag; '2:g '2:a nag = n. We shall denoteby Ta = '2:kEP. Yk, Ya = '2:kEP. Yk/Na = Ta/ N a, Ys. = '2:kES. Yk/na, Ys"" ='2:kESo" Yk/nOg, Ys•• = '2:kES•• yk/nag, the population total, population mean,the sample mean for small area a, the sample mean for domain g and thesample mean for the cell (a, g), respectively, where Yk denotes the value ofyon unit k in the population. Also let Sa = P - Sa, Sag = Pag - Sag, Pag =pnDg.

We shall now consider Bayes prediction of small area totals for some simplemodels. Consider the simple one-way classified random effects model

Yagk = O!a + Eagk

where O!a is the effect due to area a and

It is assumed that O!a is a random variable having distribution,

(3.8.1)

(3.8.2)

The models (3.8.1), (3.8.2) can be used when one believes that the value ofY in an area depends solely on the characteristic of the area, while this area­effect itself is a random sample from the distribution of such area-effects.Bayes estimator of small area population total Ta , obtained by Theorem3.3.1 is

where

f:B = LYk + L[(1- Aa)O!~+ AaYsa]kEs. kEs.

(3.8.3)

(3.8.4)

In (3.8.3) an estimate of Yk(k E s) is a weighted average of the prior meanO!a* and the sample mean Ysa' As M ---> 00 (the prior information is verymuch inaccurate relative to the current) f:B tends to be the simple ex-

pansion estimator NaYs.' The prediction-variance of f a1B under frequentistapproach, with respect to model (3.8.1) is

(3.8.5)

3.8. SMALL AREA ESTIMATION 79

Note that since Aa is an increasing function of M, larger is the value of M,greater is the prediction variance (3.8.5).

If the O'~'s are not known, the maximum likelihood estimate (mle) of O'~

under model (3.8.1), (3.8.2) is given by &~ = YSa' Substituting tbis in (3.8.3)one gets the EB predictor of Ta ,

(3.8.6)

which is the simple expansion predictor. Its prediction-variance is

(3.8.7)

When O'~ = 0'0 \f a = 1, ... ,A (i.e. the area effects are samples from a com­mon distribution N(O'o, IT';)) , Bayes predictor of population total Ta , T;;B is

obtained by substititing 0'0 for O'~ in (3.8.3). Clearly, as M ......; 00, T;'B alsotends to the simple expansion predictor NaYsa' '

Similarly, if 0'0 is not known, substituting

&; = L AaYsal L Aaa a

into T;'B we get T;'EB with prediction-variance under frequentist approach,with r~spect to m~del (3.8.1),

V(T;;EB - Ta) = (Na - na)lT2 + (Na - na)2AalT

2 Ina

(3.8.8)a

This quantity can be shown to be smaller than (3.8.7). Thus when O'~'s arenearly equal but unknown, one may prefer T;;EBin the smaller mean square

error(mse)- sense to T:EBI even though T;'EB has a bias under (3.8.1) whenO'a's are unknown. ' ,

Consider now the domain-dependent model

y_{agk} = β_g + ε_{agk},   ε_{agk} ~ NID(0, σ²).   (3.8.9)

Assume a normal prior for β_g,

β_g ~ NID(β_g*, σ_1²).   (3.8.10)


The Bayes predictor of T_a, by application of Theorem 3.3.1, is

T̂_{aB}^{(G)} = Σ_g Σ_{k∈s_{ag}} y_k + Σ_g Σ_{k∈s̄_{ag}} { (1 − μ_g) β_g* + μ_g ȳ_{s_{0g}} },   (3.8.11)

where

μ_g = n_{0g} K/(n_{0g} K + 1),   K = σ_1²/σ²,

with prediction-variance (under the frequentist approach)

V(T̂_{aB}^{(G)} − T_a) = Σ_g (N_{ag} − n_{ag})σ² + Σ_g (N_{ag} − n_{ag})² μ_g² σ²/n_{0g}.   (3.8.12)

When β_g* = β_0 ∀ g we get the predictor T̂_{aB}^{(1)} by substituting β_0 for β_g*. Its prediction-variance is also given by (3.8.12).

If β_g* is unknown, replacing β_g* by its mle (UMVU estimator) ȳ_{s_{0g}} we get the EB estimator

T̂_{aEB}^{(G)} = Σ_g Σ_{k∈s_{ag}} y_k + Σ_g Σ_{k∈s̄_{ag}} ȳ_{s_{0g}}
            = Σ_g Σ_{k∈s_{ag}} y_k + Σ_g (N_{ag} − n_{ag}) ȳ_{s_{0g}}
            = T̂_a^{*(G)}, say,   (3.8.13)

which is identical to the least squares estimator. Similarly, if β_0 is unknown, we substitute

β̂_0 = Σ_g μ_g ȳ_{s_{0g}} / Σ_g μ_g

for β_0 in T̂_{aB}^{(1)} to get the EB-estimator T̂_{aEB}^{(1)}.

Note that T̂_{aB}^{(G)} (T̂_{aEB}^{(G)}) and T̂_{aB}^{(1)} (T̂_{aEB}^{(1)}) can be calculated even if some n_{ag} = 0. It is only needed that n_{0g} > 0 ∀ g.
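A minimal Python sketch of the synthetic estimator (3.8.13) may help fix ideas: the domain means ȳ_{s_{0g}} are borrowed for every cell of the area, including cells with n_{ag} = 0; all inputs are hypothetical.

    def synthetic_area_total(cell_pop_sizes, cell_samples, domain_means):
        """Synthetic/EB estimator (3.8.13): observed cell totals plus
        (N_ag - n_ag) times the domain mean, summed over the domains g."""
        total = 0.0
        for g, N_ag in cell_pop_sizes.items():
            y_obs = cell_samples.get(g, [])          # may be empty: n_ag = 0 is allowed
            total += sum(y_obs) + (N_ag - len(y_obs)) * domain_means[g]
        return total

    # an area cut by two domains; the domain means come from the whole sample
    print(synthetic_area_total({1: 40, 2: 60}, {1: [4.0, 6.0]}, {1: 5.2, 2: 3.9}))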

If the normality assumption is relaxed, T̂_{aB}^{(G)}, T̂_{aB}^{(1)} become best linear unbiased estimators under the models (3.8.9), (3.8.10) (without normality).

The UMVU estimator of σ² when β_g* is unknown is

σ̂² = Σ_a Σ_g Σ_{k∈s_{ag}} (y_k − ȳ_{s_{0g}})² / (n − G).   (3.8.14)

If K is known, the UMVU estimator of σ_1² is Kσ̂².


Ghosh and Meeden (1986) and Mukhopadhyay (1998 c) discussed methods for estimating K, and these have been reviewed in Section 3.6. Often one can provide a guessed value of K, say K_f. It can be shown that so long as K_f > K/2, MSE(T̂_{aB}^{(G)}) and MSE(T̂_{aB}^{(1)}), using K = K_f, are less than that of the least squares estimator T̂_a^{*(G)}. Also, an incorrect value of K_f does not bias T̂_{aB}^{(G)}. The condition K_f > K/2 is sufficient to guarantee that T̂_{aB}^{(G)} and T̂_{aB}^{(1)} are superior to T̂_a^{*(G)} with respect to mse. Also, T̂_{aEB}^{(1)} is superior to T̂_a^{*(G)} if K_f > K.

The models (3.8.1),(3.8.2) and (3.8.9),(3.8.10) are special cases of the model

y_{agk} = μ_{ag} + ε_{agk},   ε_{agk} ~ NID(0, σ²),   (3.8.15)

where we assume that μ_{ag} has one of the following priors:

(i) μ_{ag} ~ NID(μ_{a0}*, σ_v²),   (3.8.16)
(ii) μ_{ag} ~ NID(μ_{0g}*, σ_v²),   (3.8.17)
(iii) μ_{ag} ~ NID(μ_0*, σ_v²),   (3.8.18)
(iv) μ_{ag} ~ NID(μ_{ag}*, σ_v²),   (3.8.19)

with the μ_{ag} independent of the ε_{agk}.

The cases (3.8.16) and (3.8.17) (together with (3.8.15)) coincide with (3.8.1), (3.8.2) and (3.8.9), (3.8.10) respectively. The case (3.8.18) states that there is an overall effect throughout the survey population and there is no specific area-effect or domain-effect. The model (3.8.19) states that for each (small area, domain) cell there is a specific effect. In (3.8.19) the area-effects and domain-effects may or may not be additive.

Application of Theorem 3.3.1 gives the general form of the Bayes predictor

T̂_{aB} = Σ_g Σ_{k∈s_{ag}} y_k + Σ_g Σ_{k∈s̄_{ag}} { (1 − A_{ag}) θ_{ag} + A_{ag} ȳ_{s_{ag}} },   (3.8.20)

where θ_{ag} = E(μ_{ag}), the known prior mean, and

A_{ag} = n_{ag} L/(n_{ag} L + 1),   L = σ_v²/σ².   (3.8.21)


In T̂_{aB} a weighted average of prior information and the current data is used to predict the unobserved part Σ_{k∈s̄_a} y_k. This estimator becomes the post-stratified direct estimator Σ_g N_{ag} ȳ_{s_{ag}} when L → ∞.

When n_{ag} = 0, we use θ_{ag} to predict the non-sampled units in the cell (a, g). When n_{ag} > 0, we use the weighted average of θ_{ag} and the cell mean ȳ_{s_{ag}} for the same.

When θ_{ag} is not known, its least squares estimates under the models (3.8.16)-(3.8.18) are, respectively,

μ̂_{a0}* = Σ_g A_{ag} ȳ_{s_{ag}} / Σ_g A_{ag},   (3.8.22)

μ̂_{0g}* = Σ_a A_{ag} ȳ_{s_{ag}} / Σ_a A_{ag},   (3.8.23)

μ̂_0* = Σ_a Σ_g A_{ag} ȳ_{s_{ag}} / Σ_a Σ_g A_{ag}.   (3.8.24)

When σ_v² ≈ 0 and the n_{ag} values are such that A_{ag} ≈ 0 ∀ (a, g), and the prior means are unknown, the estimate of μ_{a0}*, μ̂_{a0}* (= ȳ_{s_a}), produces the simple expansion estimator T̂_{aE} = N_a ȳ_{s_a}; the estimate of μ_{0g}*, μ̂_{0g}* (= ȳ_{s_{0g}}), the synthetic estimator T̂_a^{*(G)}; and the estimate of μ_0*, μ̂_0* (= ȳ_s), the EB-estimator Σ_{k∈s_a} y_k + (N_a − n_a) ȳ_s.

When σ_v² = ∞ all these EB-estimators (including the one for the model (3.8.19)) produce the simple direct estimator T̂_a = Σ_g N_{ag} ȳ_{s_{ag}}.
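The general predictor (3.8.20) admits the same kind of sketch in Python; theta_ag (the prior cell means) and L are hypothetical inputs, and the code reflects the rule stated above: a cell with n_{ag} = 0 is predicted entirely from its prior mean.

    def general_bayes_area_total(cell_pop_sizes, cell_samples, prior_means, L):
        """Bayes predictor (3.8.20) with shrinkage A_ag = n_ag*L/(n_ag*L + 1), eq. (3.8.21)."""
        total = 0.0
        for g, N_ag in cell_pop_sizes.items():
            y_obs = cell_samples.get(g, [])
            n_ag = len(y_obs)
            A_ag = n_ag * L / (n_ag * L + 1.0)           # equals 0 when the cell is empty
            cell_mean = sum(y_obs) / n_ag if n_ag > 0 else 0.0
            predicted_unit = (1.0 - A_ag) * prior_means[g] + A_ag * cell_mean
            total += sum(y_obs) + (N_ag - n_ag) * predicted_unit
        return total

    print(general_bayes_area_total({1: 40, 2: 60}, {1: [4.0, 6.0]}, {1: 5.0, 2: 4.0}, L=0.5))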

EXAMPLE 3.8.1.

A survey was carried out in the district Hugli, West Bengal, India in 1997-98 under the support of the Indian Statistical Institute, Calcutta to estimate the population size and some household characteristics (such as the distribution of persons according to age-groups, religion, educational status, occupational category, employment status) in different municipal towns (small areas). The urban area of Hugli district was divided into two strata: stratum 1 consisting of 24 single-ward towns and stratum 2 consisting of 12 towns, each having more than one ward. Two towns from the first stratum and nine towns from the second stratum were selected with probability proportional to the number of households in the 1991 census. For each selected ward a list of residential houses was prepared from the assessment registers of


the municipality, and samples of these houses were selected using linear systematic sampling.

The results of the survey were mainly confined to the sampled municipal areas in stratum 2. For municipalities in stratum 1, a list of residential houses was not available with the municipal or Gram-Panchayet offices. Even the boundaries of these non-municipal towns were not clearly defined. As such, the data that were collected for these areas could not be used to make any reliable estimate.

The estimates of population counts depended heavily on the estimation of the number of households in these areas. Although the municipalities maintained some figures, these were found unreliable on the basis of earlier census estimates for the same. The number of households for 1998 for these areas was estimated on the basis of the census figures for 1971, 1981 and 1991 and an assumption of exponential growth.
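The text does not spell out the fitting method; the following Python sketch shows one plausible version of the exponential-growth projection, fitting log(households) linearly in the year by least squares to the 1971, 1981 and 1991 census counts (illustrative figures) and extrapolating to 1998.

    import math

    def project_households(census_counts, target_year):
        """Fit log(count) = a + b*year by ordinary least squares and extrapolate."""
        years = sorted(census_counts)
        xs = [float(t) for t in years]
        ys = [math.log(census_counts[t]) for t in years]
        xbar = sum(xs) / len(xs); ybar = sum(ys) / len(ys)
        b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
        a = ybar - b * xbar
        return math.exp(a + b * target_year)

    # hypothetical household counts for one town
    print(round(project_households({1971: 9500, 1981: 12400, 1991: 16100}, 1998)))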

In the synthetic estimation technique we intend to obtain some synthesizing characters whose values remain independent of the municipal areas, i.e. remain approximately constant over all municipalities. With the help of such characteristics we can then obtain estimates of population sizes for different small areas. The characteristics tried were: average household (hh) size by four occupation groups (x1), average hh size by three education groups (x2), average hh size by two types of houses (x3), average hh size by two types of religion (x4), average hh size by the number of living rooms (x5), average hh size by possession of TV/scooter/refrigerator/telephone (x6), average hh size by type of ownership of houses (x7), average hh size by possession of agricultural land (x8). For each of these characteristics, the estimated average household size for the different municipalities and for all the sampled municipalities taken together was calculated. It was found that, for each characteristic x_i, the sample estimate S_i^(j) = {S_{i1}^(j), ..., S_{im_i}^(j)} of average hh size for the m_i groups of x_i for the jth municipal town was of the same order as the overall estimate S_i = {S_{i1}, ..., S_{im_i}}. The population size for each group of the characteristic x_i for each municipality, and hence the population size for each municipality, was calculated on the basis of the synthesizing character x_i (i = 1, ..., 8). These were compared with the corresponding census estimates for 1991. It was found that x5 produced reliable estimates of population sizes in the sense that the estimates obtained showed greater consistency with the census estimates than those obtained using other synthesizing characters. For details on the synthetic method of estimation the reader may refer to Mukhopadhyay (1998 e).
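A sketch of the synthetic calculation just described, in Python: the overall sample average household size for each group of the synthesizing character (here x5, the number of living rooms) is applied to the estimated number of households of the municipality in each group; all figures are hypothetical.

    def synthetic_population(households_by_group, avg_hh_size_by_group):
        """Estimated persons = sum over groups of (households in group) x (overall mean hh size of the group)."""
        return sum(households_by_group[g] * avg_hh_size_by_group[g] for g in households_by_group)

    avg_size = {1: 3.8, 2: 4.9, 3: 6.1}          # overall average hh size by number of living rooms
    households = {1: 6200, 2: 4100, 3: 1500}     # estimated households of the municipality in each group
    print(synthetic_population(households, avg_size))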

Another series of estimates, based on the linear regression of population size (y) on the number of households in the sample (x), was also considered.


The model was


y_{as} = α + β x_{as} + e_a,   a = 1, ..., 9,

where y_{as} denotes the number of persons in the sample, x_{as} the number of households in the sample in municipal area a, and the e_a are independently distributed errors with mean zero and variance σ². The estimated number of persons for area a is

Ŷ_a = α̂ + β̂ X_a,

where X_a is the estimated number of households in area a. Empirical Bayes estimation of population size was also considered. Consider the following normal theory model:

1";,s = (3aXas + ea, ea '" N I D(O, a2)

(3a = Ba+ U a, U a '" N(Ba, 7 2), Ua ind ea, a = 1, ... , 9

Therefore, the posterior distribution of β_a given y_{as} is independent normal with mean β* and posterior variance τ*, where

An estimate of B_a is

B̂_a = (1/4) [ y_a(71)/x_a(71) + y_a(81)/x_a(81) + y_a(91)/x_a(91) + y_{as}/x_{as} ],   a = 1, ..., 9,

where y_a(71) (x_a(71)) denotes the total population (number of households) in area a in 1971, and similarly for the other notations. Also,

σ̂² = (1/8) Σ_{a=1}^9 (y_{as} − β̂ x_{as})²,   β̂ = Σ_{a=1}^9 y_{as} / Σ_{a=1}^9 x_{as},

τ̂² = (1/8) Σ_{a=1}^9 (T_{as} − T̄_s)²,

where T_{as} = y_{as}/x_{as}, T̄_s = Σ_{a=1}^9 T_{as}/9. An empirical Bayes estimate of population size for area a for 1997-98 is

Also, the 95% confidence interval for Y_a is


where


The estimated populations are given in Table 3.1. The empirical Bayes estimates seem to be most consistent with the rate of population growth in the region. In recent times the next decennial census, for the year 2001, has been initiated by the Registrar General of India for the whole country. Its results, when available, will throw further light on the performance of these estimators. Further details are available in Mukhopadhyay (1998 e, g).
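Because the empirical Bayes formulae are only partly recoverable from the text, the following Python sketch should be read as one plausible implementation rather than the authors' exact procedure: the sample persons-per-household rate y_as/x_as is shrunk towards the census-based rate B_a with a weight determined by the variance components, and the shrunken rate is applied to the projected household count X_a; all inputs are hypothetical.

    def eb_population_estimate(y_as, x_as, B_a, X_a, sigma2, tau2):
        """Posterior-mean rate under y_as ~ N(beta_a*x_as, sigma2), beta_a ~ N(B_a, tau2),
        scaled by the projected number of households X_a (one plausible reading of the model)."""
        rate_direct = y_as / x_as
        w = tau2 / (tau2 + sigma2 / x_as ** 2)     # weight on the direct rate
        rate_eb = w * rate_direct + (1.0 - w) * B_a
        return rate_eb * X_a

    print(round(eb_population_estimate(y_as=2150, x_as=480, B_a=4.3, X_a=26000, sigma2=900.0, tau2=0.04)))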

Table 3.1 Estimates of Population of Some Municipal Areas of Hugli District, West Bengal

Municipal Town       1991 Census   Estimated Number of Persons in 1997-98
                                   Synthetic    Regression   EB           √V̂(Ŷ_a^EB)
                                   Estimate     Estimate     Estimate
Arambagh                45211        56084        46693        55678        6449.1
Bansberia               93520       118520       125286       124366       12139.8
Hooghly-Chinsurah      151806       173305       159773       163406       10611.8
Bhadreswar              72474        79725        77852        81628        7812.3
Chandannagar           120378       148384       132081       158355       11238.1
Baidyabati              90081       105223       104073       105544       10143.0
Srirampur              137028       141921       146770       140961       14990.3
Rishra                 102815       114882       115894       111058       11427.6
Konnagar                62200        80087        73891        66838        7285.8

In the next subsection we consider the application of the Carter-Ralph (1974) modification of James-Stein estimators, discussed in Section 3.5, to small area estimation.

3.8.1 FAY-HERRIOT ESTIMATES

In the US, statistics on population, per capita income (PCI) and adjusted taxes, among other characteristics, are used for determining the allocation of funds to the state governments and subsequently to the local governments. The PCI were estimated from the 1970 census. However, for small areas with population less than 500 the sampling errors of the census estimates were large, and these estimates were replaced by the respective county averages. Fay and Herriot (1979) used James-Stein estimates based on auxiliary data related to PCI, available from the Internal Revenue Service (IRS) and the 1970 census, to improve upon estimates of allocations for local areas. In deriving their estimates Fay and Herriot (1979) considered an extension of (3.5.23), (3.5.25) and (3.5.26) to the linear regression case. Consider

Y_a | θ_a ~ ind N(θ_a, D_a),   θ_a ~ ind N(x_a′β, B),   a = 1, ..., k,   (3.8.25)

where x_a′ is a p-dimensional row vector and β has a uniform (improper) prior distribution. The row vector x_a′ and the sampling variance D_a are both known, but β and B are to be estimated from the data. Now, assuming B is known, the weighted regression estimate

β̂ = (X′V^{−1}X)^{−1} X′V^{−1} Y,   (3.8.26)

where V = Diag(V_{aa}, a = 1, ..., k), V_{aa} = D_a + B, gives the minimum variance unbiased estimate x_a′β̂ of x_a′β. Over the same joint distribution,

E[ Σ_{a=1}^k (Y_a − x_a′β̂)²/(D_a + B) ] = k − p.   (3.8.27)

Following Carter and Ralph (1974), Fay and Herriot estimated B* as the unique solution to the constrained equation

Σ_{a=1}^k (Y_a − x_a′β̂)²/(B* + D_a) = k − p,   B* ≥ 0.   (3.8.28)

They considered B* = 0 when no positive solution is found. The estimator of θ_a is then

θ̂_{aCR} = [B*/(B* + D_a)] Y_a + [D_a/(B* + D_a)] Ŷ_a^{(1)},   Ŷ_a^{(1)} = x_a′β̂.   (3.8.29)

If B is known, θ̂_{aCR} is the classical Bayes estimator (compare with (3.5.19)).
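A compact numerical sketch of the Fay-Herriot computation (using numpy, with simulated data; this is an illustration, not the authors' code): B* is obtained by solving the moment equation (3.8.28), with β re-estimated by weighted least squares as in (3.8.26) at each trial value, and the area estimates are the shrinkage combinations (3.8.29).

    import numpy as np

    def fay_herriot(y, X, D):
        """y: direct estimates (k,); X: covariates (k, p); D: known sampling variances (k,)."""
        k, p = X.shape

        def moment_gap(B):
            W = 1.0 / (D + B)                                          # weights 1/(D_a + B)
            beta = np.linalg.solve((X.T * W) @ X, (X.T * W) @ y)       # eq. (3.8.26)
            resid = y - X @ beta
            return np.sum(resid ** 2 * W) - (k - p), beta              # eq. (3.8.28)

        lo, hi = 0.0, 10.0 * np.var(y) + 1.0
        if moment_gap(lo)[0] <= 0:                     # no positive solution: take B* = 0
            B = 0.0
        else:                                          # bisection on the decreasing gap function
            for _ in range(200):
                mid = 0.5 * (lo + hi)
                lo, hi = (mid, hi) if moment_gap(mid)[0] > 0 else (lo, mid)
            B = 0.5 * (lo + hi)

        _, beta = moment_gap(B)
        theta = (B / (B + D)) * y + (D / (B + D)) * (X @ beta)         # eq. (3.8.29)
        return theta, beta, B

    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(8), rng.normal(size=8)])
    D = np.full(8, 0.5)
    y = X @ np.array([2.0, 1.0]) + rng.normal(scale=1.0, size=8)
    print(fay_herriot(y, X, D))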

An alternative estimator for this problem, based on a maximum likelihood approach to fitting the model

θ_a ~ ind N(x_a′β, B D_a^α),


when β, B and α may be jointly estimated from the data, has also been discussed by Fay and Herriot. Earlier, Ericksen (1973, 1974) explored the use of sample data to determine regression estimates. For further details on prediction for small areas in finite population sampling the reader may refer to Ghosh and Rao (1994), Chaudhuri (1994) and Mukhopadhyay (1998 e).

3.9 BAYES PREDICTION UNDER RANDOM ERROR VARIANCE MODEL

Butar and Lahiri (2000) considered empirical Bayes estimation of several (infinite) population means and variances under a random error variance model. Suppose there are m populations. A random sample of size n_i is drawn from the ith population. Let y_{ij} be the value of y on the jth sampled unit in the ith population (j = 1, ..., n_i; i = 1, ..., m), ȳ_{is} = Σ_{j=1}^{n_i} y_{ij}/n_i, s_i² = Σ_{j=1}^{n_i} (y_{ij} − ȳ_{is})²/(n_i − 1), y_{is} = (y_{i1}, ..., y_{in_i})′. Consider the following model

y_{ij} | θ_i, σ_i² ~ NID(θ_i, σ_i²),
θ_i ~ NID(x_i′β, τ²)   (i = 1, ..., m; j = 1, ..., n_i),   (3.9.1)

where the σ_i² are iid Inverse Gamma (IG) variables whose density is given by

(3.9.2)

Here x_i = (x_{i1}, ..., x_{ip})′ is a p × 1 vector of known and fixed observations x_{ij} on the auxiliary variables x = (x_1, ..., x_p) for unit i, and β is a p × 1 vector of regression coefficients. The authors considered Bayes estimation of θ_i under the squared error loss function. The model (3.9.1) is an extension of the random error variance model considered by Kleffe and Rao (1992) who, however, obtained empirical best linear estimators of θ_i.

The posterior density of σ_i² is

f(σ_i² | y_{is}, Ψ) ∝ exp{ −(n_i − 1)s_i²/(2σ_i²) − n_i(ȳ_{is} − x_i′β)²/(2(n_iτ² + σ_i²)) − (η − 1)λ/σ_i² },   (3.9.3)


where Ψ = (β, τ², η, λ). Hence, the Bayes estimator of b(σ_i²), a function of σ_i², is

E[ b(σ_i²) | y_{is}, Ψ ] = ∫_0^∞ b(σ_i²) f(σ_i² | y_{is}, Ψ) dσ_i².

Now, the posterior distribution of θ_i given y_{is} and σ_i² is normal with mean (1 − B_i) ȳ_{is} + B_i x_i′β and variance τ²B_i, where B_i = σ_i²/(σ_i² + n_iτ²). Hence, the Bayes estimator of θ_i when Ψ is known is

θ_i^B = E(θ_i | y_{is}, Ψ) = E[ E{θ_i | y_{is}, Ψ, σ_i²} | y_{is}, Ψ ] = (1 − w_i) ȳ_{is} + w_i x_i′β,   (3.9.4)

where

w_i = E(B_i | y_{is}, Ψ) = ∫_0^∞ B_i f(σ_i² | y_{is}, Ψ) dσ_i².   (3.9.5)

The measure of uncertainty of θ_i^B is the posterior variance

V(θ_i | y_{is}, Ψ).   (3.9.6)

In practice, the parameters in Ψ need to be estimated from the available data. The authors considered empirical Bayes estimators of b(σ_i²) and θ_i by using the ANOVA-type estimators of the parameters in Ψ as proposed by Arora et al (1997).
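The shrinkage weight w_i in (3.9.5) is an expectation over the posterior of σ_i², and in practice it can be approximated numerically. The Python sketch below assumes that draws from that posterior are already available (however obtained, e.g. from a grid evaluation of (3.9.3)); only the averaging step of (3.9.4)-(3.9.5) is shown, and all inputs are hypothetical.

    import numpy as np

    def theta_bayes(ybar_i, xbeta_i, n_i, tau2, sigma2_draws):
        """Monte Carlo version of (3.9.4)-(3.9.5): average the shrinkage factor
        B_i = sigma_i^2/(sigma_i^2 + n_i*tau^2) over posterior draws of sigma_i^2."""
        B = sigma2_draws / (sigma2_draws + n_i * tau2)
        w_i = B.mean()                                  # approximates (3.9.5)
        return (1.0 - w_i) * ybar_i + w_i * xbeta_i     # eq. (3.9.4)

    draws = np.random.default_rng(1).gamma(shape=3.0, scale=0.5, size=5000)   # stand-in posterior draws
    print(theta_bayes(ybar_i=10.2, xbeta_i=9.5, n_i=12, tau2=0.8, sigma2_draws=draws))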

Following Tierney et al (1989) and Kass and Steffey (1989) they also proposed suitable approximations to the Bayes estimators of θ_i and b(σ_i²). It is known that the posterior probability density is proportional to the product of the likelihood and the prior. Hence, the Bayes estimator of b(θ) is of the type

E(b(θ) | y) = ∫ b(θ) L(θ) π(θ) dθ / ∫ L(θ) π(θ) dθ,

where b(θ) is some function of θ. The expression (3.9.6) can be derived approximately by using the following lemma.

LEMMA 3.9.1 Let h(θ) be a smooth function of an m-dimensional vector θ, having first five derivatives and having a minimum at θ̂, and let b be some other smooth function of θ having first three derivatives. Then, under suitable conditions,

(3.9.7)

where h(θ) = −n^{−1} log{L(θ)π(θ)}.
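The exact expansion (3.9.7) is not reproduced here, but the flavour of such approximations can be illustrated with a Tierney-Kadane style ratio of Laplace approximations in one dimension; the toy posterior and the function b below are hypothetical, chosen only because the answer is known in closed form.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def laplace_log_integral(neg_log_f):
        """log of the integral of exp(-neg_log_f(t)) dt by the one-dimensional Laplace method."""
        t_hat = minimize_scalar(neg_log_f).x
        h = 1e-5                                       # numerical second derivative at the minimum
        d2 = (neg_log_f(t_hat + h) - 2 * neg_log_f(t_hat) + neg_log_f(t_hat - h)) / h ** 2
        return -neg_log_f(t_hat) + 0.5 * np.log(2 * np.pi / d2)

    def posterior_mean_laplace(b, neg_log_post):
        """E[b(theta)|y] as exp(log int b*posterior - log int posterior), for positive b."""
        num = laplace_log_integral(lambda t: neg_log_post(t) - np.log(b(t)))
        den = laplace_log_integral(neg_log_post)
        return np.exp(num - den)

    # toy check: posterior N(2, 0.5^2) and b(theta) = exp(theta); exact value is exp(2 + 0.125)
    neg_log_post = lambda t: 0.5 * ((t - 2.0) / 0.5) ** 2
    print(posterior_mean_laplace(np.exp, neg_log_post), np.exp(2.125))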


The authors used the transformation σ_i² = exp(−φ_i) and calculated the posterior density f(φ_i | y_{is}) and hence h(φ_i). Using (3.9.7) they obtained approximate expressions for E[b(φ_i) | y_{is}, Ψ], where B_i = b(φ_i). They extended the resampling method of Laird and Louis (1987) to measure the uncertainty of the estimators θ_i^{EB} and b^{EB}(σ_i²), made a numerical study to check the accuracy of the Laplace approximation and compared the performance of the empirical Bayes estimators of θ_i and σ_i². They (1999) also considered the problem of estimation of a finite population variance.

3.10 EXERCISES

1. Consider the Binomial superpopulation model: the y_k's are independent with P_θ(y_k = 1) = θ = 1 − P_θ(y_k = 0), k = 1, ..., N. Suppose the quantity to be predicted is the population distribution function F_N(t) = (1/N) Σ_{i=1}^N Δ(t − y_i), where Δ(z) = 1 (0) when z ≥ 0 (otherwise). Using a B(a, b) prior for θ, as in Example 3.2.1, find a Bayes predictor and hence a minimax predictor of F_N(t).

(Bolfarine, 1987)

2. Consider the Binomial superpopulation model and the B(a, b) prior for θ as in Example 3.2.1. Using the Linex loss function (3.4.1) find the Bayes predictor of the population total T and its Bayes risk.

(Bolfarine, 1989)

3. Consider p independent random variables X_i ~ N(θ_i, σ²), i = 1, ..., p. The maximum likelihood estimate of θ_i is θ̂_i = X_i. Suppose θ_i ~ N(μ, τ²).

Then the posterior distribution of θ_i is

θ_i | X_i ~ N( (1 − B)X_i + Bμ, (1 − B)σ² ),   B = σ²/(σ² + τ²),

where the Bayes estimate of θ_i is

θ_i^B = (1 − B)X_i + Bμ.

In EB estimation μ, τ² are estimated from the marginal distribution of X_i: X_i ~ N(μ, σ² + τ²), i = 1, ..., p. Hence,


Therefore,


μ̂ = X̄,   τ̂² = σ² [ Σ_i (X_i − X̄)² / ((p − 3)σ²) − 1 ].

Hence, show that an EB-estimator of θ_i is

θ̂_i^{EB} = X̄ + [ 1 − (p − 3)σ²/Σ_j (X_j − X̄)² ] (X_i − X̄).

Also, if p ≥ 4,

E[ Σ_{i=1}^p (θ_i − θ̂_i^{EB}(X))² ] < E[ Σ_{i=1}^p (θ_i − X_i)² ]   ∀ θ,

where the expectation is taken over the distribution of X_i given θ_i.

(Efron and Morris, 1973)

4. Let Y = (Y_1, ..., Y_n)′ have a joint distribution conditional on θ (= (θ_1, ..., θ_s)′) with

E_θ(Y_i) = μ(θ),   i = 1, ..., n.

Let the prior distribution of θ be such that

E{μ(θ)} = m

and V{μ(θ)} < ∞. Let Ȳ = Σ_{i=1}^n Y_i/n and assume V_θ(Ȳ) < ∞. Prove that

if the posterior expectation

E{μ(θ) | Ȳ = ȳ} = aȳ + β,

where a, β are independent of ȳ, then

E{μ(θ) | Ȳ} = [ V{μ(θ)} + E_θ{V(Ȳ | θ)} ]^{−1} [ Ȳ V{μ(θ)} + m E_θ{V(Ȳ | θ)} ].

The above model is satisfied in the following simple cases: the Y_i are conditionally independent normal, binomial, negative binomial, gamma or exponential variables and their unknown parameters follow the natural conjugate priors.

(Ericson, 1969 b)

5. Let X = (X_1, ..., X_N)′ [the random variable X_i having the finite population value x_i] have a prior distribution ξ(x) generated by the assumption


that, conditional on θ = (θ_1, ..., θ_m)′, the X_i's are iid with density f(z | θ) and θ has a distribution g(θ). Thus

ξ(x) = ∫ Π_{i=1}^N f(x_i | θ) g(θ) dθ.

Let μ(θ) = E(X_i | θ), m = E(X_i), X̄ = Σ_{i=1}^N X_i/N, x_s = {(i, x_i), i ∈ s}. Let

G_f be a class of distributions of θ having density g(θ | x′, n′, y′) with the property that, if x is any observation on X, then the posterior distribution of θ is

g(θ | x′ + x, n′ + 1, y″) ∈ G_f.

Assume that for every g(θ | x′, n′, y′) ∈ G_f,

m = (x′ + a)/(n′ + b),

where a, b are constants. Show that

E{μ(θ) | x_s} = [ x̄_s V(μ(θ)) + m E_θ{V(X̄_s | θ)} ] / [ V(μ(θ)) + E_θ{V(X̄_s | θ)} ],

where x̄_s = Σ_{i∈s} x_i/n, X̄_s = Σ_{i∈s} X_i/n and n is the size of the sample s.

6. Consider the model of Example 3.2.2 and suppose that σ² is unknown. Assuming a non-informative prior for (β, σ²) as given in (3.3.12), show that the ratio predictor T̂_R = (ȳ_s/x̄_s)X is the Bayes predictor of T with Bayes prediction risk

where σ̂² is given in (3.3.15).

(Bolfarine, 1987; Bolfarine and Zacks, 1991)

7. Suppose the population is divided into K clusters of sizes N_i (i = 1, ..., K). In the first stage a sample s of k clusters is selected. In the second stage, a sample s_i of size n_i is selected from the ith cluster selected at the


first stage (i = 1, ..., k). Let y_{ij} be the y-value associated with the jth unit in the ith cluster. Suppose the model is

Y_{ij} = μ_i + e_{ij},   e_{ij} ~ NID(0, σ_e²),   j = 1, ..., N_i; i = 1, ..., K,
μ_i = ν + v_i,   v_i ~ NID(0, σ_v²),
with the e_{ij} independent of the v_i.

Let y_s = {y_{ij}, j ∈ s_i; i ∈ s} denote the observed data. Show that the posterior distribution of μ = (μ_1, ..., μ_K)′ given y_s is multivariate normal with mean vector μ̂, where

μ̂_i = (1 − λ_i) ȳ_s + λ_i ȳ_{s_i},   i ∈ s;    μ̂_i = ȳ_s,   i ∉ s,

where λ_i = σ_v²/(σ_v² + σ_e²/n_i), ȳ_{s_i} = Σ_{j∈s_i} y_{ij}/n_i, ȳ_s = Σ_{i∈s} λ_i ȳ_{s_i} / Σ_{i∈s} λ_i, and the posterior covariance matrix has diagonal and off-diagonal elements given by

c_{ij} = (1 − λ_i)² v² + (1 − λ_i) σ_v²   if i = j;    c_{ij} = (1 − λ_i)(1 − λ_j) v²   if i ≠ j,

where v² = [ Σ_{i∈s} (σ_v² + σ_e²/n_i)^{−1} ]^{−1}. Hence, observing that the population total T is given by

T = Σ_{i∈s} Σ_{j∈s_i} y_{ij} + Σ_{i∈s} Σ_{j∉s_i} y_{ij} + Σ_{i∉s} Σ_{j=1}^{N_i} y_{ij},

find the Bayes predictor of T along with its Bayes risk. Also find the Bayes predictor of T and its Bayes risk with respect to the Linex loss function.

(Scott and Smith, 1969; Bolfarine, 1989)