[Lecture Notes in Statistics] Topics in Survey Sampling Volume 153 || Inference under Frequentist...
Chapter 2

Inference under Frequentist Theory Approach
2.1 INTRODUCTION
In subsection 1.5.6 we considered the concept of superpopulation models in finite population sampling and introduced their use in finding the average variance of a strategy for comparison among several strategies. The approach there was necessarily based on a fixed population model, as we had to confine ourselves to strategies having some desirable properties, like unbiasedness, admissibility, and sufficiency, suggested by a fixed population set-up.
Brewer (1963), Royall (1970, 1976), Royall and Herson (1973) and their co-workers considered the survey population as a random sample from a superpopulation and attempted to draw inference about the population parameter from a prediction-theorist's viewpoint. These model-dependent predictors are very sensitive to model mis-specification. Cassel et al (1976, 1977), Wright (1983), among others, therefore considered model-based predictors, which are also based on sampling designs, to provide robust prediction under model failures. We make a brief review of some of these results in this chapter.
P. Mukhopadhyay, Topics in Survey Sampling. © Springer-Verlag New York, Inc. 2001
2.2 PRINCIPLES OF INFERENCE BASED ON THEORY OF PREDICTION
As in subsection 1.5.6 we assume that the value $y_i$ on unit $i$ is a realisation of a random variable $Y_i$ $(i = 1, \ldots, N)$. For simplicity, we shall use the same symbol $y_i$ to denote the population value as well as the random variable of which it is a particular realisation, the actual meaning being clear from the context. We assume throughout that there is a superpopulation distribution $\xi_\theta$ of $\mathbf{y} = (y_1, \ldots, y_N)'$, indexed by a parameter vector $\theta \in \Theta$, the parameter space. Let $\hat T_s$ denote a predictor of $T$ or $\bar y$ based on $s$ (the specific one being clear from the context).
DEFINITION 2.2.1: $\hat T_s$ is a model-unbiased (or $\xi$-unbiased or m-unbiased) predictor of $\bar y$ if

$$\mathcal{E}(\hat T_s) = \mathcal{E}(\bar y) = \bar\mu \ \text{(say)} \quad \forall\, \theta \in \Theta \ \text{and} \ \forall\, s : p(s) > 0 \qquad (2.2.1)$$

DEFINITION 2.2.2: $\hat T_s$ is a design-model unbiased (or $p\xi$-unbiased or pm-unbiased) predictor of $\bar y$ if

$$E\mathcal{E}(\hat T_s) = \bar\mu \quad \forall\, \theta \in \Theta \qquad (2.2.2)$$

Clearly, an m-unbiased predictor is necessarily pm-unbiased.
For a non-informative design, where $p(s)$ does not depend on the $y$-values, the order of the operations $E$ and $\mathcal{E}$ can always be interchanged.
Two types of mean square errors (mse's) of a sampling strategy $(p, \hat T)$ for predicting $T$ have been proposed in the literature:

(a) $\mathcal{E}MSE(p, \hat T) = \mathcal{E}E(\hat T - T)^2 = M(p, \hat T)$ (say)

(b) $EMS\mathcal{E}(p, \hat T) = E\mathcal{E}(\hat T - \mu)^2 = M_1(p, \hat T)$ (say), where $\mu = \sum_k \mu_k = \mathcal{E}(T)$

If $\hat T$ is p-unbiased for $T$ ($E(\hat T) = T \ \forall\, \mathbf{y} \in R^N$), $M$ is the model-expected p-variance of $\hat T$. If $\hat T$ is m-unbiased for $T$, $M_1$ is the p-expected model-variance of $\hat T$. It has been recommended that if one's main interest is in predicting the total of the current population from which the sample has been drawn, one should use $M$ as the measure of uncertainty of $(p, \hat T)$. If one's interest is in predicting the population total for some future population, which is
of the same type as the present survey population (having the same $\mu$), one is really concerned with $\mu$, and here $M_1$ should be used (Sarndal, 1980a). In finding an optimal predictor one minimises $M$ or $M_1$ in the class of predictors of interest.
The following relations hold:
$$M(p, \hat T) = E\mathcal{V}(\hat T) + E\{B(\hat T)\}^2 + \mathcal{V}(T) - 2\,\mathcal{E}\{(T - \mu)E(\hat T - \mu)\} \qquad (2.2.3)$$

where $B(\hat T) = \mathcal{E}(\hat T - T)$, the model-bias in $\hat T$.

If $\hat T$ is p-unbiased,

$$M(p, \hat T) = E\mathcal{V}(\hat T) + E\{B(\hat T)\}^2 - \mathcal{V}(T) \qquad (2.2.4)$$

If $\hat T$ is p-unbiased as well as m-unbiased,

$$M(p, \hat T) = M_1(p, \hat T) - \mathcal{V}(T) \qquad (2.2.5)$$

Now, for the given data $d = \{(k, y_k), k \in s\}$, we have

$$T = \sum_{k \in s} y_k + \sum_{k \in \bar s} y_k \qquad (2.2.6)$$

where $\bar s = P - s$. Therefore, in predicting $T$ one needs only to predict $\sum_{k \in \bar s} y_k$, the part $\sum_{k \in s} y_k$ being completely known.
A predictor

$$\hat T_s = \sum_{k \in s} y_k + \hat U_s$$

will be m-unbiased for $T$ if

$$\mathcal{E}(\hat U_s) = \mathcal{E}\Big(\sum_{k \in \bar s} y_k\Big) = \sum_{k \in \bar s} \mu_k = \mu_{\bar s} \ \text{(say)} \quad \forall\, \theta \in \Theta, \ \forall\, s : p(s) > 0 \qquad (2.2.7)$$
In finding an optimal $\hat T$ for a given $p$, one has to minimise $M(p, \hat T)$ (or $M_1(p, \hat T)$) in a certain class of predictors. Now, for an m-unbiased $\hat T$,
$$M(p, \hat T) = E\mathcal{E}\Big(\hat U_s - \sum_{\bar s} y_k\Big)^2 = E\mathcal{E}\Big\{(\hat U_s - \mu_{\bar s}) - \Big(\sum_{\bar s} y_k - \mu_{\bar s}\Big)\Big\}^2 = E\Big[\mathcal{V}(\hat U_s) + \mathcal{V}\Big(\sum_{\bar s} y_k\Big) - 2\,\mathcal{C}\Big(\hat U_s, \sum_{\bar s} y_k\Big)\Big] \qquad (2.2.8)$$
If the $y_k$ are independent, $\mathcal{C}(\hat U_s, \sum_{\bar s} y_k) = 0$ ($\hat U_s$ being a function of $y_k, k \in s$ only). Hence, in this case, for a given $s$, the optimal m-unbiased predictor of $T$ (in the minimum $\mathcal{E}(\hat T - T)^2$-sense) is (Royall, 1970)

$$\hat T_s^* = \sum_{k \in s} y_k + \hat U_s^* \qquad (2.2.9)$$
where

$$\mathcal{E}(\hat U_s^*) = \mu_{\bar s} \qquad (2.2.10.1)$$

$$\mathcal{V}(\hat U_s^*) \le \mathcal{V}(U_s') \qquad (2.2.10.2)$$

for any $U_s'$ satisfying (2.2.10.1). It is clear that $\hat T_s^*$, when it exists, does not depend on the sampling design (unlike the design-based estimator, e.g. $e_{HT}$).
An optimal design-predictor pair $(p^*, \hat T^*)$ in the class $(\mathcal{P}, \mathcal{T})$ is the one for which

$$M(p^*, \hat T^*) \le M(p, T') \qquad (2.2.11)$$

for any $p \in \mathcal{P}$, a class of sampling designs, and $T'$, any other m-unbiased predictor $\in \mathcal{T}$.
After $\hat T_s^*$ has been derived via (2.2.9)-(2.2.10.2), an optimum sampling design is obtained through (2.2.11). This approach is, therefore, completely model-dependent, the emphasis being on the correct postulation of a superpopulation model that will efficiently describe the physical situation at hand and thereby generate $\hat T_s^*$. After $\hat T_s^*$ has been specified, one makes a pre-sampling judgement of the efficiency of $\hat T_s^*$ with respect to different sampling designs and obtains $p^*$ (if it exists). The choice of a suitable sampling design is, therefore, relegated to secondary importance in this prediction-theoretic approach.
EXAMPLE 2.2.1
Consider the polynomial regression model:

$$\mathcal{E}(y_k \mid x_k) = \sum_{j=0}^{J} \delta_j \beta_j x_k^j, \quad \mathcal{V}(y_k \mid x_k) = \sigma^2 v(x_k), \ k = 1, \ldots, N \qquad (2.2.12)$$

$$\mathcal{C}(y_k, y_{k'} \mid x_k, x_{k'}) = 0, \ k \ne k' = 1, \ldots, N \qquad (2.2.13)$$

where the $x_k$'s are assumed fixed (non-stochastic) quantities, $\beta_j\ (j = 0, 1, \ldots, J)$ and $\sigma^2$ are unknown quantities, $v(x_k)$ is a known function of $x_k$, and $\delta_j = 1(0)$ if the term $x^j$ is present (absent) in $\mathcal{E}(y_k) = \mu_k$. The model (2.2.12), (2.2.13) has been denoted as $\xi(\delta_0, \delta_1, \ldots, \delta_J; v(x))$ by Royall and Herson (1973). The best linear unbiased predictor (BLUP) of $T$ under this model is, therefore,

$$\hat T_s^*(\delta_0, \ldots, \delta_J) = \sum_{k \in s} y_k + \sum_{j=0}^{J} \delta_j \hat\beta_j^* \sum_{k \in \bar s} x_k^j \qquad (2.2.14)$$
where $\hat\beta_j^*$ is the BLUP of $\beta_j$ under $\xi(\delta_0, \ldots, \delta_J; v(x))$, as obtained from the Gauss-Markoff theorem.
Under the model $\xi(0, 1; v(x))$,

$$\hat T^*(0, 1; v(x)) = \sum_{k \in s} y_k + \Big\{\Big(\sum_s x_k y_k / v(x_k)\Big)\Big(\sum_s x_k^2 / v(x_k)\Big)^{-1}\Big\} \sum_{\bar s} x_k \qquad (2.2.15)$$

$$\mathcal{E}\{\hat T^*(0, 1; v(x)) - T\}^2 = \sigma^2\Big[\Big(\sum_{\bar s} x_k\Big)^2 \Big/ \sum_s \big(x_k^2 / v(x_k)\big) + \sum_{\bar s} v(x_k)\Big] \qquad (2.2.16)$$

It follows, therefore, that if

• $v(x_k)$ is a monotonically non-decreasing function of $x$

• $v(x)/x^2$ is a monotonically non-increasing function of $x$

the strategy $(p^*, \hat T^*)$ will have minimum average variance in the class of all strategies $(p, T), p \in \mathcal{P}_n, T \in \mathcal{L}_m$, the class of all linear m-unbiased predictors under $\xi$, where the sampling design $p^*$ is such that

$$p^*(s) = 1(0) \ \text{for} \ s = s^* \ \text{(otherwise)} \qquad (2.2.17)$$

$s^*$ having the property

$$\sum_{s^*} x_k = \max_{s \in S_n} \sum_s x_k \qquad (2.2.18)$$

$$S_n = \{s : \nu(s) = n\} \qquad (2.2.19)$$
Consider the particular cases $v(x) = x^g$, $g = 0, 1, 2$. Writing $\hat T^*(0, 1; x^g)$ as $\hat T_g^*$, we have

$$\hat T_0^* = \sum_s y_k + \Big\{\Big(\sum_s x_k y_k\Big)\Big(\sum_{\bar s} x_k\Big)\Big\} \Big/ \sum_s x_k^2$$

$$\hat T_1^* = \sum_s y_k + \Big\{\Big(\sum_s y_k\Big)\Big(\sum_{\bar s} x_k\Big)\Big\} \Big/ \sum_s x_k = \frac{\bar y_s}{\bar x_s}\, X$$

$$\hat T_2^* = \sum_s y_k + \Big\{\sum_s (y_k / x_k) \sum_{\bar s} x_k\Big\} \Big/ \nu(s) \qquad (2.2.20)$$
For each of these predictors $p^*$ is the optimal sampling design in $\mathcal{P}_n$. $\hat T_1^*$ is the ordinary ratio predictor $\hat T_R = (\bar y_s / \bar x_s)X$ and is the optimal m-unbiased predictor under $\xi(0, 1; x)$. This result is in agreement with the design-based result in Cochran (1977, pp. 158-160), where the ratio estimator is shown to be the BLU-estimator if the population is such that the relation between $y$ and $x$ is a straight line through the origin and the variance of $y$ about this line is proportional to the (fixed) values of $x$. However, the new result is that $p^*$ is the optimal sampling design to use with $\hat T_R$, whereas Cochran considered srswor only. It will be seen that if the assumed superpopulation model $\xi(0, 1; x)$ does not hold, $\hat T_R$ will be model-biased for any sample (including $s^*$) in general, and an srswor design may be used to salvage the unbiasedness of the ratio predictor under a class of alternative models.
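As a numerical illustration, the three predictors $\hat T_g^*$ of (2.2.20) can be computed directly. The sketch below uses a small invented population (all data values are hypothetical) and takes the sample of units with the largest $x$-values, as the optimal design $p^*$ requires.

```python
# Illustrative sketch (data invented): the optimal predictors T*_g of
# Example 2.2.1 under v(x) = x^g, g = 0, 1, 2, for a small population.
x = [2.0, 4.0, 6.0, 8.0, 10.0]   # fixed auxiliary values, k = 1..5
y = [2.2, 3.9, 6.3, 7.8, 10.1]   # realised survey values
s = [3, 4]                       # sampled units (0-based): the largest x, as p* requires
sbar = [k for k in range(len(x)) if k not in s]

ys = sum(y[k] for k in s)        # known sampled part of T

def T_star(g):
    # BLUP of the slope under v(x) = x^g, then predict the unsampled part
    beta = sum(y[k] * x[k] ** (1 - g) for k in s) / sum(x[k] ** (2 - g) for k in s)
    return ys + beta * sum(x[k] for k in sbar)

T0, T1, T2 = T_star(0), T_star(1), T_star(2)

# T*_1 coincides with the ordinary ratio predictor (ybar_s / xbar_s) * X
X = sum(x)
assert abs(T1 - (ys / sum(x[k] for k in s)) * X) < 1e-12
```

Note how the three predictors differ only through the weighting of the slope estimate; none of them involves the sampling design.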
EXAMPLE 2.2.2
Consider now prediction under multiple regression models. Assume that apart from the main variable $y$ we have $(r+1)$ closely related auxiliary variables $x_j\ (j = 0, 1, \ldots, r)$ with known values $x_{kj}\ \forall\, k = 1, \ldots, N$. The variables $y_1, \ldots, y_N$ are assumed to have a joint distribution $\xi$ such that

$$\mathcal{E}(y_k \mid \mathbf{x}_k) = \beta_0 x_{k0} + \beta_1 x_{k1} + \cdots + \beta_r x_{kr}, \quad \mathcal{V}(y_k \mid \mathbf{x}_k) = \sigma^2 v_k, \quad \mathcal{C}(y_k, y_{k'} \mid \mathbf{x}_k, \mathbf{x}_{k'}) = 0 \ (k \ne k') \qquad (2.2.21)$$

where $\mathbf{x}_k = (x_{k0}, x_{k1}, \ldots, x_{kr})'$, $\beta_0, \beta_1, \ldots, \beta_r, \sigma^2\ (>0)$ are unknown parameters and $v_k$ is a known function of $\mathbf{x}_k$. If $x_{k0} = 1\ \forall\, k$, the model has an intercept term $\beta_0$. Assuming without loss of generality that $s = (1, \ldots, n)$, we shall write

$$\mathbf{y} = (\mathbf{y}_s', \mathbf{y}_{\bar s}')', \quad \beta = (\beta_0, \ldots, \beta_r)', \quad X = \begin{bmatrix} X_s \\ X_{\bar s} \end{bmatrix}_{N \times (r+1)}, \quad V = \begin{bmatrix} V_s & 0 \\ 0 & V_{\bar s} \end{bmatrix} \qquad (2.2.22)$$

$X_s$ being the $n \times (r+1)$ submatrix of $X$ corresponding to $k \in s$ ($X_{\bar s}$ defined similarly), $V_s\ (V_{\bar s})$ being the $n \times n$ ($(N-n) \times (N-n)$) submatrix of $V$ corresponding to $k \in s\ (k \in \bar s)$. The multiple regression model (2.2.21), (2.2.22) is, therefore,

$$\mathcal{E}(\mathbf{y}) = X\beta, \quad \mathcal{D}(\mathbf{y}) = \sigma^2 V \qquad (2.2.23)$$

$\mathcal{D}(\cdot)$ denoting the model-dispersion matrix of $(\cdot)$. We shall denote

$$X_j = \sum_k x_{kj}, \quad x_{js} = \sum_{k \in s} x_{kj}, \quad \bar x_{js} = x_{js}/n, \quad \mathbf{x}_s = (x_{0s}, \ldots, x_{rs})', \quad \bar{\mathbf{x}}_s = (\bar x_{0s}, \ldots, \bar x_{rs})'$$

and $x_{j\bar s}, \bar x_{j\bar s}, \mathbf{x}_{\bar s}, \bar{\mathbf{x}}_{\bar s}$ similarly. The model (2.2.23) will be denoted as $\xi(X, v)$. For a given $s$, the BLUP of $T$ is
$$\hat t_s^*(X, v) = \sum_{k \in s} y_k + \mathbf{x}_{\bar s}' \hat\beta_s^* \qquad (2.2.24)$$

where $\hat\beta_s^*$ is the generalised least squares predictor of $\beta$,

$$\hat\beta_s^* = (X_s' V_s^{-1} X_s)^{-1} (X_s' V_s^{-1} \mathbf{y}_s) \qquad (2.2.25)$$

Hence (Royall, 1971),

$$\hat t_s^*(X, v) = \big[\mathbf{1}_n' + (\mathbf{1}_{N-n}' X_{\bar s})(X_s' V_s^{-1} X_s)^{-1} X_s' V_s^{-1}\big] \mathbf{y}_s$$

where $\mathbf{1}_n' = (1, \ldots, 1)_{1 \times n}$. Also,

$$M(p, \hat t_s^*) = \sigma^2 E\Big[\mathbf{x}_{\bar s}' (X_s' V_s^{-1} X_s)^{-1} \mathbf{x}_{\bar s} + \sum_{\bar s} v_k\Big] \qquad (2.2.26)$$
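A minimal numerical sketch of (2.2.24)-(2.2.25) (all data invented): with an intercept and one slope ($r = 1$), the generalised least squares step is a $2 \times 2$ solve that can be written out explicitly, after which the BLUP adds the predicted unsampled part to the sampled total.

```python
# Sketch (invented data) of the BLUP (2.2.24)-(2.2.25):
# beta* = (Xs' Vs^-1 Xs)^-1 Xs' Vs^-1 ys, then T* = sum_s y_k + x_sbar' beta*.
xs = [(1.0, 2.0), (1.0, 4.0), (1.0, 6.0)]   # rows of Xs: (x_k0, x_k1), x_k0 = 1
ys = [3.1, 5.2, 6.9]
vs = [2.0, 4.0, 6.0]                         # v_k, diagonal of Vs
x_sbar = (2.0, 18.0)                         # column totals of X over unsampled units

# accumulate A = Xs' Vs^-1 Xs (2x2) and b = Xs' Vs^-1 ys (2-vector)
A = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
for (x0, x1), yk, vk in zip(xs, ys, vs):
    w = 1.0 / vk
    A[0][0] += w * x0 * x0; A[0][1] += w * x0 * x1
    A[1][0] += w * x1 * x0; A[1][1] += w * x1 * x1
    b[0] += w * x0 * yk;    b[1] += w * x1 * yk

# explicit 2x2 inverse applied to b gives the GLS coefficients
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
beta = ((A[1][1] * b[0] - A[0][1] * b[1]) / det,
        (-A[1][0] * b[0] + A[0][0] * b[1]) / det)

T_star = sum(ys) + x_sbar[0] * beta[0] + x_sbar[1] * beta[1]
```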
2.2.1 PROJECTION PREDICTORS
We considered in (2.2.6) the problem of predicting $\sum_{k \in \bar s} y_k$ only, because the part $\sum_{k \in s} y_k$ of $T$ is completely known when the data are given, and found optimal strategies that minimise $M(p, t)$. However, in predicting the total of a finite population of the same type as the current survey population, one's primary interest is in estimating the superpopulation total $\mu = \sum_k \mu_k$. For a given $s$, an m-unbiased predictor of $T$ will, therefore, be

$$\hat T = \sum_{k=1}^{N} \hat\mu_k \qquad (2.2.27)$$

where

$$\mathcal{E}(y_k) = \mu_k, \ k = 1, \ldots, N \qquad (2.2.28)$$

The predictors (2.2.27) are called 'projection predictors' (Cochran, 1977, p. 159; Sarndal, 1980a).
Under $\xi(\delta_0, \ldots, \delta_J; v(x))$, the BLU-projection predictor of $T$ is

$$\hat T^*(\delta_0, \ldots, \delta_J; v(x)) = \sum_{k=1}^{N} \sum_{j=0}^{J} \delta_j \hat\beta_j^* x_k^j \qquad (2.2.29)$$
where $\hat\beta_j^*$ is as defined in (2.2.14). Under $\xi(X, v)$,

$$\hat T^*(X, v) = \mathbf{x}_0' \hat\beta_s^* = \mathbf{1}' X \hat\beta_s^* \qquad (2.2.30)$$

where $\mathbf{x}_0 = (x_{00}, x_{01}, \ldots, x_{0r})'$, $x_{0j} = \sum_k x_{kj}$, and $\hat\beta_s^*$ is given in (2.2.25).
2.3 ROBUSTNESS OF MODEL-DEPENDENT OPTIMAL STRATEGIES
The model-dependent optimal predictor $t^*(\xi)$ will, in general, be biased and not optimal under an alternative model $\xi'$. Suppose from practical considerations we assume that the model is $\xi(\delta_0, \ldots, \delta_J; v(x))$ and use the predictor $t^*(\xi)$, which is BLUP under $\xi$. The bias of this predictor under a different model $\xi'$ for a particular sample $s$ is

$$\mathcal{E}'\{t_s^*(\xi) - T\} = B\{t_s^*(\xi), \xi'\} = B_{\delta_0', \ldots, \delta_J'; v'(x)}\big(t_s^*(\delta_0, \ldots, \delta_J; v(x))\big) \qquad (2.3.1)$$
The $\xi'$-bias of $t^*(\xi)$ for a particular sampling design $p$ is

$$\sum_{s \in S} \mathcal{E}'\{t_s^*(\xi) - T\}\, p(s) \qquad (2.3.2)$$
To preserve the property of unbiasedness of $t^*(\xi)$ even when the true model is $\xi'$, we may choose the sampling design in such a way that $t^*(\xi)$ also remains unbiased under $\xi'$. With this end in view, Royall and Herson (1973) introduced the concept of the balanced sampling design.
Another way to deal with the situation may be as follows. Of all the predictors belonging to a subclass $\mathcal{T}(\xi)$, say, we may choose one, $t_0(\xi)$, which is least subject to bias even when the model is $\xi'$. Thus, for a given $s$, we should use $t_0(\xi)$ such that

$$|B\{t_0(\xi), \xi'\}| \le |B\{t(\xi), \xi'\}|$$

$\forall\, t(\xi) \in \mathcal{T}(\xi)$, the choice of the subclass $\mathcal{T}(\xi)$ being made from other considerations, e.g. from the point of view of mse, etc. The predictor $t_0(\xi)$ will be the most bias-robust wrt the sample $s$ and the alternative model $\xi'$ among the class of competing predictors $t(\xi) \in \mathcal{T}(\xi)$.
2.3.1 BIAS OF $\hat T_g^*$

We have

$$B\{\hat T_g^*, \xi(\delta_0', \ldots, \delta_J'; v(x))\} = \sum_{j=0}^{J} \delta_j' H_g(j, s) \beta_j \qquad (2.3.3)$$

where

$$H_g(j, s) = \Big[\sum_s x_k^{j+1-g} \sum_{\bar s} x_k - \sum_s x_k^{2-g} \sum_{\bar s} x_k^j\Big] \Big/ \sum_s x_k^{2-g} \qquad (2.3.4)$$

which is independent of the form of the variance function in $\xi'$ (Mukhopadhyay, 1977). Note that $H_g(1, s) = 0$.
DEFINITION 2.3.1 (Royall and Herson, 1973). A sampling design $p(L)$ is a balanced sampling design of order $L$ (if it exists) if $p(s) = 1(0)$ for $s = s_b(L)$ (otherwise), where $s_b(L)$, called a balanced sample of order $L$, is such that

$$\frac{1}{n}\sum_{k \in s_b(L)} x_k^j = \frac{1}{N}\sum_{k=1}^{N} x_k^j, \quad j = 1, \ldots, L \qquad (2.3.5)$$

and

$$\nu(s_b(L)) = n \qquad (2.3.6)$$

If there are $K$ such samples satisfying (2.3.5), $p$ chooses each such sample with probability $K^{-1}$.
It follows from (2.3.3) and (2.3.4) that the ratio predictor $\hat T_1^* = \hat T^*(0, 1; x)$, which is optimal under the model $\xi(0, 1; x)$, remains (m-)unbiased under the alternative class of models $\xi(\delta_0, \ldots, \delta_J; v(x))$ when used along with a balanced sampling design $p(J)$.
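This can be checked numerically. The sketch below uses an invented population in which each $x$-value occurs twice, so a sample containing one copy of each value is exactly balanced to every order; it evaluates $H_g(j, s)$ of (2.3.4) and confirms that every $H_1(j, s)$ vanishes on this sample, and that $H_g(1, s) = 0$ whatever $g$.

```python
# Sketch (invented data): on a balanced sample the bias terms H_1(j, s)
# vanish, so the ratio predictor stays m-unbiased under every alternative
# polynomial mean model.
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]   # population x-values, N = 6
s = [0, 1, 2]                        # balanced sample: same moments as population
sbar = [k for k in range(len(x)) if k not in s]

def H(g, j):
    """H_g(j, s) as in (2.3.4)."""
    num = (sum(x[k] ** (j + 1 - g) for k in s) * sum(x[k] for k in sbar)
           - sum(x[k] ** (2 - g) for k in s) * sum(x[k] ** j for k in sbar))
    return num / sum(x[k] ** (2 - g) for k in s)

# For the ratio predictor (g = 1), every H_1(j, s) is zero on this sample;
# H_g(1, s) = 0 identically, whatever the sample.
for j in range(4):
    assert abs(H(1, j)) < 1e-12
assert abs(H(0, 1)) < 1e-12 and abs(H(2, 1)) < 1e-12
```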
In general, consider the bias of $\hat T^*(0, 1; v(x))$ under $\xi(\delta_0, \ldots, \delta_J; V(x))$. This is given by

$$\sum_{j=0}^{J} \delta_j \beta_j \Big[\Big(\sum_{\bar s} x_k\Big) \Big(\sum_s x_k^{j+1}/v(x_k)\Big) \Big/ \Big(\sum_s x_k^2/v(x_k)\Big) - \sum_{\bar s} x_k^j\Big] \qquad (2.3.7)$$

Hence, if a sample $s^*(J)$ satisfies

$$\Big(\sum_{\bar s} x_k^j\Big) \Big/ \Big(\sum_{\bar s} x_k\Big) = \Big(\sum_s x_k^{j+1}/v(x_k)\Big) \Big/ \Big(\sum_s x_k^2/v(x_k)\Big), \quad j = 0, 1, \ldots, J \qquad (2.3.8)$$

the predictor $\hat T^*(0, 1; v(x))$ remains unbiased under $\xi(\delta_0, \ldots, \delta_J; V(x))$ on $s^*(J)$ for any $V(x)$.

Samples $s^*(J)$ satisfying (2.3.8) may be called generalised balanced samples of order $J$ (Mukhopadhyay and Bhattacharyya, 1994; Mukhopadhyay, 1996).
For $v(x) = x^2$, (2.3.8) reduces to

$$n^{-1} \sum_s x_k^{j-1} = \Big(\sum_{\bar s} x_k^j\Big) \Big/ \Big(\sum_{\bar s} x_k\Big), \quad j = 0, \ldots, J \qquad (2.3.9)$$

Samples satisfying these conditions have been termed over-balanced samples $s_o(J)$ (Scott et al, 1978).
The following theorem shows that $\hat T^*(0, 1; v(x))$ remains BLUP under $\xi(\delta_0, \ldots, \delta_J; V(x))$ for $s = s^*(J)$ when $V(x)$ is of a particular form.

THEOREM 2.3.1 (Scott et al, 1978). For $s = s^*(J)$, $\hat T^*(0, 1; v(x))$ is BLUP under $\xi(\delta_0, \ldots, \delta_J; V(x))$ provided

$$V(x) = v(x) \sum_{j=0}^{J} \delta_j a_j x^{j-1}$$

where the $a_j$'s are arbitrary non-negative constants.
It is obvious that all types of balanced samples are rarely available in practice. Royall and Herson (1973) and Royall and Pfeffermann (1982) recommended simple random samples as approximately balanced samples $s_b(J)$. Mukhopadhyay (1985a) showed that simple random sampling and pps sampling are asymptotically equivalent to balanced sampling designs $p(J)$ for using the ratio predictor. Mukhopadhyay (1985b) suggested a post-sample predictor which remains almost unbiased under alternative polynomial regression models. For further details on the robustness of the model-dependent optimal predictors the reader may refer to Mukhopadhyay (1977, 1996), among others.
2.4 A CLASS OF PREDICTORS UNDER MODEL $\xi(X, v)$

Sarndal (1980b) considered different choices of $\hat\beta$ under $\xi(X, v)$. Considering $\hat\beta^*$ in (2.2.25) (dropping the suffix $s$), we note that a predictor $\hat\beta$ of $\beta$ is of the form

$$\hat\beta = (Z_s' X_s)^{-1} Z_s' \mathbf{y}_s \qquad (2.4.1)$$

where $Z_s$ is an $n \times (r+1)$ matrix of weights $z_{kj}$, to be suitably chosen such that the predictor $\hat\beta$ of $\beta$ has some desirable properties. The matrix $Z_s' X_s$ is of full rank. Different choices of $Z_s$ are:
(a) $\pi^{-1}$-weighted: $Z_s$ and the corresponding $\hat\beta$ may be called $\pi^{-1}$-weighted if

(2.4.2)

i.e. if

(2.4.3)

$C(Z_s)$ denoting the column space of $Z_s$, $\lambda = (\lambda_0, \lambda_1, \ldots, \lambda_r)'$, a vector of constants $\lambda_j$. Here $\Pi = \mathrm{Diag}(\pi_k; k = 1, \ldots, N)$, $\Pi_s = \mathrm{Diag}(\pi_k; k \in s)$.

If $Z_s = \Pi_s^{-1} X_s$,

$$\hat\beta(\pi) = (X_s' \Pi_s^{-1} X_s)^{-1} X_s' \Pi_s^{-1} \mathbf{y}_s \qquad (2.4.4)$$
(b) BLU-weighted: Here $Z_s = V_s^{-1} X_s$, when

$$\hat\beta(V^{-1}) = \hat\beta_s^* = (X_s' V_s^{-1} X_s)^{-1} X_s' V_s^{-1} \mathbf{y}_s \qquad (2.4.5)$$
(c) Weighted by an arbitrary matrix $Q$: Here $Z_s = Q_s X_s$, where $Q$ is an arbitrary $N \times N$ diagonal matrix of weights and $Q_s$ is the submatrix of $Q$ corresponding to units $k \in s$. Therefore,

$$\hat\beta(Q) = (X_s' Q_s X_s)^{-1} X_s' Q_s \mathbf{y}_s \qquad (2.4.6)$$

For $\xi(X, v)$, Cassel, Sarndal and Wretman (1976) and Sarndal (1980b) suggested a generalised regression (greg) predictor of $T$,

$$\hat T_{GR}(V^{-1}) = \mathbf{1}_n' \Pi_s^{-1} \mathbf{y}_s + (\mathbf{1}' X - \mathbf{1}_n' \Pi_s^{-1} X_s) \hat\beta^* = \hat T_{GR} \ \text{(say)} \qquad (2.4.7)$$

where $\hat\beta_j^*$ is the BLUP of $\beta_j$, obtained from the $j$th element of $\hat\beta^*$ given in (2.2.25). If $\beta$ is known, (2.4.7) is a generalised difference predictor, studied by Cassel et al (1976), Isaki and Fuller (1982), among others.
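The following sketch (all data and design quantities invented) evaluates the greg predictor (2.4.7) in the simplest case of a single auxiliary variable and no intercept, i.e. $\hat T_{GR} = \sum_s y_k/\pi_k + (X - \sum_s x_k/\pi_k)\hat\beta$, taking the BLU weighting with $v_k = x_k$ so that $\hat\beta = \sum_s y_k / \sum_s x_k$.

```python
# Numerical sketch (invented data) of the greg predictor (2.4.7) with one
# auxiliary variable, no intercept, and v_k = x_k.
x  = [2.0, 3.0, 5.0, 7.0, 8.0]    # population auxiliary values
X  = sum(x)                       # known population total of x
s  = [1, 3, 4]                    # sampled units (0-based indices)
ys = {1: 3.2, 3: 6.9, 4: 8.3}     # observed y-values
pi = {1: 0.5, 3: 0.6, 4: 0.7}     # hypothetical inclusion probabilities

beta = sum(ys[k] for k in s) / sum(x[k] for k in s)   # BLU slope under v_k = x_k
t_ht = sum(ys[k] / pi[k] for k in s)                  # Horvitz-Thompson part
T_greg = t_ht + (X - sum(x[k] / pi[k] for k in s)) * beta
```

The second term corrects the Horvitz-Thompson part for the gap between the known total $X$ and its design estimate; under a design with $\pi_k$ exactly proportional to $x_k$ this correction vanishes.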
For arbitrary weights $Q$ in $\hat\beta(Q)$, the generalised regression predictor of $T$ is

$$\hat T_{GR}(Q) = \sum_{k \in s} \frac{y_k}{\pi_k} + \sum_{j=0}^{r} \hat\beta_j(Q)\Big(X_j - \sum_{k \in s} \frac{x_{kj}}{\pi_k}\Big) = \mathbf{1}_n' \Pi_s^{-1} \mathbf{y}_s + (\mathbf{1}' X - \mathbf{1}_n' \Pi_s^{-1} X_s) \hat\beta(Q) \qquad (2.4.8)$$

Some specific choices of $Q$ are $V^{-1}$, $\Pi^{-1}$, $(\Pi V)^{-1}$.
EXAMPLE 2.4.1
Consider the model $\xi$: $\mathcal{E}(y_k) = \beta$, $\mathcal{V}(y_k) = \sigma^2$, $\mathcal{C}(y_k, y_{k'}) = 0$, $k \ne k' = 1, \ldots, N$. For predicting $\bar y$,

$$\hat{\bar y}_{GR} = \sum_{k \in s} \frac{y_k}{N \pi_k} + \bar y_s \Big\{1 - \sum_{k \in s} \frac{1}{N \pi_k}\Big\}$$

For designs with $\pi_k = n/N\ \forall\, k$, $\hat{\bar y}_{GR}$ reduces to $\bar y_s$, i.e. the two predictors coincide.
EXAMPLE 2.4.2
The model is $\xi$: $\mathcal{E}(y_k) = \beta_0 + \beta x_k$, $\mathcal{V}(y_k) = \sigma^2 v_k$, $\mathcal{C}(y_k, y_{k'}) = 0$, $k \ne k' = 1, \ldots, N$. Here, specialising (2.4.8) with $x_{k0} = 1$ and $x_{k1} = x_k$, the greg-predictor of $\bar y$ is

$$\hat{\bar y}_{GR} = \frac{1}{N}\Big[\sum_{k \in s} \frac{y_k}{\pi_k} + \hat\beta_0\Big(N - \sum_{k \in s} \frac{1}{\pi_k}\Big) + \hat\beta\Big(X - \sum_{k \in s} \frac{x_k}{\pi_k}\Big)\Big]$$

where the $\hat\beta$'s are the generalised least squares predictors of the $\beta$'s.
Wright (1983) considered a $(p, Q, R)$ strategy for predicting $T$ as a combination of a sampling design $p$ and a predictor

$$\hat T(Q, R) = \sum_{k \in s} r_k\Big\{y_k - \sum_{j=0}^{r} x_{kj} \hat\beta_j(Q)\Big\} + \sum_{j=0}^{r} X_j \hat\beta_j(Q) = \mathbf{1}'\big[\Delta R \mathbf{y} + (I - \Delta R) X \hat\beta(Q)\big] \qquad (2.4.9)$$

where $\Delta = \mathrm{Diag}(\delta_k; k = 1, \ldots, N)$, $\delta_k = 1(0)$ if $k \in (\notin)\, s$, $R = \mathrm{Diag}(r_k; k = 1, \ldots, N)$, $R_s = \mathrm{Diag}(r_k; k \in s)$, $r_k$ being a suitable weight, and $I$ is the unit matrix of order $N$. For different choices of $Q$ and $R$ one gets different predictors. Some choices of $Q$ are, as before, $V^{-1}$, $\Pi^{-1}$, $(V\Pi)^{-1}$, and of $R$ are $0$, $I$, and $\Pi^{-1}$. The choice $R = 0$ gives projection predictors of the type (2.2.29); $R = \Pi^{-1}$ gives the class of generalised regression predictors of the type (2.4.8) considered by Cassel et al (1976, 1977), Sarndal (1980b).
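A small sketch (data and inclusion probabilities invented, single auxiliary through the origin) of how the choice of $R$ in (2.4.9) moves between predictors: $R = 0$ leaves only the projection part $\hat\beta(Q)X$, while $R = \Pi^{-1}$ reproduces the greg form (2.4.8).

```python
# Sketch (invented data) of Wright's T(Q, R) with one auxiliary variable,
# no intercept: R = 0 gives the projection predictor, R = Pi^{-1} the greg.
x  = [2.0, 3.0, 5.0, 7.0, 8.0]
X  = sum(x)
s  = [0, 2, 4]
ys = {0: 2.1, 2: 5.4, 4: 8.0}
pi = {0: 0.4, 2: 0.6, 4: 0.8}     # hypothetical inclusion probabilities

beta = sum(ys[k] for k in s) / sum(x[k] for k in s)   # beta(Q) for Q = V^-1, v_k = x_k

def T_wright(r):
    # r maps k -> r_k for k in s; unsampled units have delta_k = 0
    return sum(r[k] * (ys[k] - beta * x[k]) for k in s) + beta * X

T_proj = T_wright({k: 0.0 for k in s})            # R = 0: projection predictor
T_greg = T_wright({k: 1.0 / pi[k] for k in s})    # R = Pi^{-1}: greg predictor

assert abs(T_proj - beta * X) < 1e-12
# greg identity: HT part plus regression correction
ht_form = sum(ys[k] / pi[k] for k in s) + beta * (X - sum(x[k] / pi[k] for k in s))
assert abs(T_greg - ht_form) < 1e-12
```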
EXAMPLE 2.4.3
Consider the model $\xi$: $\mathcal{E}(y_k) = \beta x_k$, $\mathcal{V}(y_k) = v_k$, $\mathcal{C}(y_k, y_{k'}) = 0$, $k \ne k' = 1, \ldots, N$.
2.5 ASYMPTOTICALLY UNBIASED ESTIMATION OF THE DESIGN-VARIANCE OF $\hat T_{GR}$
We shall now address the problem of asymptotically unbiased estimation of the design-variance of $\hat T_{GR}(\Pi^{-1} V^{-1})$ under $\xi(X, v)$. Consider the more general problem of estimating $A$ linear functions $F = (F_1, F_2, \ldots, F_A)' = C' \mathbf{y}$, where $F_a = C_a' \mathbf{y}$, $C_a = (c_{1a}, \ldots, c_{Na})'$, $C$ the $N \times A$ matrix $((c_{ka}))$, the $c_{ka}$ being known constants. Consider the following estimates of $F_a$:

$$\hat T_a = C_{as}' \Pi_s^{-1} (\mathbf{y}_s - \hat{\mathbf{y}}_s) + C_a' \hat{\mathbf{y}} \qquad (2.5.1)$$

where $C_{as}'$ is the row vector $(c_{ka}, k \in s)$ and $\hat y_k = \mathbf{x}_k' \hat\beta_s$ with

$$\hat\beta_s = \hat\beta(V^{-1}\Pi^{-1}) = (X_s' V_s^{-1} \Pi_s^{-1} X_s)^{-1} X_s' V_s^{-1} \Pi_s^{-1} \mathbf{y}_s \qquad (2.5.2)$$
The estimator (2.5.1) is an extension of the generalised regression estimator $\hat T_{GR}(V^{-1}\Pi^{-1})$ of $T$. Let $\hat T = (\hat T_1, \ldots, \hat T_A)'$. Then

$$\hat T = C_s' \Pi_s^{-1} (\mathbf{y}_s - \hat{\mathbf{y}}_s) + C' \hat{\mathbf{y}}$$

where $C_s$ is the part of $C$ corresponding to $k \in s$. Now

$$\hat T = G_s' \Pi_s^{-1} \mathbf{y}_s \qquad (2.5.3)$$

where

$$G_s' = C_s' - M_s' H_s^{-1} X_s' V_s^{-1} \qquad (2.5.4)$$

$$M_s' = C_s' \Pi_s^{-1} X_s - C' X, \quad H_s = X_s' V_s^{-1} \Pi_s^{-1} X_s \qquad (2.5.5)$$

Thus $\hat T_a = \sum_{k \in s} g_{ska} y_k / \pi_k$, $g_{ska}$ being the $(k, a)$th element of $G_s$. The following two methods have been suggested by Sarndal (1982) for estimating the design-dispersion matrix $D(\hat T) = ((\mathrm{cov}_p(\hat T_a, \hat T_b)))$.
(a) TAYLOR EXPANSION METHOD:

An estimate of $\mathrm{Cov}_p(\hat T_a, \hat T_b)$ is approximately the Yates-Grundy estimator of covariance,

$$v_T(a, b) = \sum_{k < l \in s} \Big(\frac{\pi_k \pi_l}{\pi_{kl}} - 1\Big) \Big(\frac{z_{ka}}{\pi_k} - \frac{z_{la}}{\pi_l}\Big) \Big(\frac{z_{kb}}{\pi_k} - \frac{z_{lb}}{\pi_l}\Big) \qquad (2.5.6)$$

where

$$z_{ka} = c_{ka}(y_k - \hat y_k), \ k \in s \qquad (2.5.7)$$

Writing

$$YG_s(d_{ka}, d_{kb}) = \sum_{k < l \in s} \Big(\frac{\pi_k \pi_l}{\pi_{kl}} - 1\Big) \Big(\frac{d_{ka}}{\pi_k} - \frac{d_{la}}{\pi_l}\Big) \Big(\frac{d_{kb}}{\pi_k} - \frac{d_{lb}}{\pi_l}\Big) \qquad (2.5.8)$$

as the YG-transformation of $(d_{ka}, d_{kb}), k \in s$, we have

$$v_T(a, b) = YG_s(z_{ka}, z_{kb}) \qquad (2.5.9)$$
(b) MODEL METHOD:

Here an approximate estimate of $\mathrm{Cov}_p(\hat T_a, \hat T_b)$ is

$$v_M(a, b) = YG_s(z_{ka}^*, z_{kb}^*) \qquad (2.5.10)$$

where

$$z_{ka}^* = g_{ska}(y_k - \hat y_k), \ k \in s \qquad (2.5.11)$$

Both $((v_T))$ and $((v_M))$ are approximately p-unbiased for $D(\hat T)$, where $\hat T = (\hat T_1, \ldots, \hat T_A)'$, whether the assumed model $\xi(X, v)$ is true or false. The model is used here only to generate the form of the estimator $\hat T$ and hence $((v_T))$ and $((v_M))$.
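The YG-transformation (2.5.8) is straightforward to implement. The sketch below (all inclusion probabilities and residuals invented) applies it to a single column of residuals, giving a variance estimate of the $v_T$ type.

```python
# Sketch (invented design quantities): the YG-transformation (2.5.8)
# applied to residuals, as in the Taylor-expansion variance estimator.
from itertools import combinations

s   = [0, 1, 2]
pi  = {0: 0.5, 1: 0.6, 2: 0.7}                    # first-order inclusion probs
pij = {(0, 1): 0.25, (0, 2): 0.32, (1, 2): 0.38}  # second-order inclusion probs

def yg(d_a, d_b):
    """YG-transformation of (d_ka, d_kb), k in s, per (2.5.8)."""
    total = 0.0
    for k, l in combinations(s, 2):
        c = pi[k] * pi[l] / pij[(k, l)] - 1.0
        total += (c * (d_a[k] / pi[k] - d_a[l] / pi[l])
                    * (d_b[k] / pi[k] - d_b[l] / pi[l]))
    return total

z = {0: 0.4, 1: -0.3, 2: 0.1}   # hypothetical residuals z_ka = c_ka (y_k - yhat_k)
v_T = yg(z, z)                  # v_T(a, a): a variance estimate, non-negative here
```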
EXAMPLE 2.5.1
Consider the model $\xi(0, 1; x)$. Here

(Hájek's predictor). Two estimators of $V_p(\hat T)$ are

(i)

(ii)

where

Under srs, $v_M = \tilde v$ (say) $= (\bar x/\bar x_s)^2 v_T$, where $v_T = v$ (say) $= \{(1 - f)/n(n-1)\} \sum_s (y_k - \hat\beta x_k)^2$, the usual estimator of the mean square error of the ratio estimator of the population mean. Royall and Cumberland (1978) derived a robust variance estimator $v_h = h_s v$ where,

If $n$ is large and $N \gg n$, $h_s \approx (\bar x/\bar x_s)^2$ and $v_h \approx \tilde v$.
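A numerical sketch (invented srs data) of the quantities above: $v_T$ is the usual MSE estimator of the ratio estimator of the mean, and $v_M$ inflates it by $(\bar x/\bar x_s)^2$; the assumed population mean of $x$ is hypothetical.

```python
# Sketch (invented srs data) of the variance estimators in Example 2.5.1:
# v_T = {(1 - f)/(n(n-1))} * sum_s (y_k - beta*x_k)^2, v_M = (xbar/xbar_s)^2 * v_T.
N = 20
x_s = [2.0, 4.0, 6.0, 8.0]    # sampled x-values, n = 4
y_s = [2.3, 3.8, 6.4, 7.9]
xbar_pop = 5.5                # population mean of x (assumed known)

n = len(x_s)
f = n / N
beta = sum(y_s) / sum(x_s)    # ratio estimator of the slope
ss = sum((y - beta * x) ** 2 for x, y in zip(x_s, y_s))
v_T = (1 - f) / (n * (n - 1)) * ss
xbar_s = sum(x_s) / n
v_M = (xbar_pop / xbar_s) ** 2 * v_T
```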
For further details on inferential problems in survey sampling the reader may refer to Sarndal et al (1992) and Mukhopadhyay (1996), among others.