
Journal of Statistical Planning and Inference 62 (1997) 159-191

On sampling with probability proportional to size

Bengt Rosén*

Statistics Sweden, S-115 81 Stockholm, Sweden

Received 7 June 1996

Abstract

One of the vehicles for utilization of auxiliary information is to use a sampling scheme with inclusion probabilities proportional to given size measures, a πps scheme. The paper addresses the following πps problem: Exhibit a πps scheme with prescribed sample size, which leads to good estimation precision and has good variance estimation properties.

Rosén (1997) presented a novel general class of sampling schemes, called order sampling schemes, which here are shown to provide interesting contributions to the πps problem. A notion 'order sampling with fixed distribution shape' (OSFS) is introduced, and employed to construct a general class of πps schemes, called OSFSπps schemes. A particular scheme, Pareto πps, is shown to be optimal among OSFSπps schemes, in the sense that it minimizes estimator variances. Comparisons are made of three OSFSπps schemes and three other πps schemes: Sunter πps and systematic πps with frame ordered at random, respectively by the sizes. The main conclusion is as follows. Pareto πps is superior among πps schemes which admit objective assessment of sampling errors. © 1997 Elsevier Science B.V.

AMS classification: primary 62D05; secondary 60F05

Keywords: Sampling with probability proportional to size; Order sampling with fixed distribution shape; Pareto sampling

1. Introduction and outline

1.1. Notation and basic assumptions

U = (1, 2, ..., N) denotes a finite population. A variable on it, z = (z₁, z₂, ..., z_N), associates a numerical value to each unit in U. A sampling frame, in the guise of a list with records that correspond one-to-one with the units in the population, is available, and we let U refer to frame as well as population. We confine to without replacement

*Tel.: 46-8783 4490; E-mail: [email protected]

0378-3758/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S0378-3758(96)00186-3


160 B. Rosén / Journal of Statistical Planning and Inference 62 (1997) 159-191

(wor) probability sampling. Then the outcome of a sampling scheme is specified by the sample inclusion indicators I₁, I₂, ..., I_N (I_i = 1 if unit i is sampled and I_i = 0 otherwise). P, E, V and V̂ denote probability, expectation, variance and variance estimator. First and second order inclusion probabilities are denoted π_i = P(I_i = 1) and π_ij = P(I_i = I_j = 1), i, j = 1, 2, ..., N. The condition π_i > 0, i = 1, 2, ..., N, is presumed to be in force throughout. Moreover, we confine to sampling schemes with fixed sample size, denoted n. Then

$$\pi_1 + \pi_2 + \cdots + \pi_N = n. \tag{1.1}$$

A linear statistic is of the form stated in (1.2), where the weights w = (w₁, w₂, ..., w_N) are presumed to be known for all units in U, while the study variable values y = (y₁, y₂, ..., y_N) are known only for observed units. Unless stated otherwise, it is presumed that non-response does not occur (i.e. that sampled and observed units agree):

$$L(y; w) = \sum_{i=1}^{N} y_i \cdot w_i \cdot I_i. \tag{1.2}$$

When π_ij > 0, i, j = 1, 2, ..., N, the corresponding Sen-Yates-Grundy variance estimator is

$$\hat{V}[L(y; w)] = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (y_i w_i - y_j w_j)^2 \left( \frac{\pi_i \pi_j}{\pi_{ij}} - 1 \right) I_i I_j. \tag{1.3}$$

The central finite population inference task is estimation of a population total τ(y) = Σ_{i=1}^N y_i. The Horvitz-Thompson (HT) estimator for τ(y), denoted τ̂(y)_HT, is (1.2) with weights w_i = 1/π_i. It yields unbiased estimation of τ(y), and (1.3) specializes to a formula for V̂[τ̂(y)_HT].
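To make (1.2)-(1.3) concrete, here is a minimal sketch (illustrative code, not from the paper; the data are hypothetical) of the HT point estimate and the Sen-Yates-Grundy variance estimate for a fixed-size wor sample:

```python
# Hypothetical illustration of (1.2)-(1.3): Horvitz-Thompson point estimate and
# Sen-Yates-Grundy variance estimate for a fixed-size wor sample.

def ht_estimate(y, pi, sample):
    """(1.2) with weights w_i = 1/pi_i: tau_hat = sum over the sample of y_i/pi_i."""
    return sum(y[i] / pi[i] for i in sample)

def syg_variance_estimate(y, pi, pi2, sample):
    """Sen-Yates-Grundy estimator (1.3); pi2[i][j] holds the second order
    inclusion probabilities, which must be positive for all sampled pairs (1.6)."""
    v = 0.0
    for i in sample:
        for j in sample:
            if i != j:
                v += 0.5 * ((y[i] / pi[i] - y[j] / pi[j]) ** 2
                            * (pi[i] * pi[j] / pi2[i][j] - 1.0))
    return v
```

For simple random sampling of n = 2 from N = 3 one has π_i = 2/3 and π_ij = 1/3 for i ≠ j, so both functions can be checked against hand computation.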

1.2. The πps problem

Let λ = (λ₁, λ₂, ..., λ_N) be a set of desired inclusion probabilities for a size n wor sample, referred to as target inclusion probabilities. By (1.1), the condition below must be met:

$$0 < \lambda_i \le 1, \; i = 1, 2, \ldots, N, \quad \text{and} \quad \lambda_1 + \lambda_2 + \cdots + \lambda_N = n. \tag{1.4}$$

The πps problem concerns, in its first round, exhibition of a size n wor sampling scheme with inclusion probabilities equal to, or at least close to, the target ones, i.e.

$$\pi_i = (\text{or} \approx) \; \lambda_i, \quad i = 1, 2, \ldots, N. \tag{1.5}$$

The problem is usually formulated slightly differently, which complies better with the abbreviation πps = inclusion probabilities π proportional to sizes. A size variable s = (s₁, s₂, ..., s_N), s_i > 0, i = 1, 2, ..., N, is specified, and the task is to exhibit a sampling scheme with the π_i proportional to the s_i. This is equivalent to the above


πps problem with λ_i = n·s_i / Σ_{j=1}^N s_j, i = 1, 2, ..., N. Note, though, that some size variables s lead to 'impossible' problems, since one or more of the λ's exceed 1. When met in practice, this obstacle is usually circumvented by introducing a 'take all' stratum of units with large sizes.
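As an illustration of this reduction (code not from the paper; the iterative 'take all' rule below is one common variant), target probabilities can be computed from sizes as follows:

```python
def target_probabilities(s, n):
    """Map sizes s to target inclusion probabilities lambda with sum(lambda) = n.
    Units whose lambda_i = n * s_i / sum(s) would reach 1 are moved to a
    'take all' stratum (lambda_i = 1), and the remaining probabilities are
    recomputed for the reduced sample size. Assumes n < len(s)."""
    N = len(s)
    take_all = set()
    lam = {}
    while True:
        rest = [i for i in range(N) if i not in take_all]
        n_rest = n - len(take_all)
        total = sum(s[i] for i in rest)
        lam = {i: n_rest * s[i] / total for i in rest}
        over = [i for i in rest if lam[i] >= 1.0]
        if not over:
            break
        take_all.update(over)
    return [1.0 if i in take_all else lam[i] for i in range(N)]
```

For instance, sizes (1, 1, 1, 1, 10) with n = 2 put the large unit in the take-all stratum and give the remaining units λ_i = 0.25 each, so the λ's still sum to n.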

The first round of the πps problem has many solutions. Therefore, in a second round the problem also concerns exhibition of a good, preferably even a best, scheme. To this end additional requirements have to be introduced, usually the following ones:

(i) The sample selection should be simple to implement.
(ii) The scheme should lead to good estimation precision.
(iii) The scheme should have good variance estimation properties.

Requirement (i) needs no comment; (ii) only that estimation precision is, unless stated otherwise, understood to relate to the HT-estimator. Requirement (iii) is more intricate. The number one desire under (iii) is that the scheme should satisfy (1.6) below, i.e. be measurable in the terminology of Särndal et al. (1992). This condition is sufficient, and essentially necessary, for the existence of a consistent variance estimator (for the HT-estimator), or in other words for admitting objective assessment of sampling errors:

$$\pi_{ij} > 0, \quad i, j = 1, 2, \ldots, N. \tag{1.6}$$

For measurable sampling schemes, additional desires under (iii) are

• A variance estimation procedure should be exhibited, the simpler the better. (1.7)
• Variance estimates should preferably never become negative. (1.8)

One way to meet (1.7) is to exhibit a formula/algorithm for computation of π_ij, i, j = 1, 2, ..., N. Then (1.3) (or some of its 'relatives') can be employed. If also π_i·π_j − π_ij ≥ 0, i, j = 1, 2, ..., N, then (1.8) will be satisfied for the variance estimator (1.3).

The most frequently employed πps scheme in practice, systematic πps, does in general not satisfy (1.6). In fact, systematic πps is not one but a whole family of sampling schemes. In principle, there is one scheme for each ordering of the sampling frame. Even if the HT-estimator does not depend on the frame order, its precision does. It is beneficial for estimation precision if the study variable values exhibit a 'smooth trend' in frame order, and the more linear the trend is the better. Then two variance reducing forces are set to work: (a) variation of inclusion probabilities (by πps), and (b) an implicit homogeneity stratification of the population (by the frame ordering). With access to a size variable that is fairly proportional to the study variable, frame order by sizes leads to a fairly smooth trend. Other ordering principles are also used. In many survey contexts it is natural to use geographical order in the sampling frame, which leads to a desired geographical spread of the sample. Generally, we speak of a sampling frame with fixed order, one possibility being 'ordered by sizes'. As indicated, fixed frame order is often beneficial for estimation precision, but it violates (1.6). Point estimator precision is 'bought' at the price that 'control' over variance estimation is lost.
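For concreteness, systematic πps selection with a fixed frame order can be sketched as follows (the standard cumulation algorithm, which the paper assumes rather than spells out; illustrative code):

```python
import math
import random

def systematic_pips(lam, rng=random):
    """Systematic pips with the frame in its given (fixed) order. lam must sum
    to an integer n. A single uniform start r drives the whole draw: unit i is
    selected when some point r + k, k = 0, 1, 2, ..., falls in the interval
    (cum_{i-1}, cum_i] of cumulated lambdas. Because one random number decides
    everything, many pi_ij are zero, so (1.6) fails in general."""
    r = rng.random()
    sample, cum = [], 0.0
    for i, l in enumerate(lam):
        lo, cum = cum, cum + l
        if math.floor(cum - r) > math.floor(lo - r):
            sample.append(i)
    return sample
```

With λ = (0.5, 0.5, 0.5, 0.5) only the two samples {1, 3} and {2, 4} (in unit numbering) can occur, which shows directly how (1.6) is violated.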


One way to cope with the dilemma, which is used in practice, is to order the frame at random. The sampling scheme then comprises first a totally random ordering of the frame and secondly selection of a systematic πps sample. This scheme, called systematic πps with random frame order, is measurable. It is a bit unclear, though, how well it complies with (1.7) and (1.8). Approximate variance estimators exist, however; notably the Hartley-Rao estimator, which yields non-negative estimates (see e.g. Wolter, 1985, Section 7).

Sunter (1977) introduced a πps scheme, henceforth called Sunter πps, with control of variance estimation. In particular, an algorithm for computation of π_ij was presented, and it was shown that π_i·π_j − π_ij > 0 holds. It should be noted, though, that Sunter's scheme yields only approximate πps, in the sense that some of the original size measures have to be modified before the scheme can be set in operation, and πps refers to the modified sizes. A consequence of the modification is that a portion of the list is sampled by simple random sampling, and this portion is larger the higher the sampling fraction is. Section 3.6.2 in Särndal et al. (1992) gives an account of the πps problem. In particular, they present Sunter's scheme on pp. 93-96, to which the reader is referred.

1.3. Outline of the paper

Our chief objective is to show that the order sampling schemes, which are introduced and studied in Rosén (1997), can provide interesting contributions to the πps problem. To make the paper self-contained, basic concepts and results for order sampling are briefly reviewed in Section 2, where also some extensions are made. In particular, the notion of order sampling with fixed distribution shape, abbreviated OSFS, is introduced. In Section 3 this class of sampling schemes is employed to construct a class of (approximate) πps schemes, called OSFSπps schemes. Moreover, it is shown that a particular OSFSπps scheme, called Pareto πps, is (asymptotically) optimal among the OSFSπps schemes, to the effect that it minimizes estimator variances. Section 4 contains a deep study of Pareto πps.

The last section (Section 5) presents evaluations and comparisons of πps schemes, which comprise three OSFSπps schemes (Poisson πps, exponential πps and Pareto πps) and three πps schemes outside OSFSπps: Sunter πps and systematic πps, with frame ordered at random, respectively by sizes. The main reason for including systematic πps is that it is widely used in practice. As regards Sunter πps, we read Särndal et al. (1992) as indicating Sunter πps to be a candidate for 'best so far' among πps schemes with variance estimation control.

The conclusions from the evaluation are as follows. Pareto πps is superior among the considered πps schemes, relative to all three requests (i)-(iii) in Section 1.2. However, if (iii) is disregarded and one is prepared to lose control over variance estimation, systematic πps with frame ordered by sizes often, but not always, leads to better point estimator precision than Pareto πps. It is a matter of judgment/beliefs for the sampler to choose between the two schemes.


2. On order sampling

As a preparation for the subsequent approach to the πps problem we review, and also extend, some concepts and results on order sampling presented in Rosén (1997).

2.1. Review of basic concepts and results on order sampling

Definition 2.1. To each unit i in U = (1, 2, ..., N) a probability distribution F_i on [0, ∞), with density f_i, is associated. Order sampling from U with sample size n, n ≤ N, and order distributions F = (F₁, F₂, ..., F_N), denoted OS(n; F), is carried out as follows. Independent random variables Q₁, Q₂, ..., Q_N, called ranking variables, with distributions F₁, F₂, ..., F_N are realized. The units with the n smallest Q-values constitute the sample.

It is obvious that OS is wor. Moreover, since the F_i's are assumed to be continuous, the realized Q-values contain (with probability 1) no ties. Hence, OS(n; F) has fixed sample size n.
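Definition 2.1 translates directly into code. A sketch (illustrative; the caller supplies the inverse distribution functions F_i⁻¹, so each Q_i is realized by inverse transform):

```python
import random

def order_sample(n, inv_cdfs, rng=random):
    """OS(n; F): realize independent Q_i = F_i^{-1}(U_i), with U_i standard
    uniform, and return the indices of the units with the n smallest Q-values.
    With continuous F_i, ties have probability 0, so the size is exactly n."""
    q = [inv(rng.random()) for inv in inv_cdfs]
    return sorted(range(len(q)), key=lambda i: q[i])[:n]
```

The returned indices are distinct by construction, which mirrors the wor and fixed-sample-size properties noted above.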

Proposition 2.1 below, which in essence is a restatement of Approximation Result 3.2 in Rosén (1997), contains key estimation results for general order sampling schemes. The core results in Rosén (1997) are stringent limit theorems. However, from a practical sampling point of view limit results are not so interesting, since practical situations are always 'finite'. The main merit of limit theorems is that they lay the ground for approximations which can be employed in practical, finite situations, and the approximation aspect is emphasized in the following. A reader with particular interest in precise limit theorem versions of the results to be formulated is referred to Rosén (1997) for additional information. We use the (admittedly sweeping) expression that a result holds 'with good approximation under general conditions' to indicate that there is a corresponding limit theorem, which states that the claim is asymptotically true (under appropriate, but general, conditions).

We prepare the formulation of Proposition 2.1 by introducing a quantity ξ and the order sampling estimator τ̂(y)_OS of τ(y), as stated below:

$$\xi \text{ solves the equation (in } t \text{, on } 0 \le t < \infty \text{):} \quad \sum_{i=1}^{N} F_i(t) = n, \tag{2.1}$$

$$\hat{\tau}(y)_{OS} = \sum_{i=1}^{N} \frac{y_i}{F_i(\xi)} \cdot I_i. \tag{2.2}$$
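Since Σ_i F_i(t) is nondecreasing in t, the equation (2.1) defining ξ can be solved numerically by bracketing and bisection. A sketch (illustrative code):

```python
def solve_xi(cdfs, n, tol=1e-10):
    """Solve sum_i F_i(t) = n, i.e. equation (2.1), for t by bisection.
    Each F_i is a nondecreasing function on [0, infinity); the upper bracket
    is expanded by doubling until it covers the root."""
    def total(t):
        return sum(F(t) for F in cdfs)
    hi = 1.0
    while total(hi) < n:
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if total(mid) < n:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For four units with F_i(t) = min(t, 1) and n = 2, the solution is ξ = 0.5, since 4·min(t, 1) = 2 there.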

Remark 2.1. A sufficient condition for a solution ξ to (2.1) to be unique is that f_i(ξ) > 0 for at least one i, i = 1, 2, ..., N. See Remark 3.1 in Rosén (1997).

Remark 2.2. As discussed in Remark 3.3 in Rosén (1997), there are good grounds to believe that the following approximation formula for OS(n; F) inclusion probabilities


holds with good approximation under general conditions:

$$\pi_i \approx F_i(\xi), \quad i = 1, 2, \ldots, N. \tag{2.3}$$

Condition (2.3) is only a conjecture, though; it has not been justified by a stringent limit theorem.

The following proposition tells that (2.2) is a good estimator under general conditions. As already stated, Proposition 2.1 is in essence a restatement of Approximation Result 3.2 in Rosén (1997), to which the reader is referred for justifications. N(μ, σ²) denotes the normal distribution with mean μ and variance σ².

Proposition 2.1. Consider OS(n; F) sampling from the population U, on which the variable y is defined, and let notation be as in (2.1) and (2.2). Then (a)-(d) hold under general conditions:

(a) The distribution of τ̂(y)_OS is well approximated by N(τ(y), σ²), with (2.4)

$$\sigma^2 = \frac{N}{N-1} \sum_{i=1}^{N} \left( \frac{y_i}{F_i(\xi)} - \gamma \right)^2 F_i(\xi)\,(1 - F_i(\xi)), \tag{2.5}$$

where

$$\gamma = \sum_{i=1}^{N} \frac{y_i}{F_i(\xi)}\, f_i(\xi) \Big/ \sum_{i=1}^{N} f_i(\xi). \tag{2.6}$$

(b) In particular, τ̂(y)_OS yields consistent estimation of τ(y).
(c) In particular, σ² yields a good approximation of V[τ̂(y)_OS].
(d) A good variance estimator for τ̂(y)_OS is provided by

$$\hat{V}[\hat{\tau}(y)_{OS}] = \frac{n}{n-1} \sum_{i=1}^{N} \left( \frac{y_i}{F_i(\xi)} - \hat{\gamma} \right)^2 (1 - F_i(\xi)) \cdot I_i, \tag{2.7}$$

where

$$\hat{\gamma} = \sum_{i=1}^{N} \frac{y_i}{F_i(\xi)} \cdot \frac{f_i(\xi)}{F_i(\xi)} \cdot I_i \Big/ \sum_{i=1}^{N} \frac{f_i(\xi)}{F_i(\xi)} \cdot I_i. \tag{2.8}$$

2.2. A uniqueness issue for order sampling

Definition 2.1 states that a pair (n; F), F = (F₁, F₂, ..., F_N), determines an order sampling scheme. However, there is no one-to-one correspondence between OS schemes and pairs (n; F). To realize this, consider a function φ on [0, ∞) which is strictly increasing, continuous and has range [0, ∞). Since an increasing function preserves order, the following holds.

Two OS schemes with ranking variables Q₁, Q₂, ..., Q_N and Q₁*, Q₂*, ..., Q_N*, which are related by Q_i* = φ(Q_i), i = 1, 2, ..., N, are (probabilistically) equivalent. (2.9)


The corresponding order distributions F = (F₁, F₂, ..., F_N) and F* = (F₁*, F₂*, ..., F_N*) are related by F_i*(t) = F_i(φ⁻¹(t)), i = 1, 2, ..., N, where φ⁻¹ denotes the inverse function. Hence, (2.9) can be expressed as follows, which shows that there is no one-to-one correspondence between OS schemes and pairs (n; F).

For a continuous, strictly increasing φ, the schemes OS(n; F) and OS(n; F(φ⁻¹)) agree. (2.10)

2.3. Order sampling with fixed order distribution shape

Here we introduce a subclass of OS schemes which will play an instrumental role in the subsequent considerations of the πps problem.

Definition 2.2. Let H(t) be a probability distribution with density h(t), t ≥ 0, and let θ = (θ₁, θ₂, ..., θ_N), θ_i > 0, be real numbers. An OS(n; F) scheme, F = (F₁, F₂, ..., F_N), is said to have fixed order distribution shape H(t) and intensities θ if either, and hence both, of the following two equivalent conditions are met.

(i) The ranking variables Q₁, Q₂, ..., Q_N are of type Q_i = Z_i/θ_i, i = 1, 2, ..., N, where Z₁, Z₂, ..., Z_N are independent, identically distributed (iid) random variables with common distribution H. (2.11)

(ii) The order distributions are F_i(t) = H(t·θ_i), 0 ≤ t < ∞, i = 1, 2, ..., N. (2.12)

Such a sampling scheme is referred to by the notation OSFS(n; H; θ).

The following particular OSFS schemes are given special attention in Rosén (1997). As regards sequential Poisson sampling, we also refer to Ohlsson (1995).

Definition 2.3. (a) Sequential Poisson sampling is OSFS(n; H; θ) with H being the standard uniform distribution: H(t) = min(t, 1), 0 ≤ t < ∞, and h(t) = 1 on [0, 1] and 0 outside [0, 1].

(b) Successive sampling is OSFS(n; H; θ) with H being the standard exponential distribution: H(t) = 1 − e^{−t} and h(t) = e^{−t}, 0 ≤ t < ∞.

3. OSFS schemes and the πps problem

Here we employ OSFS schemes to treat the πps problem in Section 1.2. In the first round we presume that the shape distribution is given, and we seek intensities which lead to inclusion probabilities that are equal to, or at least close to, the target inclusion probabilities. In a second round, we regard the shape distribution as optional, and consider the optimal choice of it.

3.1. The πps problem when the shape distribution is given

Let λ = (λ₁, λ₂, ..., λ_N) and H be given. We consider the following problem.


Problem. Find intensities θ such that the inclusion probabilities π = (π₁, π₂, ..., π_N) for the scheme OSFS(n; H; θ) agree with the target probabilities λ = (λ₁, λ₂, ..., λ_N).

We start with a heuristic treatment of the problem, in which the approximation (2.3) is regarded as an exact relation. Then, by combining (2.3) and (2.12) we obtain the relations π_i = H(ξ·θ_i), i = 1, 2, ..., N. Thus, the desired equality π_i = λ_i holds if intensities θ_i and target values λ_i are related as follows: λ_i = H(ξ·θ_i), i = 1, 2, ..., N. By solving for θ_i we get, where H⁻¹ denotes the inverse function: the intensities θ_i = H⁻¹(λ_i)/ξ, i = 1, 2, ..., N, provide a solution to the above problem. Moreover, an OSFS scheme is invariant under rescaling of the intensities (i.e. multiplying them all by the same positive number). This is seen from (2.10) with φ(t) = c·t, c > 0. Hence, an alternative solution to the problem is θ_i = H⁻¹(λ_i), i = 1, 2, ..., N.

We use the last formula to construct ranking variables Q_i for an OSFS scheme. By (2.11), Q_i = Z_i/θ_i, i = 1, 2, ..., N, where Z₁, Z₂, ..., Z_N are iid random variables with common distribution H. Note that such Z-variables can be generated as

$$Z_i = H^{-1}(U_i), \quad i = 1, 2, \ldots, N, \tag{3.1}$$

where

U₁, U₂, ..., U_N are independent standard uniform random variables (see Definition 2.3). (3.2)

Thereby we have arrived at the following heuristic solution: the OS scheme with ranking variables Q_i = H⁻¹(U_i)/H⁻¹(λ_i), i = 1, 2, ..., N (which is an OSFS scheme) has inclusion probabilities λ = (λ₁, λ₂, ..., λ_N). Against this background we make the following formal definition.
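The heuristic construction can be sketched generically; the shape H enters only through its inverse (illustrative code; `unif_inv` and `exp_inv` correspond to the uniform and exponential shapes of Definition 2.3):

```python
import math
import random

def osfs_pips_sample(n, lam, h_inv, rng=random):
    """Draw an OSFSpips(n; lambda; H) sample via ranking variables
    Q_i = H^{-1}(U_i) / H^{-1}(lambda_i), as in (3.3); the units with
    the n smallest Q-values form the sample."""
    q = []
    for l in lam:
        u = rng.random()
        q.append(h_inv(u) / h_inv(l))
    return sorted(range(len(q)), key=lambda i: q[i])[:n]

# Shape inverses for the two schemes of Definition 2.3:
unif_inv = lambda x: x                      # standard uniform H
exp_inv = lambda x: -math.log(1.0 - x)      # standard exponential H
```

Any fixed sample size n and target λ's summing to n can be plugged in; only the function `h_inv` changes between schemes.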

Definition 3.1. The OSFSπps scheme with sample size n, target inclusion probabilities λ = (λ₁, λ₂, ..., λ_N) and shape distribution H, in full denoted OSFSπps(n; λ; H), is the OS scheme which satisfies either, and hence both, of the following two equivalent conditions. In particular, the conditions imply that the scheme is an OSFS scheme.

(i) Ranking variables Q₁, Q₂, ..., Q_N are

$$Q_i = \frac{H^{-1}(U_i)}{H^{-1}(\lambda_i)}, \quad i = 1, 2, \ldots, N, \text{ with } U\text{'s as in (3.2)}. \tag{3.3}$$

(ii) Order distributions F = (F₁, F₂, ..., F_N) are

$$F_i(t) = H(t \cdot H^{-1}(\lambda_i)), \text{ with density } f_i(t) = h(t \cdot H^{-1}(\lambda_i)) \cdot H^{-1}(\lambda_i), \quad i = 1, 2, \ldots, N. \tag{3.4}$$

Proof of the equivalence between (i) and (ii). With ranking variables as in (3.3), we have

$$F_i(t) = P(Q_i \le t) = P(H^{-1}(U_i)/H^{-1}(\lambda_i) \le t) = P(U_i \le H(t \cdot H^{-1}(\lambda_i))) = H(t \cdot H^{-1}(\lambda_i)), \quad i = 1, 2, \ldots, N,$$

which is the formula for F_i(t) in (3.4). The density f_i(t) is obtained by differentiation.


Since OSFSπps is a special type of OS scheme, the results in Proposition 2.1 can, and will, be applied. We start by exhibiting expressions for the ξ, F_i(ξ) and f_i(ξ) which enter in Proposition 2.1. By (3.4), Eq. (2.1) here takes the form

$$\sum_{i=1}^{N} H(t \cdot H^{-1}(\lambda_i)) = n. \tag{3.5}$$

By virtue of (1.4), (3.5) is solved by ξ = 1. This together with (3.4) yields

$$F_i(\xi) = F_i(1) = H(H^{-1}(\lambda_i)) = \lambda_i \quad \text{and} \quad f_i(\xi) = f_i(1) = h(H^{-1}(\lambda_i)) \cdot H^{-1}(\lambda_i), \quad i = 1, 2, \ldots, N. \tag{3.6}$$

Now, by applying Proposition 2.1 with F_i(ξ) = λ_i and f_i(ξ) = h(H⁻¹(λ_i))·H⁻¹(λ_i), the following Approximation Result 3.1 is obtained. Some algebra is left to the reader.

Approximation Result 3.1. Consider OSFSπps(n; λ; H) sampling from the population U, on which the variable y is defined. Define the estimator τ̂(y)_λ of τ(y) by

$$\hat{\tau}(y)_\lambda = \sum_{i=1}^{N} \frac{y_i}{\lambda_i} \cdot I_i. \tag{3.7}$$

Then (a)-(d) below hold under general conditions:

(a) The distribution of τ̂(y)_λ is well approximated by N(τ(y), σ²(y; λ; H)), with (3.8)

$$\sigma^2(y; \lambda; H) = \frac{N}{N-1} \sum_{i=1}^{N} \left( \frac{y_i}{\lambda_i} - \frac{\sum_{j=1}^{N} y_j \alpha_j / \lambda_j}{\sum_{j=1}^{N} \alpha_j} \right)^2 \lambda_i (1 - \lambda_i), \tag{3.9}$$

where

$$\alpha_i = \alpha_i(\lambda; H) = h(H^{-1}(\lambda_i)) \cdot H^{-1}(\lambda_i), \quad i = 1, 2, \ldots, N. \tag{3.10}$$

(b) In particular, τ̂(y)_λ yields consistent estimation of τ(y).
(c) In particular, V[τ̂(y)_λ] is well approximated by σ²(y; λ; H).
(d) A good variance estimator for τ̂(y)_λ is provided by

$$\hat{V}[\hat{\tau}(y)_\lambda] = \frac{n}{n-1} \sum_{i=1}^{N} \left( \frac{y_i}{\lambda_i} - \frac{\sum_{j=1}^{N} (y_j a_j / \lambda_j) I_j}{\sum_{j=1}^{N} a_j I_j} \right)^2 (1 - \lambda_i) \cdot I_i, \tag{3.11}$$

where

$$a_i = a_i(\lambda; H) = \alpha_i(\lambda; H)/\lambda_i = h(H^{-1}(\lambda_i)) \cdot H^{-1}(\lambda_i)/\lambda_i, \quad i = 1, 2, \ldots, N. \tag{3.12}$$

Remark 3.1. In the heuristic reasoning which precedes Definition 3.1, the approximation (2.3) plays a crucial role. However, even if (2.3) is conjectured to be a good approximation, its goodness is an open question. We want to stress, though, that justification of Approximation Result 3.1 is given by Theorem 3.1 in Rosén (1997), and this theorem does not rely on the approximation (2.3). Hence, the claims in Approximation Result 3.1 hold asymptotically under the general conditions in Theorem 3.1 in Rosén (1997), irrespective of whether (2.3) is a good approximation or not.
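In code, the point estimator (3.7) and the variance estimator (3.11) take the target λ's and the constants a_i of (3.12) as input. A sketch (illustrative; the a's depend on the chosen shape H and must be supplied by the caller):

```python
def tau_hat(y, lam, sample):
    """Point estimator (3.7): sum over the sample of y_i / lambda_i."""
    return sum(y[i] / lam[i] for i in sample)

def var_hat(y, lam, a, sample):
    """Variance estimator (3.11): gamma_hat is the a-weighted sample mean of
    the y_i/lambda_i, and each squared deviation is weighted by (1 - lambda_i)."""
    n = len(sample)
    gamma = (sum(y[i] * a[i] / lam[i] for i in sample)
             / sum(a[i] for i in sample))
    return (n / (n - 1)) * sum((y[i] / lam[i] - gamma) ** 2 * (1.0 - lam[i])
                               for i in sample)
```

With a_i = 1 (the Poisson πps case of Lemma 3.1 below) and a sample of two units with y = (2, 4) and λ = (0.5, 0.5), one gets τ̂ = 12 and V̂ = 8 by hand.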

Next we introduce terms and notation for the OSFSπps schemes which relate to the sampling procedures in Definition 2.3. Our general terminological rule is that OSFSπps(n; λ; H) is named by its shape distribution H. We start, though, with an exception from the rule. Poisson πps, denoted POIπps(n; λ) or just POIπps, and exponential πps, denoted EXPπps(n; λ) or just EXPπps, are the OSFSπps(n; λ; H) schemes with H being the standard uniform distribution and the standard exponential distribution (see Definition 2.3), respectively.

Remark 3.2. By the above terminology rule, Poisson πps ought to be named 'uniform πps'. A reason for diverging is that H being the standard uniform distribution (UNIF) has the special property H(t) = H⁻¹(t) (on its support [0, 1]). Hence, OSFSπps(n; λ; UNIF) = OSFS(n; UNIF; λ). Since Ohlsson (1995) calls OSFS(n; UNIF; λ) 'sequential Poisson sampling', we keep Poisson, but leave out sequential for brevity throughout this paper. We are aware, though, that confusion may arise in more general contexts, since 'Poisson sampling' is already booked for another scheme, see e.g. Särndal et al. (1992, pp. 85-87).

Verification of the following lemma is left to the reader.

Lemma 3.1. (a) For POIπps(n; λ), the α, a, Q's and σ² in (3.10), (3.12), (3.3) and (3.9) are

$$\alpha_i = \lambda_i, \quad a_i = 1, \quad Q_i = U_i/\lambda_i, \quad i = 1, 2, \ldots, N, \tag{3.13}$$

$$\sigma^2(y; \lambda; \mathrm{POI}) = \frac{N}{N-1} \sum_{i=1}^{N} \left( \frac{y_i}{\lambda_i} - \frac{\tau(y)}{n} \right)^2 \lambda_i (1 - \lambda_i). \tag{3.14}$$

(b) For EXPπps(n; λ), the α, a, Q's and σ² in (3.10), (3.12), (3.3) and (3.9) are

$$\alpha_i = -\ln(1 - \lambda_i) \cdot (1 - \lambda_i), \quad a_i = \frac{-\ln(1 - \lambda_i) \cdot (1 - \lambda_i)}{\lambda_i}, \quad Q_i = \frac{\ln(1 - U_i)}{\ln(1 - \lambda_i)}, \quad i = 1, 2, \ldots, N, \tag{3.15}$$

$$\sigma^2(y; \lambda; \mathrm{EXP}) = \frac{N}{N-1} \sum_{i=1}^{N} \left( \frac{y_i}{\lambda_i} - \frac{\sum_{j=1}^{N} y_j \ln(1 - \lambda_j)(1 - \lambda_j)/\lambda_j}{\sum_{j=1}^{N} \ln(1 - \lambda_j)(1 - \lambda_j)} \right)^2 \lambda_i (1 - \lambda_i). \tag{3.16}$$

Remark 3.3. The associated variance estimators are obtained by inserting the a's in (3.11).
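The constants of Lemma 3.1 in code form (a sketch per (3.13) and (3.15)); as Remark 3.3 notes, inserting these a's into (3.11) yields the associated variance estimators:

```python
import math

def a_poisson(lam):
    """a_i for POIpips, per (3.13): identically 1."""
    return [1.0 for _ in lam]

def a_exponential(lam):
    """a_i for EXPpips, per (3.15): -ln(1 - lambda_i) * (1 - lambda_i) / lambda_i."""
    return [-math.log(1.0 - l) * (1.0 - l) / l for l in lam]
```

For instance, λ_i = 0.5 gives a_i = ln 2 under the exponential shape.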

3.2. Optimal shape distribution

The point estimator τ̂(y)_λ in (3.7) is the same for all shape distributions H. The variance of τ̂(y)_λ depends on H, though, as seen from (3.9) and (3.10), and exemplified


in (3.14) and (3.16). Therefore, it is natural to wonder about H's which minimize the asymptotic variance σ²(y; λ; H) in (3.9). The answer is stated below.

Theorem 3.1. The asymptotic variance σ²(y; λ; H) in (3.9) is minimized, for any study variable y and any λ, by the shape distribution

$$H(t) = \frac{t}{1+t}, \quad 0 \le t < \infty, \quad \text{with density} \quad h(t) = \frac{1}{(1+t)^2}, \quad 0 \le t < \infty. \tag{3.17}$$

Definition 3.2. The distribution in (3.17) is called the standard Pareto distribution, and the associated OSFSπps scheme Pareto πps, denoted PARπps(n; λ) or just PARπps.

Remark 3.4. The fact that Pareto πps minimizes σ²(y; λ; H) for any y and λ is referred to by saying that Pareto πps is (asymptotically) uniformly optimal among OSFSπps schemes.
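For the standard Pareto shape (3.17), H⁻¹(λ) = λ/(1 − λ), so the ranking variables of (3.3) become Q_i = [U_i/(1 − U_i)] / [λ_i/(1 − λ_i)]. A minimal sketch of the Pareto πps draw (illustrative code):

```python
import random

def pareto_pips_sample(n, lam, rng=random):
    """Pareto pips: rank by Q_i = (U_i/(1 - U_i)) / (lam_i/(1 - lam_i)), with
    U_i iid standard uniform, and sample the units with the n smallest Q_i.
    Targets lam_i are assumed to lie strictly between 0 and 1."""
    q = []
    for l in lam:
        u = rng.random()
        q.append((u / (1.0 - u)) / (l / (1.0 - l)))
    return sorted(range(len(q)), key=lambda i: q[i])[:n]
```

This is the scheme whose asymptotic optimality Theorem 3.1 asserts; the draw itself is no harder to implement than the other OSFSπps schemes.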

Proof of Theorem 3.1. The algebraic details in the following proof are left to the reader. We shall employ the following representation of σ²(y; λ; H) in (3.9):

$$\sigma^2(y; \lambda; H) = M(y; \lambda) + R(y; \lambda; H), \tag{3.18}$$

where

$$M(y; \lambda) = \frac{N}{N-1} \sum_{i=1}^{N} \left( \frac{y_i}{\lambda_i} - \frac{\sum_{j=1}^{N} y_j (1 - \lambda_j)}{\sum_{j=1}^{N} \lambda_j (1 - \lambda_j)} \right)^2 \lambda_i (1 - \lambda_i), \tag{3.19}$$

$$R(y; \lambda; H) = \frac{N}{N-1} \left( \frac{\sum_{i=1}^{N} y_i \alpha_i / \lambda_i}{\sum_{i=1}^{N} \alpha_i} - \frac{\sum_{i=1}^{N} y_i (1 - \lambda_i)}{\sum_{i=1}^{N} \lambda_i (1 - \lambda_i)} \right)^2 \sum_{i=1}^{N} \lambda_i (1 - \lambda_i). \tag{3.20}$$

The representation (3.18)-(3.20) is obtained by using the following identity:

$$\sum_{i=1}^{N} (z_i - \gamma)^2 \beta_i = \sum_{i=1}^{N} \left( z_i - \frac{\sum_{j=1}^{N} z_j \beta_j}{\sum_{j=1}^{N} \beta_j} \right)^2 \beta_i + \left( \gamma - \frac{\sum_{j=1}^{N} z_j \beta_j}{\sum_{j=1}^{N} \beta_j} \right)^2 \sum_{i=1}^{N} \beta_i, \tag{3.21}$$

with

$$z_i = y_i/\lambda_i, \quad \gamma = \sum_{i=1}^{N} \frac{y_i \alpha_i}{\lambda_i} \Big/ \sum_{i=1}^{N} \alpha_i, \quad \beta_i = \lambda_i(1 - \lambda_i) \cdot N/(N-1),$$

and comparing with (3.9). Now, the following claims (i)-(iii) are straightforward consequences of (3.19) and (3.20):
(i) M(y; λ) does not depend on H,
(ii) R(y; λ; H) ≥ 0 for all H,
(iii) R(y; λ; H) = 0 if α_i is proportional to λ_i(1 − λ_i).
By combining (3.18), (i)-(iii) and (c) in Approximation Result 3.1, the following result is obtained.


Lemma 3.2. For a fixed sample size n, OSFSπps(n; λ; H) has, for any y, minimal asymptotic variance σ²(y; λ; H) for τ̂(y)_λ among OSFSπps(n; λ) schemes if

α_i(λ; H) in (3.10) is proportional to λ_i(1 − λ_i). (3.22)

Next we search for an H which makes (3.22) satisfied. Relations (3.22) and (3.10) lead to the following functional-differential equation, k being a positive constant:

h(H⁻¹(λ))·H⁻¹(λ) = k·λ(1−λ), 0 < λ < 1, where h(t) = H′(t) (′ for derivative). (3.23)

By inverting (3.23) (with respect to multiplication) and using the general differentiation rule dγ⁻¹(λ)/dλ = 1/γ′(γ⁻¹(λ)), the following differential equation is obtained:

(dH⁻¹(λ)/dλ) / H⁻¹(λ) = d[ln H⁻¹(λ)]/dλ = 1/(k·λ(1−λ)) (by (3.23)) = (1/k) · d/dλ [ln(λ/(1−λ))]. (3.24)

It is readily deduced that the probability distributions H which satisfy (3.24) are specified by

H⁻¹(λ; k; c) = (λ/(c(1−λ)))^{1/k}, implying H(t; k; c) = c·t^k/(1 + c·t^k), 0 ≤ t < ∞, k, c > 0. (3.25)

In the first round (3.25) seems to provide quite an extensive class of solutions to (3.22); this is not the case, though. From (3.3) and (3.25) it is seen that the associated ranking variables Q(k; c) are

Q(k; c)_i = H⁻¹(U_i; k; c) / H⁻¹(λ_i; k; c) = (U_i(1−λ_i)/(λ_i(1−U_i)))^{1/k}, i = 1, 2, …, N. (3.26)

Since the function φ(t) = t^{1/k}, 0 ≤ t < ∞, k > 0, has range [0, ∞) and is increasing and continuous, (3.26) and (2.9) yield that all OSFSπps(n; λ; H) schemes specified by (3.26) are equivalent. The 'standard' version with c = k = 1 is chosen in Theorem 3.1 and Definition 3.2.
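As a small numerical check (a sketch, not from the paper; the λ-values below are hypothetical), the following verifies that the ranking variables Q(k; c) in (3.26) order the units identically for every k, c > 0, so that all members of the family (3.25) select the same sample:

```python
import random

# Hypothetical target inclusion probabilities for a population of 6 units
lam = [0.1, 0.3, 0.5, 0.2, 0.4, 0.25]
random.seed(1)
U = [random.random() for _ in lam]  # independent standard uniforms

def ranking(k, c):
    # Q(k; c)_i = H^{-1}(U_i; k; c) / H^{-1}(lambda_i; k; c); c cancels and the
    # result is the standard Pareto ranking variable raised to the power 1/k
    def Hinv(x):
        return (x / (c * (1.0 - x))) ** (1.0 / k)
    return [Hinv(u) / Hinv(l) for u, l in zip(U, lam)]

def select(n, q):
    # the n units with the smallest ranking-variable values
    return sorted(sorted(range(len(q)), key=lambda i: q[i])[:n])

base = select(3, ranking(1, 1))          # standard version c = k = 1
for k, c in [(0.5, 1.0), (2.0, 3.0), (7.0, 0.2)]:
    assert select(3, ranking(k, c)) == base
print(base)  # → [0, 3, 4]
```

Since t → t^{1/k} is increasing, the orderings, and hence the selected samples, coincide exactly; only the numerical Q-values differ.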

Remark 3.5. Eqs. (3.23)-(3.26) yield a constructive derivation of Pareto πps as solution to (3.22). If the reader should question the rigor of that derivation, one can argue instead as follows. Formula (4.1) tells that PARπps(n; λ) satisfies (3.22). Once a shape distribution H that makes (3.22) satisfied is exhibited, Lemma 3.2 'directly', and rigorously (but non-constructively), tells that no other OSFSπps scheme has smaller asymptotic variance σ²(y; λ; H).


4. Pareto πps

The (asymptotic) optimality property stated in Theorem 3.1 gives Pareto πps (see Definition 3.2) particular interest. In this section we provide further information on Pareto πps.

4.1. Properties of Pareto πps

The algebra relating to (4.1)-(4.3) is left to the reader. Recall relation (1.4).

Lemma 4.1. For PARπps(n; λ), the α, a, Q's and σ² in (3.10), (3.12), (3.3) and (3.9) are

α_i = λ_i(1−λ_i),  a_i = 1−λ_i,  Q_i = U_i(1−λ_i)/(λ_i(1−U_i)),  i = 1, 2, …, N, (4.1)

σ²(y; λ; PAR) = N/(N−1) · Σ_{i=1}^N (y_i/λ_i − Σ_{j=1}^N y_j(1−λ_j) / Σ_{j=1}^N λ_j(1−λ_j))² λ_i(1−λ_i). (4.2)

The corresponding variance estimator (3.11) is

V̂[τ̂(y)_λ] = n/(n−1) · Σ_{ν=1}^n (Y_ν/Λ_ν − Σ_{ρ=1}^n Y_ρ(1−Λ_ρ)/Λ_ρ / Σ_{ρ=1}^n (1−Λ_ρ))² (1−Λ_ν). (4.3)

Remark 4.1. By the Lagrange equality version of the Schwarz inequality, (4.2) can be rewritten

σ²(y; λ; PAR) = N / (2(N−1) Σ_{l=1}^N λ_l(1−λ_l)) · Σ_{i=1}^N Σ_{j=1}^N (y_i/λ_i − y_j/λ_j)² λ_i(1−λ_i) λ_j(1−λ_j). (4.4)

Remark 4.2. The uniform asymptotic optimality of PARπps (see Remark 3.4) for estimation of population totals τ(y) entails that PARπps is asymptotically optimal also for estimation of other population characteristics. In the discussion of this matter we use the following conventions. Arithmetic operations on variables are to be interpreted componentwise, e.g. for x = (x₁, x₂, …, x_N) and y = (y₁, y₂, …, y_N), x·y = (x₁·y₁, …, x_N·y_N). Moreover, 'optimal' stands for 'minimizing estimator variance'.

Consider estimation of a ratio μ(y; x) = τ(y)/τ(x). By standard principles, μ(y; x) is estimated by μ̂(y; x)_λ = τ̂(y)_λ/τ̂(x)_λ. Taylor linearization leads to the asymptotic variance formula V[μ̂(y; x)_λ] ≈ V[τ̂(y − μ(y; x)·x)_λ]/τ(x)². Hence, minimization of V[μ̂(y; x)_λ] is asymptotically equivalent to minimization of V[τ̂(y − μ(y; x)·x)_λ], i.e. minimization of the variance for the estimator of the total of y − μ(y; x)·x. Since, by


Theorem 3.1, Pareto πps effectuates minimization uniformly (irrespective of the study variable), it is asymptotically optimal also for estimation of ratios.

The optimality extends also to estimation of domain characteristics (totals and ratios). To illustrate, we consider estimation of a domain ratio μ(y; x; D) = τ(y; D)/τ(x; D), where τ(y; D) denotes the y-total over domain D, which can be written τ(y; D) = τ(y·1_D), where 1_D is the domain D indicator. The previous reasoning leads to μ̂(y; x; D)_λ = τ̂(y; D)_λ/τ̂(x; D)_λ with asymptotic variance V[μ̂(y; x; D)_λ] ≈ V[τ̂([y − μ(y; x; D)·x]·1_D)_λ]/τ(x·1_D)². Theorem 3.1 tells again that PARπps is asymptotically optimal.

4.2. A limit theorem

Theorem 4.1. Consider, for k = 1, 2, 3, …, PARπps(n_k; λ_k) sampling from the population U_k = (1, 2, …, N_k), on which the variable y_k = (y_{k1}, y_{k2}, …, y_{kN_k}) is defined, with total τ(y_k). Let τ̂(y_k)_λ be in accordance with (3.7). Then, under conditions (C1)-(C3) below, (4.5) holds for σ_k defined in accordance with (4.2):

[τ̂(y_k)_λ − τ(y_k)]/σ_k converges in distribution to N(0, 1) as k → ∞. (4.5)

(C1) n_k → ∞, as k → ∞.

(C2) lim sup_{k→∞} (1/n_k) Σ_{i=1}^{N_k} λ_{ki}² < 1.

(C3) max_{1≤i≤N_k} |y_{ki}/λ_{ki} − Σ_{j=1}^{N_k} y_{kj}(1−λ_{kj}) / Σ_{j=1}^{N_k} λ_{kj}(1−λ_{kj})| / σ_k → 0 as k → ∞.

Remark 4.3. Loose interpretations of (C2) and (C3) run as follows. In order that normal approximation of the distribution of τ̂(y_k)_λ shall be good, a larger n_k is required (i) the closer to 1 some target inclusion probabilities lie, and (ii) the more 'dispersed' the sampling situation is, where 'dispersed' relates to the spread of {y_{ki}/λ_{ki}; i = 1, 2, …, N_k}.

Proof of Theorem 4.1. If we show that conditions (C1)-(C3) imply fulfillment of conditions (B1)-(B5) in Theorem 3.1 in Rosén (1997), the result follows from that theorem. First note that (B1) and (B2) in fact are the same as (C1) and (C3). To verify (B3), note that (3.6) yields

f_{ki}(ξ_k) = λ_{ki}(1−λ_{ki}), i = 1, 2, …, N_k. (4.6)

Condition (B3) concerns the asymptotic behavior of the quantity

(1/n_k) Σ_{i=1}^{N_k} f_{ki}(ξ_k) = (1/n_k) Σ_{i=1}^{N_k} λ_{ki}(1−λ_{ki}) (by (4.6)) = 1 − (1/n_k) Σ_{i=1}^{N_k} λ_{ki}² (by (1.4)). (4.7)


From (4.7) it is readily seen that (C2) implies (B3). The verifications of (B4) and (B5), which are straightforward, are left to the reader.

4.3. Operational formulation of the Pareto πps sampling-estimation scheme

Below we list the steps in practical application of the PARπps sampling-estimation procedure. Justifications are given by the results presented so far.

The Pareto πps sampling-estimation procedure

1. A sampling frame U = (1, 2, …, N) with sizes s = (s₁, s₂, …, s_N), s_i > 0, is at hand, and a sample size n is specified.

2. Compute the target inclusion probabilities λ_i = n·s_i / Σ_{j=1}^N s_j, i = 1, 2, …, N. It is presumed that λ_i < 1, i = 1, 2, …, N. If not, introduce a take-all stratum or modify the size measures.

3. Realize independent standard uniform random variables U₁, U₂, …, U_N, thereby realizing the ranking variables

Q_i = U_i(1−λ_i) / (λ_i(1−U_i)), i = 1, 2, …, N.

The sample consists of the n units with labels (J₁, J₂, …, J_n) that are determined by Q_{J₁}, Q_{J₂}, …, Q_{J_n} being the n smallest values among the realized Q₁, Q₂, …, Q_N. Variable values y and λ for sampled units are denoted (Y_ν, Λ_ν), ν = 1, 2, …, n.

4. The population total τ(y) = y₁ + y₂ + ⋯ + y_N is estimated by

τ̂(y)_λ = Σ_{ν=1}^n Y_ν/Λ_ν. (4.8)

5. The variance of τ̂(y)_λ is estimated by

V̂[τ̂(y)_λ] = n/(n−1) · Σ_{ν=1}^n (Y_ν/Λ_ν − Σ_{ρ=1}^n Y_ρ(1−Λ_ρ)/Λ_ρ / Σ_{ρ=1}^n (1−Λ_ρ))² (1−Λ_ν). (4.9)

6. An approximately 95% confidence interval for τ(y) is

τ̂(y)_λ ± 1.96·√V̂[τ̂(y)_λ]. (4.10)

Remark 4.4. Below we state a version of (4.9) which has computational merits:

V̂[τ̂(y)_λ] = n/(n−1) · (A − B²/C), (4.11)

where

A = Σ_{ν=1}^n (Y_ν/Λ_ν)²(1−Λ_ν),  B = Σ_{ν=1}^n Y_ν(1−Λ_ν)/Λ_ν,  C = Σ_{ν=1}^n (1−Λ_ν). (4.12)
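Steps 1-6 can be sketched in a few lines of Python (a minimal illustration, not part of the paper; the frame sizes and the study variable below are hypothetical). The variance estimate uses the computational form (4.11)-(4.12):

```python
import math
import random

def pareto_pps_sample(sizes, n, rng):
    """Steps 1-3: select a Pareto pi-ps sample of size n from sizes s_i > 0."""
    total = sum(sizes)
    lam = [n * s / total for s in sizes]            # target inclusion probabilities
    assert all(l < 1.0 for l in lam), "introduce a take-all stratum first"
    U = [rng.random() for _ in sizes]
    Q = [u * (1 - l) / (l * (1 - u)) for u, l in zip(U, lam)]
    sample = sorted(range(len(sizes)), key=lambda i: Q[i])[:n]
    return sample, lam

def estimate_total(y_s, lam_s):
    """Steps 4-5: total estimate (4.8) and variance estimate (4.11)-(4.12)."""
    n = len(y_s)
    t_hat = sum(y / l for y, l in zip(y_s, lam_s))
    A = sum((y / l) ** 2 * (1 - l) for y, l in zip(y_s, lam_s))
    B = sum(y * (1 - l) / l for y, l in zip(y_s, lam_s))
    C = sum(1 - l for l in lam_s)
    v_hat = n / (n - 1) * (A - B * B / C)
    return t_hat, v_hat

rng = random.Random(123)
sizes = [float(i) for i in range(1, 101)]           # hypothetical size measures
y = [3.0 * s for s in sizes]                        # y exactly proportional to s
sample, lam = pareto_pps_sample(sizes, 50, rng)
t_hat, v_hat = estimate_total([y[i] for i in sample], [lam[i] for i in sample])
# Step 6: approximate 95% confidence interval (v_hat clipped against rounding)
half = 1.96 * math.sqrt(max(v_hat, 0.0))
lo, hi = t_hat - half, t_hat + half
```

Since y is exactly proportional to s here, every Y_ν/Λ_ν is the same constant, so t_hat reproduces the true total and the variance estimate is zero up to rounding; with scatter in y both become non-degenerate.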


Remark 4.5. As indicated in Remark 4.2, point and variance estimators for population characteristics other than totals can be obtained by appropriate modifications of Steps 4-6.

Remark 4.6. The general structure in the above formulation of the Pareto πps sampling-estimation procedure applies to all OSFSπps schemes. Differences between specific schemes concern only the definition of the ranking variables Q (in Step 3) and the a-values in (3.12), which in turn affect the variance estimator (3.11), which for Pareto πps takes the form (4.9).

Remark 4.7. From (4.9) it is seen that variance estimates for Pareto πps always are non-negative. More generally, it is seen from (3.11) that this holds for any OSFSπps procedure.

Remark 4.8. Ohlsson (1995) emphasizes the following attractive feature of Poisson πps. If the frame contains overcoverage (out-of-scope units), a 'quasi-Poisson πps' sample from the (imaginary) list of in-scope units can be selected by the following procedure. Units which turn out to be out-of-scope are excluded from the sample and replaced by units with the smallest ranking variable values among so far 'unused' units, until a size n sample of in-scope units is obtained. In this process one loses full control of the inclusion probabilities, though, since the size sum over the in-scope list is not known. However, if the aim is to estimate a ratio μ(y; x) = τ(y)/τ(x), this does not matter, since the unknown size sum in numerator and denominator will cancel. (If the aim is to estimate a total over the in-scope units, the size sum should be estimated.) This type of substitution for overcoverage can be employed for any OSFSπps scheme, in particular for Pareto πps.

Remark 4.9. Another attractive aspect of Poisson πps, also emphasized in Ohlsson (1995), is as follows. If the U_i's in the ranking variables U_i/λ_i are permanent random numbers, a new Poisson πps sample from an updated version of the frame is positively coordinated (= has big overlap) with the previous sample. Negative coordination (also for simultaneous samples) is obtained if U_i is exchanged for 1 − U_i. From (3.3) it is seen that any OSFSπps scheme has these coordination properties and, hence, so does Pareto πps.
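A sketch of this coordination mechanism for Pareto πps (not from the paper; sizes and the frame update are illustrative): the same permanent random numbers are reused for a sample from an updated frame, once directly (positive coordination) and once with U_i exchanged for 1 − U_i (negative coordination):

```python
import random

rng = random.Random(7)
N = 200
prn = {i: rng.random() for i in range(N)}            # permanent random numbers

def pareto_sample(units, sizes, n, transform=lambda u: u):
    total = sum(sizes[i] for i in units)
    lam = {i: n * sizes[i] / total for i in units}
    Q = {i: transform(prn[i]) * (1 - lam[i]) / (lam[i] * (1 - transform(prn[i])))
         for i in units}
    return set(sorted(units, key=lambda i: Q[i])[:n])

sizes = {i: 1.0 + (i % 10) for i in range(N)}        # hypothetical size measures
s1 = pareto_sample(list(range(N)), sizes, 40)

# updated frame: some units have left; the PRNs of the remaining units are kept
units2 = [i for i in range(N) if i % 17 != 0]
s2 = pareto_sample(units2, sizes, 40)                             # positive coordination
s3 = pareto_sample(units2, sizes, 40, transform=lambda u: 1 - u)  # negative coordination
print(len(s1 & s2), len(s1 & s3))   # large overlap for s2, small for s3
```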

Remark 4.10. The above Pareto sampling-estimation procedure is formulated for the ideal case when all sampled units respond. However, in virtually all practical surveys non-response occurs. How to cope with non-response depends, of course, on what non-response model is judged to be realistic in the particular situation. Below we suggest an adjustment technique for Pareto πps, under the non-response model that the population can be partitioned into disjoint groups G₁, …, G_g, …, which are (fairly) homogeneous with respect to λ-values as well as response


propensities. Then, use the following adjustment procedure (which will be better motivated elsewhere).

Within each group G_g, recalculate λ-values by λ′_i = λ_i·n′_g/n_g, where n_g and n′_g denote the numbers of sampled, respectively responding, units in group G_g. Then carry out Steps 4 and 5 within each group G_g, using λ = λ′ and n = n′_g, which leads to estimates of group totals with variance estimates. Finally, add group total estimates and variance estimates for group totals, to obtain an estimate of the population total with a variance estimate. Note that it suffices to re-calculate λ's for sampled units. (4.13)
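The adjustment (4.13) can be sketched as follows (a sketch under the stated response-homogeneity model; the group data below are hypothetical, and the function assumes at least two respondents per group):

```python
def adjusted_total_estimate(groups):
    """Each group g is a dict with parallel lists y, lam, resp (True = responded)
    for the sampled units in one response-homogeneity group G_g."""
    est, var = 0.0, 0.0
    for g in groups:
        n_g = len(g["y"])                                  # sampled in G_g
        resp = [(y, l) for y, l, r in zip(g["y"], g["lam"], g["resp"]) if r]
        m_g = len(resp)                                    # responding in G_g
        lam_adj = [l * m_g / n_g for _, l in resp]         # lam' = lam * n'_g / n_g
        ys = [y for y, _ in resp]
        t = sum(y / l for y, l in zip(ys, lam_adj))        # Step 4 within G_g
        A = sum((y / l) ** 2 * (1 - l) for y, l in zip(ys, lam_adj))
        B = sum(y * (1 - l) / l for y, l in zip(ys, lam_adj))
        C = sum(1 - l for l in lam_adj)
        v = m_g / (m_g - 1) * (A - B * B / C)              # Step 5 with n = n'_g
        est, var = est + t, var + v
    return est, var

groups = [
    {"y": [10.0, 20.0, 30.0], "lam": [0.5, 0.5, 0.5], "resp": [True, True, True]},
    {"y": [8.0, 16.0, 24.0],  "lam": [0.4, 0.4, 0.4], "resp": [True, True, False]},
]
est, var = adjusted_total_estimate(groups)
```

With full response in a group, λ′ = λ and the group's contribution reduces to the unadjusted Steps 4-5.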

5. Evaluation of πps schemes

5.1. Introduction

Even if Pareto πps has the attractive property of being asymptotically uniformly optimal among OSFSπps schemes, it cannot be recommended right away. Below we list some additional issues that should be addressed.

• Asymptotic optimality among OSFSπps schemes does not say anything about how well Pareto πps compares with πps schemes in general. There may be better schemes outside the OSFSπps class. For a global evaluation (i.e. over all πps schemes), one would like to compare Pareto πps with the best πps scheme outside OSFSπps. This is a somewhat intricate wish, though, because there is no πps scheme which is unanimously regarded as 'best so far'. However, as already stated, we read Särndal et al. (1992) so that they suggest that Sunter πps is a candidate for the title. The most widely used πps scheme in practice, systematic πps, is of course also of interest.

• The asymptotic optimality of Pareto πps was justified by using Approximation Result 3.1. As the name states, this result involves approximations, regarding unbiasedness of the point and variance estimators, the theoretical estimator variance formula, and normal distribution of estimators. Although the approximation errors are asymptotically negligible, adequate practical application of an OSFSπps scheme requires that the approximations are 'good enough' in the particular finite situation. Unfortunately, it is unfeasible to exhibit theoretical error bounds from which it can be read off whether the approximations are good enough in a particular situation. What can be done, though, to get an idea of the approximation errors is to carry out simulation studies.

• In addition to knowing that Pareto πps is (asymptotically) optimal among OSFSπps schemes, one wants a quantitative idea of its superiority over other OSFSπps schemes.

• The optimality property of Pareto πps relates only to (1.6)(ii). The scheme should also be judged relative to (i) and (iii).


5.2. Theoretical considerations for OSFSπps schemes

Being (asymptotically) uniformly optimal, Pareto πps plays a distinguished role. Therefore, we use the performance measure variance increase (VI) relative to Pareto πps. The parameters n, λ and y are presumed to be the same for compared schemes:

VI(scheme S) = (Estimator variance under πps scheme S) / (Estimator variance under Pareto πps) − 1. (5.1)

Computation of exact VI-values is unfeasible, and we will resort to approximate ones:

AVI(scheme S) = (Approximate estimator variance under πps scheme S) / (Approximate estimator variance under Pareto πps) − 1. (5.2)

In the first round we confine the comparisons to OSFSπps schemes, using (5.2) with approximate estimator variances by formula (3.9). Then the asymptotic optimality of Pareto πps entails that AVI-values are non-negative. By (3.18) and (4.2), AVI for OSFSπps(n; λ; H) is, with M(y; λ) and R(y; λ; H) according to (3.19) and (3.20),

AVI(y; λ; H) = R(y; λ; H) / M(y; λ). (5.3)

Even if (5.3) gives an explicit expression for AVI, it is difficult to look through it, to see how it depends on y, λ and H. To make things a bit more transparent, we specialize to AVI for Poisson πps. By combining (3.14) and (4.2) we get the following expression for Poisson πps AVI, after some straightforward algebra which is left to the reader:

AVI(y; λ; POI) = (Σ_{i=1}^N y_iλ_i − (1/n)·Σ_{i=1}^N y_i·Σ_{j=1}^N λ_j²)² / [ (n − Σ_{j=1}^N λ_j²)·Σ_{i=1}^N y_i²(1−λ_i)/λ_i − (Σ_{i=1}^N y_i(1−λ_i))² ]. (5.4)

Formula (5.4) is still quite difficult to look through, though, and we specialize it further by considering a particular type of sampling situation.

For a ≥ 0, let the study variable values be y_i = i^a, i = 1, 2, …, N, and let the target inclusion probabilities be proportional to i, i = 1, 2, …, N. Then, after straightforward (but cumbersome) calculations one arrives at the following asymptotic (as N → ∞) expression:

AVI(POI) = (16a/(9(a+2)²))·(n/N)² / [ 1 − (n/N)·4(5a+2)(a+1)/(3(2a+1)(a+2)) + (n/N)²·16a(a+1)²/(3(2a+1)(a+2)²) ]. (5.5)


Two special cases of (5.5) are listed below:

For n/N = 0.5:  AVI(POI) = 4a(2a+1) / (3(a² + 8a + 4)). (5.6)

For a = 2:  AVI(POI) = (2/9)·(n/N)² / (1 − 2.4·(n/N) + 1.2·(n/N)²). (5.7)

Remark 5.1. We do not know the maximal AVI for Poisson πps (or for any other OSFSπps scheme). By letting a → ∞ in (5.6), it is seen that maximal AVI(POI) is at least 8/3 ≈ 267%.
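The closed forms (5.4)-(5.7) can be checked against each other numerically. The sketch below (not from the paper) evaluates the finite-N expression (5.4) for y_i = i^a and λ_i proportional to i, and compares it with the asymptotic formula (5.5) and the special case (5.6):

```python
def avi_poisson_exact(N, n, a):
    # finite-N formula (5.4) with y_i = i**a and lam_i = n*i / (1 + 2 + ... + N)
    tot = N * (N + 1) / 2.0
    y = [i ** a for i in range(1, N + 1)]
    lam = [n * i / tot for i in range(1, N + 1)]
    s_yl = sum(yi * li for yi, li in zip(y, lam))
    s_y = sum(y)
    s_l2 = sum(li * li for li in lam)
    num = (s_yl - s_y * s_l2 / n) ** 2
    den = (n - s_l2) * sum(yi * yi * (1 - li) / li for yi, li in zip(y, lam)) \
        - sum(yi * (1 - li) for yi, li in zip(y, lam)) ** 2
    return num / den

def avi_poisson_asym(f, a):
    # asymptotic formula (5.5), f = n/N
    num = 16 * a / (9 * (a + 2) ** 2) * f ** 2
    den = (1 - f * 4 * (5 * a + 2) * (a + 1) / (3 * (2 * a + 1) * (a + 2))
           + f ** 2 * 16 * a * (a + 1) ** 2 / (3 * (2 * a + 1) * (a + 2) ** 2))
    return num / den

for a in (0.5, 2.0, 3.0):
    closed = 4 * a * (2 * a + 1) / (3 * (a ** 2 + 8 * a + 4))   # (5.6), n/N = 0.5
    assert abs(avi_poisson_asym(0.5, a) - closed) < 1e-12
    assert abs(avi_poisson_exact(20000, 10000, a) - closed) < 0.05 * closed
```

At a = 2 and n/N = 0.5, both routes give AVI(POI) = 5/9 ≈ 55%, the value discussed in connection with (5.7).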

Even if (5.6) and (5.7) relate to situations which are a bit extreme, we will use them to draw tentative conclusions on how AVI depends on the sampling situation.

The ideal πps sampling situation is when y- and s-values are exactly proportional. Then the population y-total is estimated without error. In this ideal situation, the values of (y, s) lie along a line through the origin. In practice this situation never occurs, though; the (y, s)-values scatter more or less around a trend. The trend may be proportional (= linear through the origin) or non-proportional. Type of trend as well as degree of scatter may vary considerably between practical sampling situations. If 'powerful' auxiliary information is available, one may be in a situation with proportional trend and little scatter. In cases where only 'weak' auxiliary information is at hand, non-proportionality as well as degree of scatter may be high.

A fact which reinforces variation in scatter and trend is that most surveys are multi-purpose (i.e. involve many study variables, and commonly also many domains). However, only one size measure can be used when selecting the sample. Therefore, in practical πps contexts one meets situations with good πps properties (as regards trend and scatter) for some of the study variables, but considerably less good for others. The usual way to cope with this dilemma is to choose the size measure with regard to the variable that is judged to be most important.

Against this background we look at formula (5.6). For a = 1 we are in a perfect πps situation with proportional trend and no scatter. The corresponding AVI-value in (5.6) is a bit difficult to interpret, though, because it is a degenerate value of type 0/0. For a < 1 we are in situations with increasing concave trend (y grows relatively slower than s) and for a > 1 in situations with increasing convex trend (y grows relatively faster than s). In situations with a flat or decreasing trend, πps sampling is in fact non-favorable compared with simple random sampling. As is readily checked, AVI in (5.6) increases with a. A tentative conjecture is therefore that Pareto πps is more superior, the more convex the trend is. However, as will be seen later on, this picture is disturbed when scatter comes in, and we formulate the following looser version of the conjecture:

The degree of superiority for Pareto πps depends on the form of the (y, s)-trend. (5.8)


Next we turn to (5.7), which shows that AVI depends on the sampling fraction. It is readily checked that AVI in (5.7) increases from 0 to 55% when the sampling fraction increases from 0 to ½, which provides background for the following conjecture:

Pareto πps is more superior, the higher the sampling fraction is. (5.9)

A requirement on a good πps scheme is that it works efficiently also in situations with high sampling rates, for the following reason. Even if high overall sampling rates are unusual, high 'partial' sampling rates occur, in particular when πps sampling is used in conjunction with the following type of stratification. The population is stratified by study domains and πps samples are selected from the strata. Some strata may be comparatively small, which in combination with the fact that the precision of a stratum estimate depends mainly on stratum sample size (not on stratum sampling rate) may lead to high sampling rates in some strata.

5.3. Numerical comparisons

5.3.1. Generation of sampling situations

Here we employ a more concrete way to indicate how AVI depends on the sampling situation, by presenting numerical AVI-values for specific situations. As always in this type of context, one meets the problem that there is an 'ocean' of situations of potential interest, while the space for the paper is limited. We have to restrict, and then we let (5.8) and (5.9) be guiding.

We shall consider situations generated by the following sampling situation model for the relation between the study variable y and the size measure s:

s_i = i,  y_i = c(s_i^a + σ·Z_i·√s_i),  c > 0, σ ≥ 0, the Z_i being iid N(0, 1), i = 1, 2, …, N. (5.10)

The value of the parameter c does not affect the comparisons; we include it just to indicate the scope of the model. The vital parameters are N, σ and a, and they will be varied. Variation of σ and a can be interpreted as follows. Increase of σ leads to increased scatter. When a moves away from 1, the trend moves away from proportionality; it is concave for a < 1 and convex for a > 1. To make the generated situations reproducible, we inform that the Z's were generated by the SAS 6.10 function NORMAL(seed) with first seed (for i = 1) = 555.
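A reproducible sketch of the generator (Python's Mersenne Twister replaces the SAS 6.10 NORMAL function, so the realized Z's, and hence the y-values, differ from those behind the paper's tables):

```python
import math
import random

def generate_situation(N, a, sigma, c=1.0, seed=555):
    """Model (5.10): s_i = i, y_i = c*(s_i**a + sigma*Z_i*sqrt(s_i)), Z_i iid N(0,1)."""
    rng = random.Random(seed)
    s = list(range(1, N + 1))
    y = [c * (si ** a + sigma * rng.gauss(0.0, 1.0) * math.sqrt(si)) for si in s]
    return s, y

# e.g. the situation of Table 1: N = 100, sigma = 2, a = 1
s, y = generate_situation(N=100, a=1.0, sigma=2.0)
```

The same seed always reproduces the same situation. Scaling N by 4 and sigma by 2, as in Table 2, keeps the relative scatter sigma·√s_i / s_i at the upper frame end roughly unchanged when a = 1.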

One may think that model (5.10) with a ≠ 1 is uninteresting, since the situation can then be brought closer to the ideal πps situation by letting s^a play the role of size measure. However, we mean that a ≠ 1 is of interest, for at least two reasons: (i) Often, the sampler has only vague knowledge of the (y, s)-trend. He/she may judge it wrongly, and believe in proportionality while the 'truth' is non-proportional (a ≠ 1). (ii) The fit between y and s was a secondary concern in the sample selection; s was chosen with regard to a more important study variable than y.


Numerical AVI-values have been derived by two approaches: (a) using the asymptotic formulas for OSFSπps estimator variances, and (b) Monte Carlo simulations. Approach (a) leads to easy computations, but does not tell how well the asymptotic formulas approximate the true estimator variances. Approach (b) requires considerably greater computational effort, but it gives at least some insight into the approximation goodness problem. Comparisons with Sunter πps are made by both approaches. An algorithm for computation of second-order Sunter inclusion probabilities is available. Having those, standard formulas yield estimator variances and variance estimates. Since Sunter πps lies outside OSFSπps, its AVI-values may be negative, implying that it performs better than Pareto πps. The performance of Sunter confidence intervals is studied by simulations. For systematic πps we resort to simulations.

Before presenting numerical results, we formulate the tentative conclusions we shall draw from them. We do not lay claim to giving a complete picture of how AVI-values vary with the sampling situation. (To the best of our understanding, this is a very complex problem.)

Conclusions concerning the measurable πps schemes: OSFSπps, Sunter πps and systematic πps with random frame order (rfo). When πps is better than SRS, i.e. when the (y, s)-trend is fairly strongly increasing:

The OSFSπps schemes and systematic πps(rfo) all perform better than Sunter πps, and the more so the higher the sampling rate is. The advantage over Sunter πps is greater the more beneficial πps sampling is relative to simple random sampling. (5.11)

Pareto, exponential and Poisson πps compare as follows.

Under proportional (y, s)-trend: All three schemes have very similar performances. There is an edge for Pareto πps, but it is of little to no practical importance. (5.12)

Under non-proportional (y, s)-trend: Pareto πps performs better than exponential πps, which in turn performs better than Poisson πps. The differences are small when sampling rates are small, but may be pronounced for high sampling rates. (5.13)

Pareto πps compares with systematic πps(rfo) in very much the same way as with the other OSFSπps schemes. When the (y, s)-trend is fairly proportional, the two schemes have similar performances, with a slight edge for Pareto πps. When the (y, s)-trend is non-proportional, Pareto πps performs better than systematic πps(rfo). Differences are small, though, for small sampling rates, but may be pronounced for high sampling rates. (5.14)

When πps is worse than SRS, i.e. when the (y, s)-trend is flat or decreasing: The bad performance of πps sampling is more accentuated for the OSFSπps schemes and systematic πps(rfo) than for Sunter πps, which performs better than the others. There is an edge for Pareto πps among the 'non-Sunter' schemes. (5.15)


Conclusions concerning systematic πps with frame ordered by size (sfo): Systematic πps(sfo) has considerably better point estimation precision than the other πps schemes when the (y, s)-trend is distinctly non-proportional. However, under fairly proportional (y, s)-trend, systematic πps(sfo) and Pareto πps compare in an erratic way; they take turns being better than the other. (5.16)

5.3.2. Comparisons of OSFSπps and Sunter πps, based on asymptotic formulas

In the following we present numerical AVI-values for exponential, Poisson and Sunter πps and also for simple random sampling, SRS. SRS is not included in the study as a 'competitor', though, only to show the relative efficiency of πps sampling versus SRS. For the OSFSπps schemes the asymptotic formulas were used to compute (approximate) estimator variances, while the exact standard formulas were used for Sunter πps and SRS. We now turn to the numerical results which provide background for the claims in (5.11)-(5.16).

We start by illustrating (5.11) and (5.12) by considering situations with a = 1 in (5.10), i.e. situations with proportional trend.

The AVI-values in Table 1 comply with (5.11) and (5.12). One may wonder, though, whether the fairly small population size (N = 100) affects the AVI-values in some particular direction. A 'magnified' situation with roughly the same degree of scatter (at least for large y-values) is obtained by increasing N by a factor 4 and σ by a factor 2. The corresponding AVI-values are shown in Table 2, which again comply with (5.11) and (5.12). They also indicate that population size is not a crucial factor. Therefore, in the rest of this section we confine ourselves to N = 100.

Next we vary the degree of scatter. In Table 3 the scatter is increased (compared with Table 1) to σ = 4, and in Table 4 it is decreased to σ = 0.5. The figures in these tables again illustrate (5.11) and (5.12), in particular the latter part of (5.11).

Table 1
AVI for situation (5.10) with N = 100, σ = 2, a = 1

Sampling fraction f    f = 0.1    f = 0.2    f = 0.3    f = 0.4    f = 0.5
Poisson πps (%)        4×10⁻⁴     2×10⁻³     6×10⁻³     0.02       0.04
Exponential πps (%)    1×10⁻⁴     6×10⁻⁴     2×10⁻³     7×10⁻³     0.03
Sunter πps (%)         2.6        58         181        280        401
SRS (%)                450        446        440        432        421

Table 2
AVI for situation (5.10) with N = 400, σ = 4, a = 1

Sampling fraction f    f = 0.1    f = 0.2    f = 0.3    f = 0.4    f = 0.5
Poisson πps (%)        4×10⁻⁴     2×10⁻³     7×10⁻³     0.02       0.05
Exponential πps (%)    1×10⁻⁴     5×10⁻⁴     1×10⁻³     3×10⁻³     5×10⁻³
Sunter πps (%)         13         79         192        344        469
SRS (%)                423        430        438        450        468


Table 3
AVI for situation (5.10) with N = 100, σ = 4, a = 1

Sampling fraction f    f = 0.1    f = 0.2    f = 0.3    f = 0.4    f = 0.5
Poisson πps (%)        4×10⁻⁴     2×10⁻³     6×10⁻³     0.02       0.04
Exponential πps (%)    1×10⁻⁴     6×10⁻⁴     2×10⁻³     7×10⁻³     0.03
Sunter πps (%)         9.8        8.1        47         60         87
SRS (%)                105        103        101        98         94

Table 4
AVI for situation (5.10) with N = 100, σ = 0.5, a = 1

Sampling fraction f    f = 0.1    f = 0.2    f = 0.3    f = 0.4    f = 0.5
Poisson πps (%)        4×10⁻⁴     2×10⁻³     6×10⁻³     0.02       0.04
Exponential πps (%)    1×10⁻⁴     6×10⁻⁴     2×10⁻³     7×10⁻³     0.03
Sunter πps (%)         309        1187       2933       4950       6820
SRS (%)                7412       7347       7266       7161       7019

Next we consider variation of the (y, s)-trend. First we look at growingly convex situations by considering a = 1.2, 1.5 and 2. It should be noted that a = 2 yields a rather extreme situation, though. In the first round we use σ = 2. The AVI-values are shown in Tables 5-7. They illustrate the claim in (5.13). Also (5.11) is illustrated. Even if a negative Sunter AVI-value turns up at one place (for a = 2), we still mean that (5.11) gives the overall picture.

In Tables 5-7, σ is kept fixed when a is increased, which leads to relatively decreased (y, s)-scatter for units with large variable values. One may wonder if the effects that can be observed in Tables 5-7 are to be ascribed to the change of trend or to the lowered relative scatter. We believe that the trend factor is most important. To illustrate this, Table 8 presents AVI-values for the situation in Table 6 with doubled σ, i.e. with σ = 4.

Next we consider concave situations, i.e. situations with a < 1. In order that the relative scatter should not increase, and also to avoid negative y-values, we decrease σ when a is decreased. The findings in Table 9 illustrate (5.13) as well as (5.11).

In Table 9, a = 0.7. If a is further decreased, we enter the region where πps sampling no longer is advantageous over SRS. For a = 0.5 the (y, s)-trend is quite flat, and for a = 0 it is entirely flat. Corresponding AVI-values, which illustrate (5.15), are shown in Tables 10 and 11.

5.3.3. Findings from simulations

This section reports on results from simulations with two aims: (i) to give some insight into the goodness of the approximations that are set to work in practical application of OSFSπps schemes, and (ii) comparisons with systematic πps.

We employed a standard simulation approach. A sampling situation was generated by the model (5.10), yielding a known τ(y). Repeated independent πps samples with


Table 5
AVI for situation (5.10) with N = 100, σ = 2, a = 1.2

Sampling fraction f    f = 0.1    f = 0.2    f = 0.3    f = 0.4    f = 0.5
Poisson πps (%)        0.06       0.3        1.0        2.8        7.8
Exponential πps (%)    0.01       0.1        0.2        0.5        0.9
Sunter πps (%)         33         128        418        790        1321
SRS (%)                1213       1226       1249       1290       1387

Table 6. AVI for situation (5.10) with N = 100, σ = 2, a = 1.5

Sampling fraction f      f = 0.1   f = 0.2   f = 0.3   f = 0.4   f = 0.5
Poisson πps (%)             0.2       1.3       4.2      12        39
Exponential πps (%)         0.1       0.3       0.9       2.3       4.7
Sunter πps (%)              5.8      62       262       653      1457
SRS (%)                  1083      1095      1130      1222      1538

Table 7. AVI for situation (5.10) with N = 100, σ = 2, a = 2

Sampling fraction f      f = 0.1   f = 0.2   f = 0.3   f = 0.4   f = 0.5
Poisson πps (%)             0.3       1.5       5.1      15        54
Exponential πps (%)         0.1       0.4       1.1       2.8       6.2
Sunter πps (%)             -4.1       3.3      66       269       877
SRS (%)                   536       550       582       660       941

Table 8. AVI for situation (5.10) with N = 100, σ = 4, a = 1.5

Sampling fraction f      f = 0.1   f = 0.2   f = 0.3   f = 0.4   f = 0.5
Poisson πps (%)             0.2       0.9       3.0       8.5      27
Exponential πps (%)         0.04      0.2       0.6       1.6       3.1
Sunter πps (%)              6.7      44       198       466      1018
SRS (%)                   775       821       821       889      1084

prescribed sample sizes were drawn by Pareto, exponential, Poisson, Sunter and systematic πps. 3000 repetitions were used throughout. For each sample we computed the total estimate τ̂, by (3.7) for the OSFS schemes and by the HT-estimator for the other schemes, and the variance estimate V̂(τ̂), by (3.11) for the OSFS schemes and by (1.3) for Sunter πps. Moreover, approximate 95% confidence intervals τ̂ ± 1.96·√V̂(τ̂) for τ(y) were computed for OSFS and Sunter πps, and it was checked whether they covered τ(y) or not. The following summary statistics were derived: the empirical mean m(τ̂) and


Table 9. AVI for situation (5.10) with N = 100, σ = 0.5, a = 0.7

Sampling fraction f      f = 0.1   f = 0.2   f = 0.3   f = 0.4   f = 0.5
Poisson πps (%)             0.1       0.7       1.9       4.7      12
Exponential πps (%)         0.03      0.2       0.5       1.7       2.0
Sunter πps (%)             25        43       103       141       180
SRS (%)                   246       229       213       197       186

Table 10. AVI for situation (5.10) with N = 100, σ = 0.5, a = 0.5

Sampling fraction f      f = 0.1   f = 0.2   f = 0.3   f = 0.4   f = 0.5
Poisson πps (%)             0.1       0.6       1.8       4.4      11
Exponential πps (%)         0.03      0.2       0.4       1.0       1.8
Sunter πps (%)             -5.3     -30       -35       -46       -48
SRS (%)                   -31       -35       -40       -44       -47

Table 11. AVI for situation (5.10) with N = 100, σ = 0.2, a = 0

Sampling fraction f      f = 0.1   f = 0.2   f = 0.3   f = 0.4   f = 0.5
Poisson πps (%)             0.1       0.4       1.0       2.4       5.6
Exponential πps (%)         0.02      0.1       0.3       0.6       1.0
Sunter πps (%)            -50       -76       -89       -96       -99
SRS (%)                   -98       -98       -98       -98       -98

empirical variance, S²(τ̂), for the estimates τ̂, the empirical mean of the variance estimates, m(V̂(τ̂)), and empirical cover rates for confidence intervals.
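These summary statistics are straightforward to compute. As an illustration (ours, not part of the paper's computations; the function name summarize and the synthetic inputs below are hypothetical), the following Python sketch derives the empirical mean m(τ̂), the empirical variance S²(τ̂), the mean of the variance estimates m(V̂(τ̂)), and the empirical cover rate of nominal 95% normal confidence intervals from arrays of simulated estimates:

```python
import numpy as np

def summarize(estimates, var_estimates, true_total):
    """Summary statistics of a simulation run: empirical mean and
    variance of the total estimates, mean of the variance estimates,
    and the cover rate of the nominal 95% normal confidence
    intervals  estimate +/- 1.96 * sqrt(variance estimate)."""
    t = np.asarray(estimates, dtype=float)
    v = np.asarray(var_estimates, dtype=float)
    half = 1.96 * np.sqrt(v)                       # CI half-widths
    covered = (t - half <= true_total) & (true_total <= t + half)
    return {
        "m": t.mean(),            # empirical mean of the estimates
        "S2": t.var(ddof=1),      # empirical variance of the estimates
        "mV": v.mean(),           # empirical mean of variance estimates
        "coverage": covered.mean(),  # empirical cover rate
    }
```

From these quantities, relative biases of the kind reported in Tables 12-16 are obtained as m/τ(y) - 1 for the point estimator and mV/S2 - 1 for the variance estimator.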

5.3.3.1. On approximation goodness. The crucial questions are: When is the bias in the estimator (3.7) reasonably small? When is the bias in the variance estimator (3.11) reasonably small? When is a normal approximation confidence interval good enough? In addition, we have a question of more theoretical interest: How well do asymptotic variances by (3.9) approximate the true ones?

Results for the OSFSπps and Sunter schemes are shown in Tables 12-14, which treat situations with quite different trend forms. Since relative discrepancies are simpler to grasp than absolute ones, we present the results in terms of relative bias for the point estimator, m(τ̂)/τ(y) - 1, and relative bias for the variance estimator, m[V̂(τ̂)]/S²(τ̂) - 1. Sunter's scheme is not included in the bias study, since its point and variance estimators are unbiased.

From Tables 12-14, and also from Tables 15 and 16, it is seen that all three OSFSπps procedures have exceedingly small bias for the point estimator. Hence, that part of the approximation issue seems to cause no problem.


Table 12. Performance of πps procedures in situation N = 100, σ = 2, a = 1

Sampling   Relative bias for         Relative bias for            Empirical coverage rates for
rate       point estimator (%)       variance estimator (%)       confidence intervals (%)
           PAR    EXP    POI         PAR    EXP    POI            PAR    EXP    POI    SUNT
0.1        -0.1   -0.1   -0.1        -4.5   -4.1   -4.2           92.1   92.1   92.2   92.5
0.2         0      0      0          -3.7   -2.7   -2.4           93.2   93.4   93.6   92.6
0.3         0      0      0          -2.9   -2.9   -3.6           93.7   93.8   93.8   93.2
0.4         0      0      0           0.3    0.2    0.2           93.9   94.0   93.9   94.0
0.5         0      0      0           1.7    1.8    1.7           94.4   94.3   94.4   95.0

Table 13. Performance of πps procedures in situation N = 100, σ = 2, a = 1.5

Sampling   Relative bias for         Relative bias for            Empirical coverage rates for
rate       point estimator (%)       variance estimator (%)       confidence intervals (%)
           PAR    EXP    POI         PAR    EXP    POI            PAR    EXP    POI    SUNT
0.1         0.1    0.2    0.2        -1.4   -1.1   -1.3           90.5   90.4   90.3   89.5
0.2         0.1    0.2    0.2         2.0    2.0    0.9           93.7   93.7   93.3   91.0
0.3         0.1    0.2    0.2         0.7    1.0    0.2           93.1   92.7   92.4   93.7
0.4         0      0.1    0.1        -1.7   -1.7   -0.6           93.0   92.8   93.1   94.0
0.5         0      0      0          -4.4   -3.5    1.8           93.0   92.9   93.3   95.0

Table 14. Performance of πps procedures in situation N = 100, σ = 0.5, a = 0.7

Sampling   Relative bias for         Relative bias for            Empirical coverage rates for
rate       point estimator (%)       variance estimator (%)       confidence intervals (%)
           PAR    EXP    POI         PAR    EXP    POI            PAR    EXP    POI    SUNT
0.1         0      0      0          -7.9   -7.7   -7.2           85.6   85.3   85.4   85.5
0.2        -0.1   -0.1   -0.1        -2.7   -3.4   -3.7           88.3   88.1   88.1   87.3
0.3         0.1    0     -0.1        -5.8   -4.8   -5.4           88.4   88.2   87.7   93.5
0.4         0      0     -0.1        -3.0   -3.5   -3.0           88.5   88.4   88.1   94.2
0.5         0      0      0          -2.5   -4.3   -2.5           87.7   87.8   87.8   94.8

The picture is not equally perfect as regards variance estimators and confidence levels. For the latter, the approximation goodness depends jointly on the goodness of the normal distribution approximation and the approximations behind the variance estimators. Since sample sizes are fairly small, the sampling situations in Tables 13 and 14 are quite dispersed (in the sense of Remark 4.3), and a number of target inclusion probabilities lie fairly close to 1 when f > 0.3, the magnitudes of the approximation errors in Tables 13 and 14 are not too surprising. Larger sample sizes are required for really good approximation.


Table 15. Performance of πps procedures in situation N = 200, σ = 2, a = 1.5

Sampling   Relative bias for         Relative bias for            Empirical coverage rates for
rate       point estimator (%)       variance estimator (%)       confidence intervals (%)
           PAR    EXP    POI         PAR    EXP    POI            PAR    EXP    POI    SUNT
0.1         0.1    0.1    0.1         1.3    1.2    1.7           92.5   92.6   92.5   92.7
0.2         0.1    0.1    0.1         1.8    1.8    1.3           93.6   93.6   93.4   92.4
0.3         0      0      0           2.3    3.0    3.5           93.7   93.7   93.6   93.2
0.4         0      0      0          -2.4   -2.0    0.7           94.7   94.6   95.0   94.6
0.5         0      0      0          -0.5   -0.3    3.0           93.5   93.2   94.1   95.6

Table 16. Performance of πps procedures in situation N = 300, σ = 0.5, a = 0.7

Sampling   Relative bias for         Relative bias for            Empirical coverage rates for
rate       point estimator (%)       variance estimator (%)       confidence intervals (%)
           PAR    EXP    POI         PAR    EXP    POI            PAR    EXP    POI    SUNT
0.1         0.1    0      0          -1.6   -1.8   -1.8           89.7   89.6   89.6   88.9
0.2         0      0      0           4.7    3.9    3.4           91.2   92.0   91.7   92.7
0.3         0      0      0          -1.9   -2.8   -3.1           91.5   91.4   91.1   94.2
0.4         0      0      0          -0.9   -0.4   -0.5           92.3   92.3   92.2   94.4
0.5         0      0      0          -1.9   -0.1    2.6           91.6   92.0   92.2   95.2

Tables 15 and 16 treat 'enlarged' versions (N is increased) of the situations in Tables 13 and 14. They show that the approximations improve when sample sizes are increased. A lesson from Tables 12-16 is, as in most normal approximation contexts: There is no simple rule of type 'n > 20' which guarantees good normal approximation; a proper rule must be more complicated. The results indicate that dispersion (in the sense of Remark 4.3) is a crucial factor. When it is small (as in Table 12), confidence intervals have decent levels already for sample sizes 10-20, while larger sample sizes are required in more dispersed situations.

Another conclusion from Tables 12-16 runs as follows. Even if Sunter's procedure, as regards approximations, employs only the normal distribution approximation, while the OSFS procedures are based on considerably more sophisticated approximation arguments, OSFS confidence intervals seem to perform as well as Sunter confidence intervals. This holds at least as long as Sunter πps is 'genuine' πps. In the considered situations, Sunter πps is very close to SRS for f = 0.4 and f = 0.5 (which also is reflected by the AVI-values in Tables 1, 6 and 9).

It is also of interest to see how well the asymptotic variances in (3.9) comply with the empirical ones. Tables 17-19 present relative biases in the asymptotic variances in (3.9), defined as

[variance by asymptotic formula (3.9)]/[empirical variance] - 1.


Table 17. Relative bias in asymptotic variance (3.9) for N = 100, σ = 2, a = 1

Sampling fraction f      f = 0.1   f = 0.2   f = 0.3   f = 0.4   f = 0.5
Pareto πps (%)            -5.2      -2.8      -2.3       1.5       3.5
Exponential πps (%)       -4.5      -1.6      -2.1       1.7       3.8
Poisson πps (%)           -4.5      -1.2      -2.5       1.4       3.0

Table 18. Relative bias in asymptotic variance (3.9) for N = 200, σ = 2, a = 1.5

Sampling fraction f      f = 0.1   f = 0.2   f = 0.3   f = 0.4   f = 0.5
Pareto πps (%)             0.6       3.5       2.5      -2.0       1.2
Exponential πps (%)        0.7       3.7       3.4      -1.6       1.3
Poisson πps (%)            0.8       3.7       3.9      -0.2       4.2

Table 19. Relative bias in asymptotic variance (3.9) for N = 300, σ = 0.5, a = 0.7

Sampling fraction f      f = 0.1   f = 0.2   f = 0.3   f = 0.4   f = 0.5
Pareto πps (%)            -0.2       4.7      -3.5      -0.6       0.1
Exponential πps (%)       -0.1       4.1      -4.8      -0.1       2.1
Poisson πps (%)           -0.3       4.2      -4.6      -0.6       4.4

Our conclusion from Tables 17-19 is that the asymptotic variance formula (3.9) works with surprisingly good approximation also for fairly small n- and N-values. Hence, the AVI-values in Tables 1-11 in fact lie quite close to the corresponding VI-values in (5.1).

5.3.3.2. Comparisons with systematic πps. Comparisons are made between Pareto, Sunter and systematic πps, with random frame order (rfo) as well as size ordered frame (sfo), by empirical AVI-values = (S²(τ̂) for 'alternative' scheme)/(asymptotic variance for Pareto πps) - 1. Results are presented in Tables 20-26, which are ordered by decreasing a-values. For systematic πps(rfo) the picture is fairly clear, namely as formulated in (5.14) and (5.15).
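This empirical AVI definition is a simple ratio; the following one-line helper (the function name is ours) makes the percent convention used in the tables explicit:

```python
def empirical_avi(s2_alternative, asym_var_pareto):
    """Empirical AVI in percent: the empirical variance S2 of the
    'alternative' scheme's total estimates, divided by the asymptotic
    variance of the Pareto pips estimator, minus 1."""
    return 100.0 * (s2_alternative / asym_var_pareto - 1.0)
```

For instance, an alternative scheme whose empirical variance is twice the asymptotic Pareto variance gets an empirical AVI of 100%.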

For systematic πps(sfo) the findings, which are summarized in (5.16), are a bit surprising. When the (y, s)-trend deviates pronouncedly from proportional (a = 2, 1.5, 0.7, 0.5 and 0), systematic πps(sfo) has dramatically better estimation precision than the measurable πps schemes. However, when the (y, s)-trend is fairly proportional (a = 1.2 and 1), which is the kind of trend one is aiming for in practice, the picture becomes fuzzy; big positive and negative AVI-values mix in an irregular way.


Table 20. Empirical AVI for (5.10) with N = 100, σ = 2.0, a = 2.0

Sampling fraction f                        f = 0.1   f = 0.2   f = 0.3   f = 0.4   f = 0.5
Systematic πps, random frame order (%)       1.9       0.6       1.3      26        35
Systematic πps, frame order by size (%)    -86       -89       -93       -94       -91
Sunter πps (%)                              -4.1       3.3      66       269       877

Table 21. Empirical AVI for (5.10) with N = 100, σ = 2, a = 1.5

Sampling fraction f                        f = 0.1   f = 0.2   f = 0.3   f = 0.4   f = 0.5
Systematic πps, random frame order (%)       2.2       1.0       0.7      21        29
Systematic πps, frame order by size (%)    -74       -75       -73       -86       -73
Sunter πps (%)                               5.8      62       262       653      1457

Tab le 22 Empi r i ca l AVI for (5.10) wi th N = 100, a = 2.0, a = 1.2

Sampl ing fract ion f f = 0.1 [= 0.2 f = 0.3 f = 0.4 f = 0.5

Sys temat ic ~ps, r a n d o m frame o rde r (%) 2.9 - 0.7 3.5 3.0 3.3

Sys temat ic r~ps, f rame order by size (%) - 18 - 27 + 46 - 41 ÷ 1.2 Sunter gps (%) 33 128 418 790 1321

Table 23. Empirical AVI for (5.10) with N = 100, σ = 2, a = 1

Sampling fraction f                        f = 0.1   f = 0.2   f = 0.3   f = 0.4   f = 0.5
Systematic πps, random frame order (%)       2.4      -3.5       0.6      -2.2      -3.3
Systematic πps, frame order by size (%)     +3.6     -16       +88       -21       +12
Sunter πps (%)                               2.6      58       181       280       401

Table 24. Empirical AVI for (5.10) with N = 100, σ = 0.5, a = 0.7

Sampling fraction f                        f = 0.1   f = 0.2   f = 0.3   f = 0.4   f = 0.5
Systematic πps, random frame order (%)       1.9      -2.2       0.3       8.7      13
Systematic πps, frame order by size (%)    -34       -52       -9.2      -52       -47
Sunter πps (%)                              25        43       103       141       180


Table 25. Empirical AVI for (5.10) with N = 100, σ = 0.5, a = 0.5

Sampling fraction f                        f = 0.1   f = 0.2   f = 0.3   f = 0.4   f = 0.5
Systematic πps, random frame order (%)       3.4      -0.4      -0.6       7.3      13
Systematic πps, frame order by size (%)    -34       -49       -17       -50       -50
Sunter πps (%)                              -5.3     -30       -35       -46       -48

Table 26. Empirical AVI for (5.10) with N = 100, σ = 0.2, a = 0

Sampling fraction f                        f = 0.1   f = 0.2   f = 0.3   f = 0.4   f = 0.5
Systematic πps, random frame order (%)      13        11       -6.8      -0.4      14
Systematic πps, frame order by size (%)    -26       -33       -26       -40       -43
Sunter πps (%)                             -50       -76       -89       -96       -99

It is difficult to draw clear-cut practical conclusions; perhaps they should run as follows. If one believes that the (y, s)-trend lies well away from proportional, systematic πps(sfo) is to be preferred to OSFSπps and systematic πps(rfo). However, even more preferable is probably to do as follows, if possible. If a is distinctly greater than 1, transform the size measure to achieve a more proportional (y, s)-trend. If a is considerably less than 1, avoid any kind of πps scheme since SRS is better, or transform the size measure to achieve a more proportional (y, s)-trend. When the (y, s)-trend is fairly proportional, systematic πps(sfo) may lead to substantially better estimation precision than Pareto πps, but the opposite may also occur. It seems difficult to tell which alternative is at hand in a specific situation.

5.4. General conclusions

When considering computation aspects in the sequel, we pay regard to programming but not to computer time, the cost of which is negligible nowadays (at least for sample surveys). The weight to be assigned to programming work depends, of course, on the number of times the survey will be conducted; the fewer, the more important it is to keep programming down.

5.4.1. Comparison of OSFSπps and Sunter πps

We compare the OSFSπps schemes and Sunter πps, and we follow points (i)-(iii) in (1.6).

Simplicity of sample selection: As stated in Remark 4.6, the description of Pareto πps in Section 4.3 holds in its general structure for all OSFSπps schemes. As is seen from Steps 1-3 in that description, programming of sample selection for an OSFSπps


scheme is very simple. Programming for Sunter's scheme is also simple, though a bit less so than for OSFSπps, because of the need to modify the size measures.
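Section 4.3 lies outside this excerpt, so the following is only a sketch of the standard formulation of Pareto πps selection (independent uniform variates, Pareto ranking variables, the n smallest selected); the function name pareto_pps is ours, and all target inclusion probabilities are assumed strictly below 1:

```python
import numpy as np

def pareto_pps(z, n, rng=None):
    """Draw a Pareto pips sample of fixed size n from positive sizes z.

    Target inclusion probabilities lam_i = n * z_i / sum(z) are assumed
    to satisfy lam_i < 1. Each unit gets an independent U(0,1) variate
    U_i, the ranking variable Q_i = (U_i/(1-U_i)) / (lam_i/(1-lam_i))
    is formed, and the sample is the n units with smallest Q_i."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.asarray(z, dtype=float)
    lam = n * z / z.sum()
    if np.any(lam >= 1.0):
        raise ValueError("some target inclusion probabilities are >= 1")
    u = rng.uniform(size=z.size)
    q = (u / (1.0 - u)) / (lam / (1.0 - lam))
    return np.sort(np.argsort(q)[:n])   # indices of the sampled units
```

Exponential and Poisson πps differ only in the distribution of the ranking variables, which is the sense in which the same three-step structure covers all OSFSπps schemes.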

Estimation precision: Sunter's point estimator is unbiased. Tables 12-16 indicate that the OSFSπps schemes all have very small point estimator bias. Hence, point estimator bias can be disregarded; the crucial aspect is estimator variances.

As stated in (5.11), the OSFS schemes perform better, often considerably better, than Sunter πps in situations where πps sampling is more efficient than SRS. Also (5.15) has relevance in the comparison. Tables 10 and 11 imply that Sunter πps performs better than OSFSπps in situations where πps sampling is disadvantageous relative to SRS. The conclusion we draw from that is, however, not that OSFSπps should be avoided in favor of Sunter πps in such situations, but that any kind of πps sampling should be avoided (or, if possible, that the size measure should be transformed so as to obtain a more proportional (y, s)-trend).

Variance estimation properties: Pertinent aspects are listed in (a)-(c) below.

(a) Sunter πps yields unbiased variance estimates, which are non-negative for the variance estimator (1.3). The OSFSπps schemes have consistent, non-negative variance estimates which are very mildly biased.

(b) Even if the OSFSπps procedures are based on more elaborate approximation arguments than Sunter πps, their normal approximation confidence intervals seem to have as good level properties as those for Sunter πps, at least when Sunter πps is not close to SRS (thereby being fairly uninteresting).

(c) Variance estimation for the OSFSπps schemes is very simple to program; it has 'complexity order' = computation of an ordinary sample variance (see (4.9) and Remark 4.4). The only data from the sample selection needed at the estimation stage are the normed size values. Variance estimation for Sunter πps requires more involved programming. Second-order inclusion probabilities must be derived at the sample selection stage, by a somewhat intricate algorithm, and be saved to the estimation phase. The variance estimation formulas themselves are also a bit intricate. Summing up, we mean that the OSFSπps schemes lead to fully acceptable variance estimates, which are considerably simpler to compute than those for Sunter πps.

Comparison relating to Remarks 4.8 and 4.9: There is no Sunter πps analogue to the substitution for overcoverage indicated in Remark 4.8. Due to the list-sequential ingredients in the sample selection for Sunter πps as well as for the OSFSπps schemes, all these schemes admit sample coordination as indicated in Remark 4.9.

Combined conclusions: The OSFSπps schemes are on no vital point inferior to Sunter πps, and in many respects better, notably as regards the most crucial aspect, estimation precision. Hence, OSFSπps schemes should be preferred to Sunter πps.

5.4.2. Comparison of OSFSπps schemes

Here we consider Pareto, exponential and Poisson πps. As regards simplicity of sample selection, simplicity of variance estimation and level properties for confidence intervals, they are regarded as essentially equivalent. Hence, the crucial aspect is estimation precision. In many situations the schemes perform very similarly, but there


are situations where estimation precisions do differ substantially (see Tables 6-11). Then Pareto πps, being the optimal one, performs better than the other two, and it should therefore be preferred among OSFSπps schemes. Use of Pareto πps can be seen as an 'insurance without premium': it never performs worse than other OSFSπps schemes, and in some situations it performs considerably better.

5.4.3. Comparisons of Pareto πps and systematic πps

Comparisons with systematic πps generally: Sample selection is simple for systematic πps, under random as well as size frame order, (rfo) resp. (sfo). The same holds for Pareto πps.

Substitution for overcoverage and sample coordination as indicated in Remarks 4.8 and 4.9 can be made for Pareto πps but, to our knowledge, no analogues exist for systematic πps.

Comparison with systematic πps(rfo): For systematic πps(rfo), the Hartley-Rao variance estimator mostly works with good approximation, even if it is not entirely asymptotically correct. If the Hartley-Rao estimator is regarded to yield acceptable variance estimates, Pareto πps and systematic πps(rfo) are fairly equivalent with respect to (iii) in (1.6). Programming of variance estimation is a bit simpler, though, for Pareto πps.

With regard to (ii), the main conclusion is formulated in (5.14). It states that Pareto πps never performs (noteworthily) worse than systematic πps(rfo), and sometimes performs considerably better. Hence, Pareto πps should be preferred.

Comparison with systematic πps(sfo): Systematic πps(sfo) has its specific weakness with respect to (iii) in (1.6): it does not admit objective assessment of sampling errors. As regards (ii), point estimator precision, the picture is a bit confusing. In situations that deviate pronouncedly (maybe one should say extremely) from a proportional (y, s)-trend, systematic πps(sfo) leads to dramatically better estimation precision than Pareto πps, and also than the other πps schemes. It is somewhat unclear what to recommend, though: use of systematic πps(sfo), or transformation of the sizes. In situations with fairly proportional (y, s)-trend, the ranking between systematic πps(sfo) and Pareto πps seems to be erratic; the schemes take turns being better than the other. We leave the problem there, by concluding that the choice between Pareto πps and systematic πps(sfo) becomes a matter of judgment/beliefs on behalf of the sampler.

5.4.4. Overall conclusion

With 5.4.1-5.4.3 as background, our tentative overall conclusions run as follows:

• Pareto πps should be preferred among πps schemes that admit objective assessment of estimation precision.

• Systematic πps with frame ordered by the sizes leads in many cases to distinctly better point estimation precision than Pareto πps, but the opposite also occurs. It is difficult to tell if a particular sampling situation is advantageous for systematic πps(sfo) or not. When πps(sfo) is used, control of variance estimation is lost, though.


Moreover, if overcoverage substitution and/or sample coordination (see Remarks 4.8 and 4.9) is of interest, it speaks in favour of Pareto πps.

The choice between Pareto πps and systematic πps(sfo) becomes a matter of judgment/beliefs for the sampler.

References

Ohlsson, E. (1995). Sequential Poisson sampling. Research Report, Institute of Actuarial Mathematics and Mathematical Statistics, Stockholm University.
Rosén, B. (1997). Asymptotic theory for order sampling. J. Statist. Plann. Inference, this issue.
Särndal, C.-E., B. Swensson and J. Wretman (1992). Model Assisted Survey Sampling. Springer, New York.
Sunter, A.B. (1977). List sequential sampling with equal or unequal probabilities without replacement. Appl. Statist. 26, 261-268.
Wolter, K.M. (1985). Introduction to Variance Estimation. Springer, New York.