Chapter 14 Exercises in Sampling Theory Exercise 1 (Simple random...

SamplingTheory| Chapter 14 | Exercises in Sampling Theory | Shalabh, IIT Kanpur

Chapter 14 Exercises in Sampling Theory

Exercise 1 (Simple random sampling):

Let there be two correlated random variables and .X Y A sample of size n is drawn from a population by

simple random sampling without replacement. The observed paired sample is , , 1, 2,..., .i iX Y i n If the

sample totals are 1 1

ˆ ˆ, ,n n

tot i tot ii i

N NX X Y Y

n n

then find the covariance between ˆ andtotX totY .

Solution:

The covariance between andtot totX Y is

ˆ ˆ ˆ ˆ ˆ ˆ,

ˆ ˆ ˆ ˆ .

tot tot tot tot tot tot

tot tot tot tot

Cov X Y E X E X Y E Y

E X Y E X E Y

Note that

1

1

ˆ

ˆ

N

tot tot ii

N

tot tot ii

E X X X NX

E Y Y Y NY

where 1 1

1 1, ,

N N

i ii i

X X Y YN N

1

1 1

1

1.

( 1)

N

i i i ii

N N

i j i ii j j

E X Y X YN

E X Y X YN N

Also,

1 1 1 1 1

2

1 1 1

N N N N N

i j i i i ji j i i j j

N N N

i j i ii j j i

X Y X Y X Y

X Y N XY X Y

where 1 1

1 1, .

N N

i ii i

X X Y YN N

Then


22

1 1

22

1 1 1 1

22

1 1 1

22

ˆ ˆ,

( 1)

1

( 1)

1

n n

tot tot i ii i

N N N N

i j i ii j j i j j

N N N

i i i ii i j j

i i i

NCov X Y E X Y N XY

n

NE X Y X Y N XY

n

N n n nX Y X Y N XY

n N N N

N n n nX Y N XY X

n N N N

2

1 1

22 2

1

2

1

2

( 1) ( 1)1

( 1) 1

( ) 1

1

( )

1where .

1

N N

ii j

N

i ii

N

i ii

XY

XY i i

Y N XY

N n n n n nX Y N XY N XY

n N N N N N

N N nX X Y Y

Nn N

N N nS

Nn

S X X Y YN


Under the simple random sampling without replacement, find xyE s where

1

1 1

1

1

1 1, .

n

xy i ii

n n

i ii i

s X x Y yn

x X y Yn n

Solution:

Consider

1

1

21 1 1

1 1 1 1

1 1 1

1

1

1

1

1

1

1 1

1

1 1 1.

1

n

xy i ii

n

i ii

n n n

i i i ii i i

n n n n

i i i i i ji i i j j

n n n

i i i ji i j j

s X x Y yn

X Y nxyn

nX Y x y

n n

X Y X Y X Yn n

nX Y X Y

n n n


1 1

1 1 1 1

Since

( 1).

( 1)

n N

i i i ii i

n n n n

i j i ji j j i j

nE X Y X Y

N

n nE X Y X Y

N N

Thus

1 1 1

1 1 1

2

1 1

1

11 1

1 1

1 1

1

1 1

1

1 1

( 1) 1

1

1

N N N

xy i i i ji i j

N N N

i i i ji i j j

N N

i i i ii i

N

i ii

i

n nn NE s X Y X Y

n n N nN N

X Y X YN N N

X Y N XY X YN N N

NX Y XY

N N N N

XN

1

1

1

1

.

n

ii

N

i ii

XY

Y NXY

X X Y YN

S


Suppose in a population of size 1, and NN Y Y are two extreme values in the sense that 1Y denotes the

extremely low and NY denotes the extremely high values among the sample units 1 2, ,..., NY Y Y . Instead of

using the sample mean 1

1 n

ii

y yn

as an estimator of the population mean 1

1 N

ii

Y YN

in such a case,

the sample mean estimator is modified as follows

1

1

if sample contsins but not

if sample contsins but not

for all other samples

N

N

y k Y Y

y y k Y Y

y

where 0k is known constant. Determine if y is an unbiased estimator of population mean Y and find its

variance.

Solution:


Suppose the population units 1 2, ,..., NY Y Y are labeled as 11, 2,..., . So and NN Y Y now corresponds to labels 1

and N respectively.

Since the sample space corresponding to y has three possible situations, so it can be divided into three

disjoint subsets as follows:

1

2

All the units in thesample

All the units in the sample are such that unit label 1 is present and unit label is absent

All the units in the sample are such that unit labael is present and unit label 1 is absent

N

N

3 1 2.

When a sample of size n is drawn from a population of size ,N there are N

n

possible subsets. Similarly,

the number of possible subsets of sample drawn from the population

* 1

2are .

1

N

n

* 2

2are .

1

N

n

* 3

2 2are .

1 1

N N N

n n n

Under SRSWOR,

1 2 3

1 2 3

1

1

2 21

1 1

1 .

E y yN

n

y k y k yN

n

N Ny y y k k

N n n

n

y YN

n

So y is an unbiased estimator of the population mean.


The variance is

1 2 3

1 2 3

2 3

3

2

2 2

2 2 2

2

2 2

1

1

1

2 2 2

1

212 2

1

Var y y YN

n

y k Y y k Y y YN

n

y Y y Y y YN

n

NC C y Y y Y

n

Ny Y C C y Y y Y

N n

n

2

.

Observe that

*

2

1

1

N

n N nn

N N N

n

* 2

1

N

n

units in 1 containing the unit label 1, 3

2

N

n

of them have unit labels

2,3,..., 1j N and none of them contains the unit label .N

Thus

1 1 1

1

12

1

12

2 3 21

1 2 1

2 21 1.

1 12

N

jj

N

jj

y Y y Y

N N NY Y Y

n n nN

N NnY Y Y

n nn N

Similarly

2

1

2

2 21 1

1 12

N

N jj

N Nny Y Y Y Y

n nn N

.


Finally,

12

12

1

2

2

1

2 21 ( ) 1 1( ) 2 2

1 1( 1) 2

2 21 12

1 12

2 2

1

N

jj

N

N jj

yN

N Nn N n nV y Var y k k Y Y Y

N N nN N n N

n

N Nnk Y Y Y

N nn N

SN kY Y nk

N n N

.

Example 4 (Simple random sampling):

Let a sample of size 2 is drawn from a population of size 3 having units 1 2 3, andY Y Y . The following

estimator is used to estimate the population mean depending upon which sampling units are chosen

1 21 2

1 22 3

322 3

if sample is ,2 2

2if sample is ,

2 3

if sample is , .2 3

Y YY Y

Y Yy Y Y

YYY Y

Verify if y is an unbiased estimator of the population mean or not and find its variance.

Solution: The sample space 1 2 3, , .Y Y Y

3 31 2 1 2

1 2 3

21( )

3 2 2 2 3 2 3

2

1

3

Y YY Y Y YE y

Y Y Y

Y

which implies that y is an unbiased estimator of .Y Its variance is

2 2223 31 2 1 2

22 2 2 1 3 2 3 1 2 31 2

1 2 3

22 231 2 1

21( )

3 2 2 2 3 2 3

2

21 1 1 1 1 4 1 2

3 4 4 4 4 9 9 4 6 6 3

21

9 2 2 3

Y YY Y Y YVar y Y

YY Y Y Y Y YYYY Y Y

YY Y Y

22 3 .

2

YY Y



The population coefficient of variation of a variable Y is defined as YY

SC

Y where

22

1

1

1

N

Y ii

S Y YN

,

1

1 N

ii

Y YN

is based on population size N . Let the sample mean 1

1 n

ii

y yn

is

based on sample size n by SRSWR which is usually employed to estimate .Y Assuming YC to be known,

improve the sample mean in the sense that the estimator has a minimum mean squared error. Find out the

relative efficiency of this estimator relative to sample mean.

Solution: Consider a scalar multiple of y to estimate Y as

ky ky

where k is any scalar. The mean squared error of ky is

2

2

22 2

22 2 2

22 2 2

1

1 0

11

11 .

k

SRSWR

Yn

Y

M y E ky Y

E k y Y k Y

k Var y k Y

Nk S k Y

N

NY k C k

Nn

Now we use the principle of maxima/minima s follows to find the optimum value of k for which kM y is

minimum

2 2

2

2

1 2 2 1 .

Substituting 0

1 1 1

1 *, say.

11

k

Y

k

Y

Y

dM Y NY k C k

dk Nn

dM Y

dk

Nk C

Nn

k kN

CNn

The second order condition for maxima/minima is satisfied.


Thus the optimum estimator is

2

*1

1k

Yn

yy k y

NC

N

and its minimum mean squared error is

2 22 * 2 *

242

22 2

2 2

2 2

22

2

2 2

2

2

11

11

1 11 1

11

11

1

1

11

( ).

11

k Y

YY

Y Y

Y

Y

Y

Y

Y

SRSWR

Y

NM y Y k C k

Nn

NN CCNnNnY

N NC C

Nn Nn

NY C NNn C

NnNC

Nn

NY C

NnN

CNn

Var YN

CNn

12( ) 1

Relative efficiency 1 .( )

kY

M Y NC

Var y Nn

Exercise 6 (Sampling for proportions):

Suppose we want to estimate the proportion of men employees (P) in an organization having 1500 total

employees. In addition, suppose 3 out of 10 employees are men. Suppose a sample is drawn by simple

random sampling. Find the sample size to be selected so that the total length of confidence interval with

confidence level 0.05 is less than 0.02 for SRSWR and SRSWOR.

Solution:

First, consider that a sample of size n is drawn from a population of size N by SRSWR. The

100 1 % confidence interval for sample mean y or equivalently the sample proportion wrp is

given by


2 2

2 2

2 2

, ,

1 1or , .

1 1

y y

wr wr wr wrwr

s sy z y z

n n

p p p pp z p z

n n

The total length of confidence interval is

2

12 .

1wr wrp p

zn

It is given that

2

2

2

2

2

1 2 0.02

1

1or 0.0001

1

or 1 1 .

wr wr

wr wr

wr wr

p pz

n

p pz

n

n z p p

For 0.05, 2

2

31.96 and 0.3,

10wrz p we have

21.96 0.3 0.7

1 80690.0001

n

.

The sample size is greater than the given population size in this case but since SRSWR is used, so it is not

impossible.

Next, we consider the case when the sample is drawn by SRSWOR. The related 100(1 )% confidence

interval for y or equivalently the proportion worp is

2 2

2 2

2 2

, ,

or

(1 ) (1 ), .

1 1

y y

wor wor wor worwor wor

N n N ny z s y z s

Nn Nn

p p p pN n N np z p z

Nn n Nn n

The total length of the confidence interval is

2

(1 )2

1wor worp pN n

zNn n

.


It is given that

2

2

2

2

2

2

2

(1 ) 2 0.02

1

(1 )or 0.0001

1

0.0001 (1 )

or .(1 )

0.0001

wor wor

wor wor

wor wor

wor wor

p pN nz

Nn n

p pN nz

Nn n

z p p

np p

zn

2

2

2

3Since 0.3, 1.96, 1500, we have

10

0.0001 (1.96) 0.3(1 0.3)

0.3(1 0.3)0.0001 (1.96)

1500 1265.

worp z N

n

Exercise 7: (Varying Probability scheme):

Show that the Yates-Grundy estimator is non-negative under Midzuno sampling design.

Solution: A sufficient condition for the Yates-Grundy estimator to be non-negative is

0, , , 1, 2,..., .i j ij i j i j N

The expression of ,i j and ij are

1

1 1

1

1 1

1 1 2

1 2 1 2

ii

tot

jj

tot

i jij

tot

XN n n

N X N

XN n n

N X N

X XN n n n n

N N X N N

where 1

N

tot ii

X X

is the population total.

Now


2

2

1 1 21 1

1 1 1 2 1 2

1 1 1 1 1 2

1 1 1 2 1 1 2

1

j i jii j ij

tot tot tot

i j i j

tot tot

X X XN n n n nXN n n N n n

N n X N N n X N N N X N N

X X X XN n nN n n n n

N X N X N N N N N

N n

N

2

2 22

( ) ( 1 ( ) ( 11 .

1 2 1 2

i j i j

tot tot

X X X XN n n N n n

X XN N N N

Each of the terms on right hand side is nonnegative and so that Yates-Grundy estimator is always

nonnegative.

Exercise 8 (Stratified sampling):

Suppose that the population of size N can be expressed as N nk where n and k are integers. The

population is divided into n strata where the thh stratum contains units that are labeled as

1 , 1,2,..., , 1, 2,..., .hG h k j j k h n

Suppose only one unit is randomly selected from every stratum and a sample of size n is obtained. The

values of units in the population are modelled as , 1, 2,...,iY i i N where and are some

constants. Find the variance of population total.

Solution:

The estimate of population total in case of stratified sampling is

1

ˆ .L

hst h

h h

NY y

n

where 1

.L

h hjj

y y

Compared to the notations in stratified sampling, we have

Number of strata L n

thh stratum size hN k .

sample size from thh stratum 1.hn

Then


1

1

11

ˆ ˆ 11

.

n

st tot hh

n

hh

kY Y y j

k y

Since the variance of stY is

22

1

ˆ ,L

h h hst h

h h h

N N nVar Y S

N n

so we have

22

01 1

2

1 1

1 1ˆ.1 1

.

n k

hj hh j

n k

hj hh j

k kVar Y Y Y

k k

k Y Y

When population values are modeled by the relation , 1, 2,..., ,iY i i N then

1

1

22 2

1 1

2 222

2 2

1

1

11

( 1)1

2

1

2

11

4

1 2 1 1 1

6 4 2

1.

12

hj

k

h hjj

k

j

hj h

k k

hj hj j

Y h k j

Y Yk

h k jk

k kh k

kY Y j

kY Y j k j

k k k k k k k

k k

Thus

2 2 2 1ˆ .

12tot

nk kVar Y



Suppose there are two strata of sizes 1 2andN N such that 1 2 ,N N N (population size). Let

1 21 2and

N N

N N . Assume the population variances of first and second strata are 2 2

1 2andS S and they

are equal, i.e., 1 2.S S Consider a cost function 1 1 2 2C C n C n where 1 2andC C are the cost of collecting

an observation from first and second strata respectively. Assuming 1 2andN N to be large, show that

1 1 2 22

1 1 2 1

( )

( )prop st

opt st

Var y C C

Var y C C

where andprop st opt stVar y Var y are the variance of stratum mean under proportional and optimum

allocations in stratified sampling.

Solution

We have

2 2

1

22 2

1

Ki i

st i ii i i

Ki

i ii i

N nVar y S

N n

Sn

when iN is large where K is the number of strata. If cost function, in general, is

1

K

i ii

C C n

then under proportional allocation

1 1

1

1

or

or

or

or .

i i

i i

K K

i i i ii i

K

i ii

K

i ii

n N

n dN

C n d C N

C d C N

Cd

C N

Thus

1

ii K

i ii

CNn

C N


is the required sample size.

In the set up of given framework, we have 1 22, ,K S S S so

1 1 2 2

, 1, 2,.ii

CNn i

C N C N

Substituting in in the expression for the variance

2 2 2

1 1 2 22

1 2

2 221 2

21 2

1 1 2 2 1 1 2 2

21 1 2 2 1 22

21 1 2 2

1 2

21 1 2 2

1

1

(Using )

.

prop st

N S N SVar y

N n n

N NS

N CN CNC N C N C N C N

C N C N N NS

C N

C N C N SN N N

C N

C CS

C

The sample size under optimum allocation for the fixed cost is

1

, 1, 2,.

i i

iL K

i i ii

N S

Cn i

N S C

The variance of sty under optimum allocation when iN is large is


2 22

21

2 221 2

21 2

2 22 2

1 1 2 121 1

2

1 1 2 2

1 2

2 221 1 1 2 2 1 1 1 1 2 2 2

1 221 2

1

assuming

i iopt st

i i

i i i ii i

N SVar y

N n

N NS

N n n

N N S C N N S CS

N N S C N S C

C C

N N C N C C N N C N C CSS S

N N C N C

21 1 2 2 1 1 2 2

2

2 2

1 1 2 2 .

S

N C N C N C N CS

N C C

SC C

C

From the expression of and ,prop st opt stVar y Var y we find

1 1 2 22

1 1 2 1

( ).

( )prop st

opt st

Var y C C

Var y C C

Exercises 10 (Stratified sampling):

Suppose there are two strata and a sample of sizes 1 2andn n are drawn from these strata. Let a be the

actual ratio of 1 2andn n , i.e., 1

2

.a

n

n Let opt be the ratio of 1 2andn n under optimum allocation, i.e.,

1( )

2( )

optopt

opt

n

n . Let anda optV V be the variances of stratum mean under actual and optimum allocations,

respectively. Show that irrespective of the values 1 2 1 1 2, , , andN N S S S , the ratio opt

a

V

V is never less than

2

4

(1 )

when 1 2andN N are large and .a

opt

Solution:

When 1 2andN N are large, then


2 2 2 2

1 1 2 2

1 2st a

N S N SVar y V

n n

and variance under optimum allocation is

2

1 1 2 2

1.opt st optVar y V N S N S

n

Thus

2

1 1 2 2

2 2 2 21 1 2 2

1 2

2

2 2

1 12 22 2

2 21 2 1 1

1

11

.1

opt

a

N S N SV nN S N SV

n n

N Sn N S

N Sn n N S

Since under optimum allocation

1 11

1 1 2 2

2 22

1 1 2 2

1 1 1

2 2 2

1

2

1 2 2

2 1 1

.

opt

a

a

opt

N Sn n

N S N S

N Sn n

N S N S

n N S

n N S

n

n

n N S

n N S

Thus

2

2

12

2 22

1 1 2

2

1 2

21 2

11

1

1

.

opt

a

nV n n

nVn n n

n nn

n n


In this expression, replace

n by 1 2n n and

2 2

1 2 1 2 1 2by 4 ,n n n n n n

then we have

2

1 2 1 22 2

1 2 1 2

4.

1

opt

a

V n n n n

V n n n n

Using the result that

whenever 0,z

zz

over the expression opt

a

V

V, we notice that since 2

1 2 1 24 1n n n n is always true, so we have that

2

4.

(1 )opt

a

V

V


Suppose there are two strata and equal size of samples are drawn from both the strata i.e., 1 2n n . Another

option to draw the samples is optimum allocation. Let ande optV V denotes the variances under equal

allocation 1 2n n and optimum allocation respectively then assuming 1 2andN N are large, show that

2

1

2

1where ,

1opte opt

opt opt

nV V

V n

1 2andopt optn n are the sample sizes drawn from first and second strata.

Solution:

The variance of sty is

2

2

1

22

21

1

Ki i i

st ii i i

Ki

ii i

N N nVar y S

N N n

NS

N n

assuming iN to be large.


When 1 2 ,2

nn n say, then the variance of sty under equal allocation is

2 2 2 21 1 2 22

2e stV Var y N S N S

nN .

Under optimum allocation,

1 11

1 1 2 2

1 22

1 1 2 2

N Sn n

N S N S

N Sn n

N S N S

and

2 2 2 21 1 2 2

21 2

2 2 2 21 1 2 2

21 1 2 2

1 1 2 2 1 1 2 2

2

1 1 2 22

1

1

1.

opt stopt opt

N S N SV Var y

N n n

N S N S

N nN S nN S

N S N S N S N S

N S N S

N n

Since 1 1

2 2

, soN S

N S

2 2 22 22

22 22 22

2 222 2 2 2 2

2 2

2 222 2

2

2

21

21

2 11 1

11

1.

1

e

opt

e opt

opt

V N SnN

V N SnN

N SN SV V n n n

N SVn n


Exercise 12 (Ratio method of estimation)

Consider a generalized form of ratio estimator ˆR

yY X

x

for estimating the population mean Y where

is some scalar. Derive the approximate bias and mean squared error of ˆ .RY For what value of , the

mean squared error is minimum. Derive the minimum mean squared error. Note that for ˆ1, RY reduces

to usual ratio estimator.

Solution:

Define

0 0

1 1

0 1

2 2 2 20 1 0 1

2 22 2

2 2

1

1

0, 0

, ,

, , ,

X Y X Y

X YX Y

x Xx X

Xy Y

y YY

E E

f f fE C E C E C C

n n n

S SN nf C C

N X Y

is the population correlation coefficient between and .X Y

Write

1

0

1 0

21 0 0 0

21 0 0 1 0

21 0 0 1 0

ˆ

1

1

1 1

1 (1 ) 1 ... (assuming 1)

2

1 1 ....

2

1ˆ (upto second order of approximati2

R

R

yY X

xY

XX

Y

Y

Y

Y Y Y

on).

The bias ˆRY is given as


21 0 0 1 0

2

ˆ ˆ

1

2

10 0

2

1.

2

R R

X Y X

XX Y

Bias Y E Y Y

YE

f fY C C C

n n

YfCC C

n

The mean squared error of ˆRY is given as

2

2 2 21 0 0 1

2 2 2 2

ˆ ˆ

2 (upto second order of approximation)

2 .

R R

Y X X Y

MSE Y E Y Y

E

fY C C C C

n

.

To obtain the value of for which the ˆRMSE Y is minimum, we use the principle of maxima/minima as

follows:

2

min

ˆ

0 2 2 0

or , say.

R

X X Y

Y

X

dMSE YC C C

dC

C

The second-order condition for maxima/minima is satisfied.

The minimum value of mean squared error is obtained by substituting min in the expression for

ˆRMSE Y as

22 2 2 2

2

2 2

ˆ 2

1 .

Y YR Y X X Y

X X

Y

C CfMin MSE Y Y C C C C

n C C

fC

n


Exercise 13 (Varying probability scheme):

Suppose there are 5 units in a population with values

1 2 3 4 5

8 8 21, 1, , ,

3 3 3y y y y y .

The probabilities of selecting a sample of size 2 with different units as

1 2 3 4 3 5 4 5

1 1 1 1, , , , , , ,

2 6 6 6p y y p y y p y y p y y .

Calculate the first and second-order inclusion probabilities and find the probability distribution of the -

estimator of the total.

Suppose the sample total HTY is used which is based on Horvitz-Thompson estimator is used to estimate

the square root of population total totY . Find the probability distribution of HTY and find if it is

unbiased for totY or not. Calculate the variance of HTY .

Solution:

The inclusion probabilities are

1 2 3 4 5 12 34 35 45

1 1 1 1 1 1 1 1 1, , , , , , , , ,

2 2 3 3 3 2 6 6 6 0ij for all other pairs of

( , ).i j

The Horvitz Thompson estimator of population total is

1 1 14 with probability

1/ 2 1/ 2 2ˆ

8 / 3 8 / 3 1 1 1 116 with probability .

1/ 3 1/ 3 6 6 6 2

Y

Since the sample size is 2, the estimator of variance is

2

ˆ j i j iji

i j ij

yyVar Y

.

Thus for the

sample 1 2

1 ˆ( , ), , 0.2sy y p Var Y

sample 3 4

1 ˆ( , ), , 0.6sy y p Var Y


sample 3 5

1 ˆ( , ), , 0.6sy y p Var Y

sample 4 5

1 ˆ( , ), , 0.6sy y p Var Y

Next

12 with probability

21

4 with probability .2

Y

Thus

1 12 4 3 10 .

2 2 totE Y Y

So Y is a biased estimator of totY and it underestimates it.

The variance of Y is

2

ˆ ˆ

10 9 1.

Var Y E Y E Y

Chapter 14 Exercises in Sampling Theory Exercise 1 (Simple random...

Documents

Transcript of Chapter 14 Exercises in Sampling Theory Exercise 1 (Simple random...