Sampling Design and Analysis MTH 494 Lecture-22 Ossam Chohan Assistant Professor CIIT Abbottabad.


Review


Regression Estimation

• We observed that the ratio estimator is most appropriate when the relationship between y and x is linear through the origin.

• If there is evidence of a linear relationship between the observed y’s and x’s, but not necessarily one that would pass through the origin, then this extra information provided by the auxiliary variable x may be taken into account through a regression estimator of the mean µy.


• As with ratio estimation of µy, one must still know µx before the estimator can be employed.

• The underlying line that shows the basic relationship between y’s and x’s is sometimes referred to as the regression line of y upon x.

• Thus the subscript L in the ensuing formulas is used to denote linear regression.


• The estimator given in the next section assumes the x’s to be fixed in advance and the y’s to be random variables.

• We can think of the x values as something that has already been observed, like last year’s first quarter earnings, and the y response as a random variable yet to be observed, such as the current quarterly earnings of a company for which x is already known.

• The probabilistic properties of the estimator then depend only on y for a given set of x’s.


• If stratum sample sizes are very small, or if the within-stratum ratios are all approximately equal, then the combined ratio estimator may perform better.

• Of course, an estimator of the population total can be found by multiplying either of the estimators above by the population size N, and the variances can be adjusted accordingly.

• Thus we might use the notation $\hat{\tau}_{yRS} = N\hat{\mu}_{yRS}$.


Estimators

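• Regression estimator of the population mean µy:

$$\hat{\mu}_{yL} = \bar{y} + b(\mu_x - \bar{x}) \qquad (3.28)$$

where

$$b = \frac{\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

• Estimated variance of $\hat{\mu}_{yL}$:

$$\hat{V}(\hat{\mu}_{yL}) = \left(\frac{N-n}{Nn}\right)\left(\frac{1}{n-2}\right)\left[\sum_{i=1}^{n}(y_i - \bar{y})^2 - b^2\sum_{i=1}^{n}(x_i - \bar{x})^2\right] \qquad (3.29)$$

• Bound on the error of estimation:

$$2\sqrt{\hat{V}(\hat{\mu}_{yL})} \qquad (3.30)$$

• When calculating b from observed pairs (y1, x1), …, (yn, xn), we may use the fact that

$$\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$$

$$\sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$$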


Example 3.9

• A mathematical achievement test was given to 486 students prior to their entering a certain college. From these students a simple random sample of n=10 students was selected and their progress in calculus observed. Final calculus grades were then reported, as given in the accompanying table.

• It is known that µx=52 for all 486 students taking the achievement test.

• Estimate µy for this population, and place a bound on the error of estimation.


Data for problem

Student   Achievement test score, x   Final calculus grade, y
1         39                          65
2         43                          78
3         21                          52
4         64                          82
5         57                          92
6         47                          89
7         28                          73
8         75                          98
9         34                          56
10        52                          75


Solution
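
The worked solution did not survive in this transcript, so the following is only a sketch of how the calculation for Example 3.9 could be carried out. It assumes the usual forms of the regression estimator, $\hat{\mu}_{yL} = \bar{y} + b(\mu_x - \bar{x})$, its estimated variance with divisor n − 2, and a bound of two estimated standard errors. The function name `regression_estimate` is illustrative and not part of the lecture.

```python
# Data from Example 3.9: achievement test score x, final calculus grade y.
x = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]
y = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]
N = 486     # population size
mu_x = 52   # known population mean of x


def regression_estimate(x, y, N, mu_x):
    """Return (b, estimate of mu_y, estimated variance, bound on the error).

    Assumes simple random sampling and the formulas stated in the lead-in.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Computational forms of the sums of squares and cross products.
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    sxx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    syy = sum(yi ** 2 for yi in y) - n * y_bar ** 2
    b = sxy / sxx
    mu_hat = y_bar + b * (mu_x - x_bar)
    var_hat = ((N - n) / (N * n)) * (syy - b ** 2 * sxx) / (n - 2)
    bound = 2 * var_hat ** 0.5
    return b, mu_hat, var_hat, bound


b, mu_hat, var_hat, bound = regression_estimate(x, y, N, mu_x)
print(f"b = {b:.3f}, estimate = {mu_hat:.1f}, bound = {bound:.1f}")
```

With these data the sample means are 46 and 76, the slope is about 0.766, and the estimate of µy is about 80.6 with a bound of roughly 5.4.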



• A close examination of the data on sugar content and weight of oranges given in Example 3.2 might suggest that a regression estimator is more appropriate than a ratio estimator.

• A plot of the points will show that the regression line does not appear to go through the origin.

• However, the regression estimator of a total is of the form $N\hat{\mu}_{yL}$, specifically requiring knowledge of N.

• Since the ratio estimator also works well in this case, determining the number of oranges in the truckload may not be worth the extra cost and time.


• In other cases N may be known or easily found.

• Thus one should carefully consider the choice between ratio and regression estimators when estimating population means or totals.


Difference Estimation

• The difference method of estimating a population mean or total is similar to the regression method in that it adjusts the $\bar{y}$ value up or down by an amount depending on the difference $(\mu_x - \bar{x})$.

• However, the regression coefficient b is not computed. In effect, b is set equal to unity.

• The difference method is, then, easier to employ than the regression method and frequently works just as well.


• It is commonly employed in auditing procedures, and we will consider such an example in this section.

• The following formulas hold provided that simple random sampling was employed.


Estimators

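• Difference estimator of a population mean µy:

$$\hat{\mu}_{yD} = \bar{y} + (\mu_x - \bar{x}) = \mu_x + \bar{d} \qquad (3.31)$$

where $\bar{d} = \bar{y} - \bar{x}$.

• Estimated variance of $\hat{\mu}_{yD}$:

$$\hat{V}(\hat{\mu}_{yD}) = \left(\frac{N-n}{Nn}\right)\frac{\sum_{i=1}^{n}(d_i - \bar{d})^2}{n-1} \qquad (3.32)$$

where $d_i = y_i - x_i$.

• Bound on the error of estimation:

$$2\sqrt{\hat{V}(\hat{\mu}_{yD})} \qquad (3.33)$$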


Example 3.10

• Auditors are often interested in comparing the audited value of an item with its book value. Generally, book values are known for every item in the population, and audit values are obtained for a sample of these items. The book values can be used to obtain a good estimate of the total or average audit value for the population.

• Suppose a population contains 180 inventory items with a stated total book value of $13,320. Let xi denote the book value and yi the audit value of the ith item. A simple random sample of n=10 items yields the results shown in the accompanying table. Estimate the mean audit value µy by the difference method and estimate the variance of $\hat{\mu}_{yD}$.


Data for Problem

Sample   Audit value, yi   Book value, xi   di = yi − xi
1        9                 10               -1
2        14                12               2
3        7                 8                -1
4        29                26               3
5        45                47               -2
6        109               112              -3
7        40                36               4
8        238               240              -2
9        60                59               1
10       170               167              3


Solution
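
The slide's worked solution is missing from this transcript. As a sketch only, the calculation for Example 3.10 might look like the following, assuming the difference estimator $\hat{\mu}_{yD} = \mu_x + \bar{d}$ with $d_i = y_i - x_i$, an estimated variance based on the sample variance of the $d_i$ with the finite population correction, and µx taken as the total book value divided by N. The function name `difference_estimate` is illustrative.

```python
# Data from Example 3.10: audit values y_i and book values x_i.
y = [9, 14, 7, 29, 45, 109, 40, 238, 60, 170]
x = [10, 12, 8, 26, 47, 112, 36, 240, 59, 167]
N = 180                   # number of inventory items
total_book_value = 13320  # stated book value of the whole population
mu_x = total_book_value / N


def difference_estimate(x, y, N, mu_x):
    """Return (estimate of mu_y, estimated variance, bound on the error).

    Assumes simple random sampling and the formulas stated in the lead-in.
    """
    n = len(x)
    d = [yi - xi for xi, yi in zip(x, y)]
    d_bar = sum(d) / n
    mu_hat = mu_x + d_bar
    # Sample variance of the differences, divisor n - 1.
    s2_d = sum((di - d_bar) ** 2 for di in d) / (n - 1)
    var_hat = ((N - n) / (N * n)) * s2_d
    bound = 2 * var_hat ** 0.5
    return mu_hat, var_hat, bound


mu_hat, var_hat, bound = difference_estimate(x, y, N, mu_x)
print(f"estimate = {mu_hat:.1f}, variance = {var_hat:.2f}, bound = {bound:.2f}")
```

Here µx = 74 and the mean difference is 0.4, so the estimated mean audit value is 74.4, with an estimated variance of about 0.59.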


Systematic Sampling


Session Objectives

• To introduce the basic concepts of systematic sampling

• To demonstrate how to select a random sample using a systematic sampling design

• To estimate different parameters in systematic random sampling


Sample Selection Procedure

• List all the units in the population from 1, 2, …, N; this list is the sampling frame

• Select a random number g in the interval 1 ≤ g ≤ k, using a random mechanism (e.g. random number tables), where k = N/n

• k is called the sampling interval

• N is the population size; n is the sample size

• The random number g is called the random start and constitutes the first unit of the sample


Sample Selection Procedure

• Take every kth unit after the random start

• The selected units will be g, g+k, g+2k, g+3k, g+4k, …, g+(n-1)k, until we have n units

• Example: N = 10000, n = 100, so k = 10000/100 = 100

• Suppose g = 87



• We select the following units: 87, 187, 287, 387, …, 9987

• NB: This procedure is however only valid if k is an integer (whole number)

• If k is not an integer (whole number) there are a number of methods we can use. We will consider just two of them
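
For the basic case where k = N/n is a whole number, the selection procedure can be sketched as below. The function name `systematic_sample` and its arguments are illustrative; the random start is drawn with Python's standard `random` module, standing in for the random number tables mentioned in the slides.

```python
import random


def systematic_sample(N, n, g=None):
    """Return the 1-based unit numbers g, g+k, ..., g+(n-1)k with k = N/n."""
    if N % n != 0:
        raise ValueError("this sketch requires N/n to be a whole number")
    k = N // n                     # sampling interval
    if g is None:
        g = random.randint(1, k)   # random start, 1 <= g <= k (inclusive)
    if not 1 <= g <= k:
        raise ValueError("random start must satisfy 1 <= g <= k")
    return [g + j * k for j in range(n)]


# Example from the slides: N = 10000, n = 100, random start g = 87.
sample = systematic_sample(10000, 100, g=87)
print(sample[:4], "...", sample[-1])
```

With g = 87 this reproduces the units 87, 187, 287, 387, …, 9987 listed above.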



• Method 1: Use Circular Sampling

• Treat the list as circular so that the last unit is followed by the first

• Select a random start g between 1 and N, using a random mechanism

• Add the intervals k until n units are selected

• Any convenient interval k will result in a random sample
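
A minimal sketch of circular selection follows. It assumes units are numbered 1 to N and, when no interval is supplied, takes k as the integer closest to N/n, which is one common choice. The function name `circular_systematic_sample` and the small numbers in the usage line (N = 10, n = 3, g = 8) are hypothetical and chosen only for illustration.

```python
import random


def circular_systematic_sample(N, n, k=None, g=None):
    """Select n units from a list of N units treated as circular."""
    if k is None:
        k = max(1, int(N / n + 0.5))   # integer closest to N/n (assumed default)
    if g is None:
        g = random.randint(1, N)       # random start anywhere in the list
    # Step k units at a time; unit N is followed by unit 1.
    return [(g - 1 + j * k) % N + 1 for j in range(n)]


# Hypothetical illustration: N = 10, n = 3, so k = 3; start at unit 8.
print(circular_systematic_sample(10, 3, g=8))
```

Starting at unit 8 with k = 3, the selection wraps past the end of the list and picks units 8, 1 and 4.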



• One suitable suggestion is to choose the integer k closest to the ratio N/n

• Method 2: Use Fractional Intervals

• Suppose we want to select a sample of 100 units from a population of 21,156.

• Calculate k = 21156/100 = 211.56

• Select a random start g between 1 and 21156 using a random mechanism



• Suppose g = 582

• Add the interval 21156 successively, obtaining exactly 100 numbers

• The numbers will be 582, 21738, 42894, …

• Divide each number by 100 and round to the nearest whole number to get the selected sample, i.e. 6, 217, 429, etc.
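
The fractional-interval steps can be sketched as follows. Adding N and then dividing by n is the same as stepping by the fractional interval k = N/n, so the sketch works in whole numbers and rounds only at the end. The function name `fractional_interval_sample` is illustrative, and rounding halves upward is an assumption, since the slides do not say how ties are handled. Note that a random start below n/2 would round to unit 0, a case the slides do not address and this sketch leaves untreated.

```python
import random


def fractional_interval_sample(N, n, g=None):
    """Select n units when N/n is not a whole number (Method 2)."""
    if g is None:
        g = random.randint(1, N)   # random start between 1 and N
    sample = []
    for j in range(n):
        number = g + j * N         # add the interval N successively
        # Divide by n and round to the nearest whole number (halves round up),
        # using integer arithmetic to avoid floating-point ties.
        sample.append((2 * number + n) // (2 * n))
    return sample


# Example from the slides: N = 21156, n = 100, random start g = 582.
sample = fractional_interval_sample(21156, 100, g=582)
print(sample[:3], "...", len(sample), "units")
```

With g = 582 the first three selected units are 6, 217 and 429, matching the slide.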


Advantages and Disadvantages of Systematic sampling

• Advantages:

– The major advantage is that it is easy, almost foolproof and flexible to implement

– It is especially easy to give instructions to fieldworkers

– If we order our list prior to taking the sample, the sample will reflect the ordering and as such can easily give a proportionate sample


Advantages and Disadvantages of Systematic sampling

• Disadvantages:

– The main disadvantage is that if there is an ordering (monotonic trend or periodicity) in the list which is unknown to the researcher, this may bias the resulting estimates

– There is a problem of estimating the variance from a systematic sample: the usual variance estimator is biased
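
The periodicity problem can be illustrated with a small, entirely hypothetical population. The sketch below builds a list of 100 values that repeat every 10 units and takes the sampling interval equal to that period. Every possible systematic sample then consists of one repeated value, so the mean from any single sample can be far from the population mean.

```python
# Hypothetical population of N = 100 values that repeat with period 10.
population = [i % 10 for i in range(100)]
k = 10  # sampling interval equal to the period of the list

population_mean = sum(population) / len(population)

# Sample mean for every possible random start g = 1, ..., k.
sample_means = []
for g in range(1, k + 1):
    sample = population[g - 1::k]   # units g, g+k, g+2k, ...
    sample_means.append(sum(sample) / len(sample))

print("population mean:", population_mean)
print("sample means by random start:", sample_means)
```

The population mean is 4.5, while the ten possible sample means run from 0 to 9 depending only on the random start.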
