Sampling Design and Analysis MTH 494 Lecture-22 Ossam Chohan Assistant Professor CIIT Abbottabad.
Review
Regression Estimation
• We observed that the ratio estimator is most appropriate when the relationship between y and x is linear through the origin.
• If there is evidence of a linear relationship between the observed y’s and x’s, but not necessarily one that would pass through the origin, then this extra information provided by the auxiliary variable x may be taken into account through a regression estimator of the mean µy.
• One must still have knowledge of µx before the estimator can be employed, as it was in the case of ratio estimation of µy.
• The underlying line that shows the basic relationship between y’s and x’s is sometimes referred to as the regression line of y upon x.
• Thus the subscript L in the ensuing formulas is used to denote linear regression.
• The estimator given in the next section assumes the x’s to be fixed in advance and the y’s to be random variables.
• We can think of the x values as something that has already been observed, like last year’s first quarter earnings, and the y response as a random variable yet to be observed, such as the current quarterly earnings of a company for which x is already known.
• The probabilistic properties of the estimator then depend only on y for a given set of x’s.
• If stratum sample sizes are very small, or if the within-stratum ratios are all approximately equal, then the combined ratio estimator may perform better.
• Of course, an estimator of the population total can be found by multiplying either of the estimators above by the population size N, and the variances can be adjusted accordingly.
• Thus we might use the notation $\hat{\tau}_{yRS} = N\hat{\mu}_{yRS}$.
Estimators
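• Regression estimator of the population mean µy:
$\hat{\mu}_{yL} = \bar{y} + b(\mu_x - \bar{x})$, where $b = \dfrac{\sum_{i=1}^{n}(y_i-\bar{y})(x_i-\bar{x})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}$ (3.28)
• Estimated variance of $\hat{\mu}_{yL}$:
$\hat{V}(\hat{\mu}_{yL}) = \left(\dfrac{N-n}{Nn}\right)\left(\dfrac{1}{n-2}\right)\left[\sum_{i=1}^{n}(y_i-\bar{y})^2 - b^2\sum_{i=1}^{n}(x_i-\bar{x})^2\right]$ (3.29)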
Estimator
• Bound on the error of estimation:
$2\sqrt{\hat{V}(\hat{\mu}_{yL})}$ (3.30)
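• When calculating b from observed pairs (y1, x1), …, (yn, xn), we may use the fact that
$\sum_{i=1}^{n}(y_i-\bar{y})(x_i-\bar{x}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$
and
$\sum_{i=1}^{n}(x_i-\bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$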
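The computing shortcut for the numerator of b follows from expanding the product and using $\sum x_i = n\bar{x}$ and $\sum y_i = n\bar{y}$. The denominator identity is the same argument with y replaced by x. A short worked step, with all sums running over i = 1, …, n:

```latex
\begin{align*}
\sum (y_i-\bar{y})(x_i-\bar{x})
  &= \sum x_i y_i - \bar{x}\sum y_i - \bar{y}\sum x_i + n\bar{x}\bar{y} \\
  &= \sum x_i y_i - n\bar{x}\bar{y} - n\bar{x}\bar{y} + n\bar{x}\bar{y} \\
  &= \sum x_i y_i - n\bar{x}\bar{y}
\end{align*}
```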
Example 3.9
• A mathematical achievement test was given to 486 students prior to their entering a certain college. From these students a simple random sample of n=10 students was selected and their progress in calculus observed. Final calculus grades were then reported, as given in the accompanying table.
• It is known that µx=52 for all 486 students taking the achievement test.
• Estimate µy for this population, and place a bound on the error of estimation.
Data for problem
Student   Achievement test score, x   Final calculus grade, y
1         39                          65
2         43                          78
3         21                          52
4         64                          82
5         57                          92
6         47                          89
7         28                          73
8         75                          98
9         34                          56
10        52                          75
Solution
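The worked solution did not survive in this transcript. As a minimal sketch, the calculation for Example 3.9 could be carried out as below, assuming the usual regression-estimator forms for (3.28) to (3.30): the estimate is ȳ + b(µx − x̄), and the variance uses the residual sum of squares divided by n − 2. The function name `regression_estimate` and its return order are illustrative choices, not part of the lecture.

```python
from math import sqrt

# Data from the table for Example 3.9
x = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]  # achievement test scores
y = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]  # final calculus grades
N, mu_x = 486, 52                              # population size, known mean of x


def regression_estimate(x, y, N, mu_x):
    """Regression estimate of mu_y under simple random sampling.

    Assumes the standard forms of (3.28)-(3.30); returns
    (b, estimate, estimated variance, bound on the error).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Shortcut sums of squares and cross-products
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    sxx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    syy = sum(yi ** 2 for yi in y) - n * y_bar ** 2
    b = sxy / sxx                                  # estimated slope
    mu_hat = y_bar + b * (mu_x - x_bar)            # (3.28)
    var_hat = (N - n) / (N * n) * (syy - b ** 2 * sxx) / (n - 2)  # (3.29)
    bound = 2 * sqrt(var_hat)                      # (3.30)
    return b, mu_hat, var_hat, bound


b, mu_hat, var_hat, bound = regression_estimate(x, y, N, mu_x)
print(f"b = {b:.3f}, estimate of mu_y = {mu_hat:.1f}, bound = {bound:.1f}")
```

With these data x̄ = 46 and ȳ = 76, so the estimate is adjusted upward because the sample mean of x falls below the known µx = 52.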
• A close examination of the data on sugar content and weight of oranges given in example 3.2 might suggest that a regression estimator is more appropriate than ratio estimator.
• A plot of the points will show that the regression line does not appear to go through the origin.
• However, the regression estimator of a total is of the form $N\hat{\mu}_{yL}$, specifically requiring knowledge of N.
• Since the ratio estimator also works well in this case, determining the number of oranges in the truckload may not be worth the extra cost and time.
• In other cases N may be known or easily found.
• Thus one should carefully consider the choice between ratio and regression estimators when estimating population means or totals.
Difference Estimation
• The difference method of estimating a population mean or total is similar to the regression method in that it adjusts the $\bar{y}$ value up or down by an amount depending on the difference $(\mu_x - \bar{x})$.
• However, the regression coefficient b is not computed. In effect, b is set equal to unity.
• The difference method is, then, easier to employ than the regression method and frequently works just as well.
• It is commonly employed in auditing procedures, and we will consider such an example in this section.
• The following formulas hold provided that simple random sampling was employed.
Estimators
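• Difference estimator of a population mean µy:
$\hat{\mu}_{yD} = \bar{y} + (\mu_x - \bar{x}) = \mu_x + \bar{d}$, where $\bar{d} = \bar{y} - \bar{x}$ (3.31)
• Estimated variance of $\hat{\mu}_{yD}$:
$\hat{V}(\hat{\mu}_{yD}) = \left(\dfrac{N-n}{Nn}\right)\dfrac{\sum_{i=1}^{n}(d_i-\bar{d})^2}{n-1}$, where $d_i = y_i - x_i$ (3.32)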
Estimators
• Bound on the error of estimation:
$2\sqrt{\hat{V}(\hat{\mu}_{yD})}$ (3.33)
Example 3.10
• Auditors are often interested in comparing the audited value of an item with the book value. Generally, book values are known for every item in the population, and audit values are obtained for a sample of these items. The book values can be used to obtain a good estimate of the total or average audit value for the population.
• Suppose a population contains 180 inventory items with a stated book value of $13,320. Let xi denote the book value and yi the audit value of the ith item. A simple random sample of n=10 items yields the results shown in the accompanying table. Estimate the mean audit value µy by the difference method and estimate the variance of $\hat{\mu}_{yD}$.
Data for Problem
Sample   Audit value, yi   Book value, xi   di = yi - xi
1        9                 10               -1
2        14                12               2
3        7                 8                -1
4        29                26               3
5        45                47               -2
6        109               112              -3
7        40                36               4
8        238               240              -2
9        60                59               1
10       170               167              3
Solution
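The solution slide is empty in this transcript. A minimal sketch of the calculation for Example 3.10 follows, assuming the difference estimator is µx plus the mean sample difference and that its variance uses the sample variance of the differences di = yi − xi with the finite population correction. The function name `difference_estimate` is a hypothetical helper, not part of the lecture.

```python
from math import sqrt

# Data from the table for Example 3.10
audit = [9, 14, 7, 29, 45, 109, 40, 238, 60, 170]   # y_i, audit values
book = [10, 12, 8, 26, 47, 112, 36, 240, 59, 167]   # x_i, book values
N = 180                  # number of inventory items
mu_x = 13320 / N         # mean book value from the stated total of $13,320


def difference_estimate(y, x, N, mu_x):
    """Difference estimate of mu_y under simple random sampling.

    Assumes the standard forms of (3.31)-(3.33); returns
    (estimate, estimated variance, bound on the error).
    """
    n = len(y)
    d = [yi - xi for yi, xi in zip(y, x)]            # d_i = y_i - x_i
    d_bar = sum(d) / n
    mu_hat = mu_x + d_bar                            # (3.31)
    s2_d = sum((di - d_bar) ** 2 for di in d) / (n - 1)
    var_hat = (N - n) / (N * n) * s2_d               # (3.32)
    return mu_hat, var_hat, 2 * sqrt(var_hat)        # bound is (3.33)


mu_hat, var_hat, bound = difference_estimate(audit, book, N, mu_x)
print(f"estimate of mu_y = {mu_hat:.2f}, variance = {var_hat:.3f}, bound = {bound:.2f}")
```

Here µx = 13320/180 = 74 and the sample differences sum to 4, so the mean audit value is estimated slightly above the mean book value.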
Systematic Sampling
Session Objectives
• To introduce basic sampling concepts in systematic sampling
• To demonstrate how to select a random sample using a systematic sampling design
• To estimate different parameters under systematic random sampling
Sample Selection Procedure
• List all the units in the population from 1, 2, …, N (the sampling frame)
• Select a random number g in the interval 1 ≤ g ≤ k, using a random mechanism, e.g. random number tables, where k = N/n
• k is called the sampling interval
• N is the population size; n is the sample size
• The random number g is called the random start and constitutes the first unit of the sample
Sample Selection Procedure
• Take every kth unit after the random start
• The selected units will be g, g+k, g+2k, g+3k, g+4k, …, g+(n-1)k, until we have n units
• Example: N = 10000, n = 100
• k = 10000/100 = 100
• Suppose g = 87
Sample Selection Procedure
• We select the following units: 87, 187, 287, 387, …, 9987
• NB: This procedure is however only valid if k is an integer (whole number)
• If k is not an integer (whole number) there are a number of methods we can use. We will consider just two of them
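The selection procedure for an integer sampling interval can be sketched in a few lines. This is only an illustration: the function name `systematic_sample` and its parameters are assumptions, and Python's `random.randint` stands in for the random number tables mentioned in the slides.

```python
import random


def systematic_sample(N, n, g=None):
    """Return the n unit numbers of a systematic sample from units 1..N.

    Only valid when k = N / n is a whole number, as noted in the slides.
    g is the random start; if omitted it is drawn uniformly from 1..k.
    """
    if N % n != 0:
        raise ValueError("N/n is not an integer; use another method")
    k = N // n                       # sampling interval
    if g is None:
        g = random.randint(1, k)     # random start, 1 <= g <= k
    if not 1 <= g <= k:
        raise ValueError("random start must satisfy 1 <= g <= k")
    # g, g+k, g+2k, ..., g+(n-1)k
    return [g + j * k for j in range(n)]


sample = systematic_sample(10000, 100, g=87)
print(sample[:4], "...", sample[-1])
```

Passing g explicitly reproduces the example on the slides; leaving it out draws a fresh random start each time.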
Sample Selection Procedure
• Method 1: Use Circular Sampling
• Treat the list as circular so that the last unit is followed by the first
• Select a random start g between 1 and N, using a random mechanism
• Add the interval k repeatedly until n units are selected
• Any convenient interval k will result in a random sample
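Circular sampling can be sketched as follows. The function name `circular_systematic_sample` is hypothetical, the default interval is the integer closest to N/n, and the small population of 10 units in the usage line is an invented illustration rather than an example from the lecture.

```python
import random


def circular_systematic_sample(N, n, k=None, g=None):
    """Select n units from 1..N, treating the list as circular.

    k defaults to the integer closest to N/n (an assumption of this sketch);
    g is the random start, drawn uniformly from 1..N when not supplied.
    """
    if not 1 <= n <= N:
        raise ValueError("need 1 <= n <= N")
    if k is None:
        k = max(1, round(N / n))
    if g is None:
        g = random.randint(1, N)
    # Step k units at a time, wrapping from unit N back to unit 1
    units = [((g - 1 + j * k) % N) + 1 for j in range(n)]
    if len(set(units)) != n:
        # Some choices of k revisit a unit before n units are reached
        raise ValueError("interval k repeats a unit; choose another k")
    return units


# Hypothetical small population: N = 10, n = 3, k = 3, start at unit 9
print(circular_systematic_sample(10, 3, k=3, g=9))
```

The duplicate check is a design choice of this sketch: an interval that shares a large common factor with N can return to an already selected unit, which the slides do not discuss.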
Sample Selection Procedure
• One suitable suggestion is to choose the integer k closest to the ratio N/n
• Method 2: Use Fractional Intervals
• Suppose we want to select a sample of 100 units from a population of 21,156.
• Calculate k = 21156/100 = 211.56
• Select a random start g between 1 and 21156 using a random mechanism
Sample Selection Procedure
• Suppose g = 582
• Add the interval 21156 successively, obtaining exactly 100 numbers
• The numbers will be 582, 21738, 42894, …
• Divide each number by 100 and round to the nearest whole number to get the selected sample, i.e. 6, 217, 429, etc.
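The fractional-interval method can be sketched with integer arithmetic only, which avoids floating-point rounding surprises. The function name `fractional_interval_sample` is hypothetical. One point the slides do not address is a start g small enough that g/n rounds to 0; this sketch maps that case to unit N, which is an assumption.

```python
import random


def fractional_interval_sample(N, n, g=None):
    """Select n units from 1..N when N/n is not a whole number.

    Works on the scale multiplied by n: start at g in 1..N, add N each
    time, then divide by n and round to the nearest whole number.
    """
    if g is None:
        g = random.randint(1, N)
    units = []
    for j in range(n):
        scaled = g + j * N                   # g, g+N, g+2N, ...
        unit = (2 * scaled + n) // (2 * n)   # scaled / n, rounded half up
        if unit == 0:
            unit = N                         # assumption: wrap 0 to the last unit
        units.append(unit)
    return units


sample = fractional_interval_sample(21156, 100, g=582)
print(sample[:3])
```

With g = 582 the first three selections match the 6, 217, 429 given on the slide.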
Advantages and Disadvantages of Systematic sampling
• Advantages:
– The major advantage is that it is easy, almost foolproof and flexible to implement
– It is especially easy to give instructions to fieldworkers
– If we order our list prior to taking the sample, the sample will reflect the ordering and as such can easily give a proportionate sample
Advantages and Disadvantages of Systematic sampling
• Disadvantages:
– The main disadvantage is that if there is an ordering (monotonic trend or periodicity) in the list which is unknown to the researcher, this may bias the resulting estimates
– Estimating the variance from a systematic sample is problematic: the variance estimator is biased