STA304H1F/1003HF Summer 2015: Lecture 10
Lecture 10, June 16, 2015

We should learn:
- When is systematic sampling not appropriate?
- When is systematic sampling nearly equivalent to SRS?
- Why is repeated systematic sampling used?
- What is one-stage cluster sampling?
- What is the trade-off in cluster sampling?
- What is two-stage cluster sampling?
Cluster Sampling (Ch. 8)
- What is a cluster?
- What is cluster sampling?
- psu: primary sampling units
- ssu: secondary sampling units
- Types of cluster samples:
  - one-stage cluster sampling
  - two-stage cluster sampling
- Why?
Cluster Sampling (Ch. 8)
- A cluster is a natural contiguous grouping of population elements.
- Cluster sampling: a probability sample (e.g. an SRS) of clusters.
- psu: the clusters of population elements
- ssu: the elements of interest
- Types of cluster samples:
  - one-stage: all ssu's in a randomly selected psu are in the sample
  - two-stage: ssu's are themselves randomly sampled within each selected psu
- Why? It is easier to get a sampling frame of clusters than of the elements of interest, and it is cheaper/more convenient to sample contiguous units.
Notation (§8.3)
- m_i = size of cluster i
- n = number of clusters in the sample
- N = number of clusters in the population
- M = ∑_{i=1}^N m_i = population size
- Therefore:
  - M̄ = M/N = (∑_{i=1}^N m_i)/N = average cluster size for all clusters
  - m̄ = (∑_{i=1}^n m_i)/n = average cluster size for the sample of clusters
Example: Lohr §5.2.1
"A student wants to estimate the average GPA in his dormitory. Instead of obtaining a listing of all students in the dormitory and conducting an SRS, he notices that the dorm contains 100 suites, each with four students; he chooses 5 of those suites at random and asks every person in the 5 suites what her or his GPA is. The results are as follows:"

Suite   Person 1   Person 2   Person 3   Person 4   Total
1       3.08       2.60       3.44       3.04       12.16
...
5       2.68       1.92       3.28       3.20       11.08

Thus, n = 5, N = 100, m_i = 4, M = 400.
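As a quick check on the notation, the ratio estimate of the mean GPA can be computed from the five suite totals (taken from the full table shown later in this lecture); a minimal Python sketch, not part of the original slides:

```python
# One-stage cluster sample: n = 5 suites (clusters) of m = 4 students,
# from a dorm with N = 100 suites. Suite totals y_i. from the example.
suite_totals = [12.16, 11.36, 8.96, 12.96, 11.08]
m_i = [4] * 5                      # cluster sizes (all equal here)

n, N = 5, 100
M = sum(m_i) / n * N               # population size: 4 * 100 = 400 students

# Ratio estimator of the mean GPA: ybar = sum of y_i. / sum of m_i
ybar = sum(suite_totals) / sum(m_i)
print(round(ybar, 3))              # 2.826
```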
Estimation (§8.3, 8.6): Ratio vs unbiased

(Here y_i is the total of the response over sampled cluster i, and a_i is a count within cluster i.)

       Ratio estimator                          Unbiased estimator
μ:     ȳ = ∑_{i=1}^n y_i / ∑_{i=1}^n m_i
τ:     M ȳ                                      N ȳ_t = N ∑_{i=1}^n y_i / n
p:     p̂ = ∑_{i=1}^n a_i / ∑_{i=1}^n m_i

- For ratio estimation, the auxiliary variable is cluster size, m_i.
- We need M for the ratio estimator of τ.
- The unbiased estimator does not use m_i; it may be less precise than M ȳ.
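For the GPA example the two estimators of the total τ coincide, because all cluster sizes are equal; a sketch (suite totals from the table later in the lecture):

```python
# Ratio vs unbiased estimators of the population total tau,
# dorm GPA example: equal cluster sizes, so the two agree.
suite_totals = [12.16, 11.36, 8.96, 12.96, 11.08]   # cluster totals y_i
m_i = [4] * 5
n, N, M = 5, 100, 400

tau_ratio = M * sum(suite_totals) / sum(m_i)   # M * ybar
tau_unbiased = N * sum(suite_totals) / n       # N * ybar_t
print(round(tau_ratio, 1), round(tau_unbiased, 1))   # 1130.4 1130.4
```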
Cluster sampling: proportions (§8.6 and Example 8.9)
- See Table 8.2: cluster 1: m_1 = 8, a_1 = 4; cluster 2: m_2 = 12, a_2 = 7; etc.
- a_i is the number of residents in cluster i renting their homes.
- Ratio estimate of the population proportion of renters:
  p̂ = ∑_{i=1}^n a_i / ∑_{i=1}^n m_i = 72/151 = 0.48
- Variance estimate? p̂ is just a ratio estimator:
  V̂(p̂) = (1 − n/N) s_p² / (n M̄²),  where  s_p² = ∑_{i=1}^n (a_i − p̂ m_i)² / (n − 1)
- Why not use p̂_i(1 − p̂_i)? It would be too small, because of the cluster sampling.
- Omit §8.7 and §8.8.
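The slide lists only the first two clusters of Table 8.2, so the sketch below uses hypothetical counts (clusters 1–2 as given, a made-up cluster 3, and a made-up N) purely to illustrate the formulas:

```python
# Ratio estimate of a proportion from a one-stage cluster sample.
# m[i]: residents in sampled cluster i; a[i]: renters among them.
# Cluster 3 and N are hypothetical, for illustration only.
m = [8, 12, 10]
a = [4, 7, 5]
n, N = len(m), 100

p_hat = sum(a) / sum(m)                         # 16/30
# s_p^2 = sum (a_i - p_hat * m_i)^2 / (n - 1)
s2_p = sum((ai - p_hat * mi) ** 2 for ai, mi in zip(a, m)) / (n - 1)
m_bar = sum(m) / n                              # sample estimate of M-bar
v_hat = (1 - n / N) * s2_p / (n * m_bar ** 2)   # V-hat(p_hat)
print(round(p_hat, 3), round(s2_p, 4))
```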
Cluster sampling: how many clusters? (§8.5)
- As always, a bound B = 2√V̂ is used to determine the sample size.
- For the ratio estimator, this depends on σ_r², M̄, N, and n.
- When the first three are known, can be guessed, or are available from a preliminary study, solve 2√V = B for n.
- There is a trade-off between n and m_i.
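Solving 2√V = B for n can be done in closed form once the variance is written out; this sketch uses a variance of the form V(ȳ) = (1 − n/N)σ_r²/(nM̄²), matching the structure of the ratio-estimator variance on the previous slide, and entirely hypothetical values of N, M̄, σ_r², and B:

```python
# Number of clusters n from the bound B = 2*sqrt(V),
# with V(ybar) = (1 - n/N) * sigma_r2 / (n * Mbar^2).
# All four inputs below are hypothetical.
import math

N = 100         # clusters in the population
Mbar = 10       # average cluster size
sigma_r2 = 400  # guessed from a preliminary study
B = 0.5         # desired bound on the error of estimation

# Algebra: B^2/4 = (1 - n/N) sigma_r2 / (n Mbar^2)
#   =>  n = N sigma_r2 / (N D + sigma_r2),  D = B^2 Mbar^2 / 4
D = B ** 2 * Mbar ** 2 / 4
n = N * sigma_r2 / (N * D + sigma_r2)
print(math.ceil(n))   # 40 (round up to whole clusters)
```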
... cluster sizes equal
- So how do we assess the trade-off?
- Special case: all clusters of the same size, m, and
  - there are n clusters in the sample,
  - M = Nm elements in the population, and
  - the total sample size is nm.

              elements (ssu's)
              1     2     ...   m      total
cluster   1   y11   y12   ...   y1m    y1.
(psu)     2   y21   y22   ...   y2m    y2.
          ⋮    ⋮     ⋮           ⋮      ⋮
          n   yn1   yn2   ...   ynm    yn.
... cluster sizes equal
- Equivalence: the two estimates of the population total agree: M ȳ = N ȳ_t.
- The estimator of the mean and its estimated variance are
  ȳ = ȳ.. = ∑_{i=1}^n ȳ_i. / n = ∑_{i,j} y_ij / (mn),
  V̂(ȳ) = (1 − n/N) (1/m²) (s_r²/n)
  where
  s_r² = (1/(n−1)) ∑_{i=1}^n (m ȳ_i. − m ȳ..)² = (m²/(n−1)) ∑_{i=1}^n (ȳ_i. − ȳ..)²
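Applying these equal-size formulas to the GPA example (suite totals from the table shown just below) gives, as a sketch:

```python
# ybar and V-hat(ybar) for equal cluster sizes, dorm GPA example:
# n = 5 suites sampled from N = 100, each of size m = 4.
suite_totals = [12.16, 11.36, 8.96, 12.96, 11.08]
n, N, m = 5, 100, 4

ybar_i = [t / m for t in suite_totals]    # cluster means ybar_i.
ybar = sum(ybar_i) / n                    # overall mean ybar..

# s_r^2 = m^2/(n-1) * sum (ybar_i. - ybar..)^2
s2_r = m ** 2 / (n - 1) * sum((y - ybar) ** 2 for y in ybar_i)
v_hat = (1 - n / N) * (1 / m ** 2) * (s2_r / n)
print(round(ybar, 3), round(s2_r, 4), round(v_hat, 4))
```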
... cluster sizes equal (p. 261, §8.4)

∑_{i=1}^n ∑_{j=1}^m (y_ij − ȳ..)² = ∑_{i=1}^n ∑_{j=1}^m (y_ij − ȳ_i.)² + ∑_{i=1}^n ∑_{j=1}^m (ȳ_i. − ȳ..)²
                                  = ∑_{i=1}^n ∑_{j=1}^m (y_ij − ȳ_i.)² + m ∑_{i=1}^n (ȳ_i. − ȳ..)²

SST = SSW + SSB
SST = n(m − 1) MSW + (n − 1) MSB
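The decomposition can be checked numerically on the GPA data from this example; a sketch:

```python
# Check SST = SSW + SSB on the dorm GPA data (5 suites of 4).
gpa = [
    [3.08, 2.60, 3.44, 3.04],
    [2.36, 3.04, 3.28, 2.68],
    [2.00, 2.56, 2.52, 1.88],
    [3.00, 2.88, 3.44, 3.64],
    [2.68, 1.92, 3.28, 3.20],
]
n, m = len(gpa), len(gpa[0])

grand = sum(sum(row) for row in gpa) / (n * m)          # ybar..
means = [sum(row) / m for row in gpa]                   # ybar_i.

sst = sum((y - grand) ** 2 for row in gpa for y in row)
ssw = sum((y - means[i]) ** 2 for i, row in enumerate(gpa) for y in row)
ssb = m * sum((mu - grand) ** 2 for mu in means)
print(round(sst, 4), round(ssw, 4), round(ssb, 4))      # 5.0313 2.7756 2.2557
```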
... Example (Lohr, §5.2.1)

Suite (Cluster)   Person 1   Person 2   Person 3   Person 4   Total
1                 3.08       2.60       3.44       3.04       12.16
2                 2.36       3.04       3.28       2.68       11.36
3                 2.00       2.56       2.52       1.88       8.96
4                 3.00       2.88       3.44       3.64       12.96
5                 2.68       1.92       3.28       3.20       11.08
(mean cluster total: 11.304)

The variation between clusters and within clusters can be described in an ANALYSIS OF VARIANCE table.

Source           degrees of   Sum of    Mean
                 freedom      Squares   Square
Between Suites    4           2.2557    0.56392
Within Suites    15           2.7756    0.18504
Total            19           5.0313    0.26480
... Example, using R
> GPA = scan()
1: 308 260 344 304 236 304 328 268 200 256
11: 252 188 300 288 344 364 268 192 328 320
21:
Read 20 items
> GPA = GPA/100
> suite = factor(rep(1:5,each=4))
> suite
[1] 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5
> anova(aov(GPA ~ suite))
Analysis of Variance Table
Response: GPA
Df Sum Sq Mean Sq F value Pr(>F)
suite 4 2.2557 0.56392 3.0476 0.05039 .
Residuals 15 2.7756 0.18504
---
... cluster sizes equal
- The text compares this variance estimate to that from an SRS,
- for which we need an estimate of the population variance, σ².
- Sadly, SSTotal/(nm − 1) is not a good estimate (because of the clustering).
- Instead, we use
  s² = [N(m − 1) MSW + (N − 1) MSB] / (Nm − 1) ≈ (1/m){(m − 1) MSW + MSB}
- For the GPA example s² = 0.279, and the efficiency of cluster sampling ≈ 0.5.
- See Example 8.5 and p. 263 of the new edition for a different example, where cluster sampling happens to be more efficient than SRS (unusual).
- HW: Exercises 8.2, 3, 4, 5, 6, 16, 20, 21, 22.
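A sketch reproducing these numbers from the ANOVA table; "efficiency" here is read as the ratio of the estimated SRS variance to the cluster-sampling variance for the same total sample size nm (my interpretation of the slide):

```python
# s^2 and the efficiency of cluster sampling, GPA example.
n, N, m = 5, 100, 4
MSW, MSB = 0.18504, 0.56392        # from the ANOVA table

# Estimated population variance (not SSTotal/(nm - 1)):
s2 = (N * (m - 1) * MSW + (N - 1) * MSB) / (N * m - 1)

s2_r = m * MSB                     # since MSB = m/(n-1) * sum(ybar_i - ybar)^2
v_cluster = (1 - n / N) * s2_r / (n * m ** 2)
v_srs = (1 - n * m / (N * m)) * s2 / (n * m)   # SRS of nm = 20 students
efficiency = v_srs / v_cluster
print(round(s2, 3), round(efficiency, 2))      # 0.279 0.49 (i.e. about 0.5)
```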
One-stage cluster sampling summary
- There are two types of estimation methods in cluster sampling: unbiased and ratio.
- We aim to have high within-cluster variability and, on the other hand, low between-cluster variability.
- There is a trade-off between the number of clusters (n) and the size of the clusters (m_i).
- When cluster sizes are equal, the unbiased and ratio estimators are the same. The results can be displayed in an analysis of variance table.
Two-stage cluster sampling (Ch. 9)
- Select a sample of clusters, called primary sampling units (psu's),
  - usually by SRS.
- Select a random sample of units within each cluster, called secondary sampling units (ssu's),
  - also often by SRS,
  - but stratified random sampling could be used at any step,
  - or even more complicated probability sampling methods.
- As with one-stage cluster sampling: easier/cheaper/safer, due to the (often) geographic proximity of elements within a cluster.
- Can be used without a list of all population elements; we just need a list of population clusters, and then a list of ssu's for each sampled cluster.
- Example: sample universities (psu's); sample students at the chosen universities (ssu's).
... two-stage cluster sampling
- As with one-stage cluster sampling, we hope that the clusters represent the population well,
- so are quite variable within each cluster, but similar between clusters,
- although the opposite usually happens.
- The sample data have the structure:

  y11, y12, ..., y1m1    cluster 1
  y21, y22, ..., y2m2    cluster 2
  ⋮
  yn1, yn2, ..., ynmn    cluster n
Estimation in two-stage cluster sampling (§9.4)
- We no longer know the cluster totals y_i. = ∑_{j=1}^{m_i} y_ij,
- so now we estimate the cluster totals first, by M_i ȳ_i.,
- and then use these estimates to estimate the population total and mean:
  μ̂ = (1/M̄) ∑_{i=1}^n M_i ȳ_i. / n
- This assumes we know the size of each cluster, M_i, as well as the sample size m_i,
- and the average cluster size in the whole population, M̄ = ∑_{i=1}^N M_i / N.
- We can also use ratio estimation, as in Ch. 8.
... UNBIASED estimation, two-stage cluster sampling

μ̂ = (1/M̄) ∑_{i=1}^n M_i ȳ_i. / n

V̂(μ̂) = (1/M̄²) { (1 − n/N) s_b²/n + (1/(nN)) ∑_{i=1}^n M_i² (1 − m_i/M_i) s_i²/m_i }

where

s_b² = ∑_{i=1}^n (M_i ȳ_i. − M̄ μ̂)² / (n − 1)

and

s_i² = ∑_{j=1}^{m_i} (y_ij − ȳ_i.)² / (m_i − 1)
... RATIO estimation, two-stage cluster sampling

μ̂_r = ∑_{i=1}^n M_i ȳ_i. / ∑_{i=1}^n M_i

V̂(μ̂_r) = (1/M̄²) { (1 − n/N) s_r²/n + (1/(nN)) ∑_{i=1}^n M_i² (1 − m_i/M_i) s_i²/m_i }

where

s_r² = ∑_{i=1}^n (M_i ȳ_i. − M_i μ̂_r)² / (n − 1)

and

s_i² = ∑_{j=1}^{m_i} (y_ij − ȳ_i.)² / (m_i − 1)

- As in Ch. 8, if the M_i are all equal, then the ratio and unbiased estimates are the same.
Example
Exercises 9.2, 9.3: "A nurseryman wants to estimate the average height of seedlings in a large field..."

       Number of   Number of seedlings   Heights of seedlings        ∑_{j=1}^{m_i} y_ij
Plot   seedlings   sampled               (in inches)
1      52          5                     12, 11, 12, 10, 13          58
2      56          6                     10, 9, 7, 9, 8, 10          53
3      60          6                     6, 5, 7, 5, 6, 4            33
4      46          5                     7, 8, 7, 7, 6               35
5      49          5                     10, 11, 13, 12, 12          58
6      51          5                     14, 15, 13, 12, 13          67
7      50          5                     6, 7, 6, 8, 7               34
8      61          6                     9, 10, 8, 9, 9, 10          55
9      60          6                     7, 10, 8, 9, 9, 10          53
10     45          6                     12, 11, 12, 13, 12, 12      72
... example
- N = 50 plots are the primary sampling units; n = 10 are sampled.
- M_i seedlings in each plot; m_i ≈ 10% of M_i are sampled.
- y_ij = height of the jth sampled seedling in the ith plot.
- ∑_{i=1}^N M_i is unknown, so we use ratio estimation:
  μ̂_r = ∑_{i=1}^n M_i ȳ_i. / ∑_{i=1}^n M_i = 4970.833/530 = 9.38
- Exercise 9.3: assume ∑_{i=1}^N M_i is known to be 2600. Then M̄ = 2600/50 = 52 and
  μ̂ = (1/M̄) ∑_{i=1}^n M_i ȳ_i. / n = (1/52)(4970.833/10) = 9.56
- HW: Exercise 9.6
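The two estimates on this slide can be reproduced directly from the table; a sketch:

```python
# Two-stage cluster sampling: ratio and unbiased estimates of the mean
# seedling height (Exercises 9.2 and 9.3 data from the table above).
M_i = [52, 56, 60, 46, 49, 51, 50, 61, 60, 45]    # seedlings per sampled plot
totals = [58, 53, 33, 35, 58, 67, 34, 55, 53, 72] # sum of sampled heights
m_i = [5, 6, 6, 5, 5, 5, 5, 6, 6, 6]              # seedlings sampled per plot
n, N = 10, 50

ybar_i = [t / m for t, m in zip(totals, m_i)]     # within-plot means ybar_i.
est_totals = [M * y for M, y in zip(M_i, ybar_i)] # estimated totals M_i*ybar_i.

mu_ratio = sum(est_totals) / sum(M_i)             # 4970.833/530 ≈ 9.38

M_bar = 2600 / N                                  # = 52, given in Exercise 9.3
mu_unbiased = sum(est_totals) / n / M_bar         # ≈ 9.56
print(round(mu_ratio, 2), round(mu_unbiased, 2))  # 9.38 9.56
```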