Survey Sampling and Weights
8/12/2019 Survey Sampling and Weights
1/58
Survey sampling
Sampling & non-sampling error
Bias
Simple sampling methods
Sampling terminology
Cluster sampling
Design effect
Stratified sampling
Sampling weights
-
Why sample?
To make an inference about a
population
Studying the entire population is impractical or
impossible
-
Example of sampling
Estimate the proportion of adults,
ages 18-65, in Port Elizabeth that
have type 2 diabetes
Select a sample from which to
estimate the proportion
Population: adults aged 18-65 living
in Port Elizabeth
Inference: proportion with type 2
diabetes
-
Probability sampling
Each individual has a known (non-zero) probability of selection
Precision of estimates can be quantified
-
Non-probability sampling
Cheaper, more convenient
Quality of estimates cannot be assessed
May not be representative of
population
-
Sampling error
v.
Non-sampling error
-
Sampling error
Random variability in sample
estimates that arises out of the
randomness of the sample selection
process
Precision can be quantified
(estimation of standard errors,
confidence intervals)
-
Non-sampling error
Estimation error that arises from
sources other than random variation
non-response
undercoverage of survey
poorly-trained interviewers
non-truthful answers
non-probability sampling
This type of error is a bias
-
What is bias?
We want to estimate the mean weight of all women aged 15-44 living in Coopersville.
Suppose there are 50,000 such women and
the true mean weight is 61.7 kg.
We select a sample of 200 such women and
interview them, asking each woman what
her weight is.
The sample mean weight is 59.4 kg.
Is our estimate biased?
-
Bias
Suppose we could repeat the survey
many, many times.
Then we compute the mean of all the sample means.
Say the mean of the means = 62.9
Bias = (mean of means) - (true mean)
= 62.9 - 61.7 = 1.2 kg
-
Unbiased estimation
If . . .
(mean of the means) = (true mean)
then the bias is zero, and we say that
the estimator is unbiased.
The mean of the means is called the expected value of the estimator.
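The idea of "repeating the survey many times" can be made concrete for a tiny population by listing every possible simple random sample. The sketch below uses a hypothetical population of five weights (illustrative values chosen so the true mean is 61.7 kg) and shows that the mean of all sample means equals the true mean, so the sample mean is unbiased under simple random sampling. It does not model non-sampling errors such as misreported weights.

```python
from itertools import combinations
from statistics import mean

# Hypothetical toy population of weights in kg (illustrative values only).
population = [52.0, 58.5, 61.0, 64.5, 72.5]
n = 2  # sample size

# Every possible sample of size n is equally likely under simple random sampling.
sample_means = [mean(sample) for sample in combinations(population, n)]

true_mean = mean(population)          # the population parameter
expected_value = mean(sample_means)   # "mean of the means"
bias = expected_value - true_mean

print(f"true mean = {true_mean}, expected value = {expected_value}, bias = {bias}")
```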
-
Simple sampling methods
Task: Select a sample of n
individuals or items from a
population of N individuals or
items
Common methods
simple random sampling
systematic sampling
-
Simple sampling methods
Simple random sampling (SRS)
each item in population is equally likely
to be selected
each combination of n items is equally
likely to be selected
Systematic sampling (typical method)
randomly select a starting point
select every kth item thereafter
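A simple random sample can be drawn in a few lines of Python. This is a minimal sketch: the function name simple_random_sample and the seed argument are assumptions made for illustration, and items are identified by their position 1..N in the sampling frame.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Return n distinct positions from 1..N; every combination of n is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw reproducible and documentable
    return sorted(rng.sample(range(1, N + 1), n))

# Illustrative call: 15 items from a frame of 213.
chosen = simple_random_sample(213, 15, seed=1)
print(chosen)
```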
-
Systematic sampling example
Stack of 213 hospital admission forms; select a
sample of 15
213/15 = 14.2
Select every 14th form
Starting point: random number between 1 and 14
(we choose 11)
First form selected is 11th from top
Second form selected is 25th from top (11 + 14 = 25)
Third form selected is 39th from top (11 + 2x14 = 39)
And so forth . . .
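The selection above can be sketched in Python as follows. The function name systematic_sample is an assumption for illustration; the numbers (213 forms, interval 14, start 11) come from the example.

```python
def systematic_sample(N, k, start):
    """Positions selected when taking every k-th item of N, beginning at `start`."""
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    return list(range(start, N + 1, k))

forms = systematic_sample(N=213, k=14, start=11)
print(forms[:3], "...", forms[-1], f"({len(forms)} forms)")
```

With the integer interval 14, the sample size depends on the starting point: a start of 1, 2, or 3 would yield 16 forms rather than 15.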
-
Systematic sampling, continued
What is the probability that the 146th
form will be selected? The 195th?
Does this qualify as a simple random
sample? Why or why not?
Is there any potential problem arising from the use of systematic sampling
in this situation?
-
Example was typical
quick method
In the preceding example, we
selected every 14th form
Ideally, we would select every 14.2th form (see later example on 2-stage
sample of nurses)
Example is a quick and easy method,
commonly used in the field; it is a
good approximation to the more
rigorous procedure
-
Systematic sampling: + and -
Advantages of systematic sampling
typically simpler to implement than SRS
can provide a more uniform coverage
Potential disadvantage of systematic
sampling
can produce a bias if there is a systematic pattern in the sequence of
items from which the sample is selected
-
Role of simple sampling methods
These simple sampling methods are
necessary components of more
complex sampling methods:
cluster sampling
stratified sampling
We'll discuss these more complex methods next (following some
definitions)
-
Definitions
Listing units (or enumeration units)
the lowest level sampled units (e.g.,
households or individuals)
PSUs (primary sampling units)
the first units sampled (e.g., states or
regions)
Sampling probability
for any unit eligible to be sampled, the
probability that the unit is selected in
the sample
-
More definitions
EPSEM sampling
equal probability of selection method,
thus a method in which each listing unit has the same sampling probability
Sampling frame
the set of items from which sampling is done, often a list of items.
-
More definitions
Undercoverage: the degree to which
we fail to identify all eligible units in
the population
incomplete lists
incomplete or incorrect eligibility
information
-
Still more definitions
Non-response: failure to interview
sampled listing units (study subjects)
refusal
death
physician refusal
inability to locate subject
unavailability
-
Still more definitions
Precision: the amount of random
error in an estimate
often measured by the width or half-width of the confidence interval
standard error is another measure of
precision
estimates with smaller standard error or
narrower CI are said to be more precise
-
CLUSTER SAMPLING
single stage
-
Clusters
Subsets of the listing units in the
population
Set of clusters must be mutually exclusive and collectively exhaustive
counties
townships
regions
institutions
-
Example
Single-stage cluster sampling
There are 361 nurses working at the
31 hospitals and clinics in Region 4
We wish to interview a sample of
these nurses
select a simple random sample of 5
hospitals/clinics
interview all nurses employed at the 5
selected institutions
-
Assessing the example
Hospitals/clinics are the PSUs
Nurses are the listing units
Sampling probability for each nurse
is 5/31
Thus, this is an EPSEM sample
Sampling frame is the list of 31
hospitals and clinics
-
CLUSTER SAMPLING
two stage
-
Cluster sampling -- two stage
Select a sample of clusters, as in the
single-stage method
From each selected cluster, select a
subsample of listing units
-
Cluster sampling -- two stage
It is always nice to do EPSEM
sampling because such samples are
self-weighting
don't need sampling weights in analysis
A common EPSEM method for two-
stage sampling is PPS (probability
proportional to size)
-
PPS sampling
The key to the method is that the
sampling probabilities of clusters in
the first stage are proportional to the sizes of the clusters
size = number of listing units in cluster
At stage 2, select the same number of listing units from each selected
cluster
-
Nurse example revisited
Two-stage sampling
We want to interview a sample of 36
nurses
We can afford to visit 9 different
hospitals/clinics
Thus, we need to interview 36/9 = 4
nurses at each institution
-
Nurse example revisited
Two-stage sampling
Stage 1: select a sample of 9
hospitals/clinics
Selection prob. proportional to size
Stage 2: select a sample of 4 nurses
from each selected institution
At each stage, use one of the simple
sampling methods
-
Nurse example revisited
Two-stage sampling
PSUs are the hospitals/clinics
Listing units are the nurses
Sampling frames
Stage 1: List of 31 hospitals/clinics
Stage 2: Lists of nurses at each selected hospital/clinic
-
Selecting 2-stage nurse sample
Sampling interval, I = 361/9 = 40.1
Starting point, random number between 1 and 40; we choose R = 14
First sampling number = R = 14
2nd sampling number = 14 + 1x40.1 = 54.1
3rd sampling number = 14 + 2x40.1 = 94.2
We have selected institutions 2, 5, 9, . . .
-
Two-stage nurse sample
Institution   No. of    Cumulative   Sampling
Number        Nurses    Nurses       Number
1             12        12
2             7         19           14
3             9         28
4             18        46
5             11        57           54.1
6             7         64
7             10        74
8             14        88
9             8         96           94.2
. . .
31            9         361
Total         361
-
Applying the sampling numbers
For each sampling number, choose
the first unit with cumulative size
equal to or greater than the sampling
number
Example: sampling number 54.1
first unit with cumulative size >= 54.1 is unit 5 (cum. no. of nurses = 57)
so we select unit 5 for the sample
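The rule above can be sketched in Python using the first nine institutions from the table (the remaining rows are elided in the slide, so only the first three sampling numbers are applied here). The function name pps_select is an assumption for illustration.

```python
from bisect import bisect_left
from itertools import accumulate

def pps_select(sizes, sampling_numbers):
    """For each sampling number, return the 1-based index of the first unit
    whose cumulative size is equal to or greater than that number."""
    cumulative = list(accumulate(sizes))
    return [bisect_left(cumulative, s) + 1 for s in sampling_numbers]

# Nurses at institutions 1-9, as listed in the table.
sizes = [12, 7, 9, 18, 11, 7, 10, 14, 8]

interval = 40.1   # 361/9, rounded as on the slide
start = 14        # random start R
sampling_numbers = [start + j * interval for j in range(3)]  # 14, 54.1, 94.2

selected = pps_select(sizes, sampling_numbers)
print(selected)
```

Larger institutions span a wider range of cumulative values, which is what makes their chance of selection proportional to size.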
-
Optional challenge
What is the selection probability for institution 1?
12/40.1 = 0.299
What is the selection probability for a nurse in institution 1?
(12/40.1) x (4/12) = 0.0998 ≈ 36/361
What is the selection probability for a nurse in
institution 2?
(7/40.1) x (4/7) = 0.0998 ≈ 36/361
All nurses have the same selection probability.
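The cancellation of cluster size can be checked with exact fractions. This is a small sketch using the unrounded interval 361/9; the function name is an assumption, and it presumes each institution has at least 4 nurses and no more nurses than the sampling interval.

```python
from fractions import Fraction

total_nurses = 361
n_clusters = 9
take_per_cluster = 4
interval = Fraction(total_nurses, n_clusters)  # 361/9, about 40.1

def nurse_selection_probability(cluster_size):
    p_cluster = Fraction(cluster_size) / interval           # stage 1: proportional to size
    p_within = Fraction(take_per_cluster, cluster_size)     # stage 2: 4 nurses per institution
    return p_cluster * p_within

p1 = nurse_selection_probability(12)  # institution 1
p2 = nurse_selection_probability(7)   # institution 2
print(p1, p2, float(p1))
```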
-
Why do cluster sampling instead
of a simple sampling method?
Advantages
reduced logistical costs (e.g., travel)
list of all 361 nurses may not be available
(reduces listing labor)
Disadvantages
estimates are less precise
analysis is more complicated (requires
special software)
-
Design effect
Relative increase in variance of an
estimate due to the sampling design
variance = (standard error)^2
Formula
s1 = standard error under simple
random sampling
s2 = standard error under complex
sampling design (e.g., cluster sampling)
design effect = (s2/s1)^2
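As a quick numeric illustration of the formula, the sketch below computes a design effect from two standard errors. The standard errors are hypothetical values chosen for the example, not results from any survey.

```python
import math

def design_effect(se_srs, se_complex):
    """Ratio of variances: (s2 / s1) squared."""
    return (se_complex / se_srs) ** 2

# Hypothetical standard errors for the same estimate and sample size.
s1 = 2.0  # under simple random sampling
s2 = 3.0  # under the complex design
deff = design_effect(s1, s2)
print(deff)
```

A design effect of 2.25 means the complex design has 2.25 times the variance of a simple random sample of the same size.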
-
Design effect for cluster sampling
For cluster sampling designs, the
design effect is almost always > 1
This means that estimates from a survey done with cluster sampling
are less precise than corresponding
estimates obtained from a survey having the same sample size done
with simple random sampling
-
Cluster sizes
Recommended take per cluster is 20-40 for multi-purpose surveys
Time and resource limitations will
often dictate the maximum number of clusters you can include in the study
Including more clusters improves the
precision of your estimates more
than a corresponding increase in
sample size within the clusters
already in the sample
-
STRATIFIED
SAMPLING
-
Strata
Subsets of the listing units in the population
Set of strata must be mutually
exclusive and collectively exhaustive
Strata are often based on
demographic variables
age
sex
race
-
Stratified sampling
Sample from each stratum
Often, sampling probabilities vary across strata
-
Stratified sampling
Advantages
guarantees coverage across strata
can over-sample some strata in order to obtain
precise within-stratum estimates
typically, design effect < 1
Disadvantages
with unequal sampling probabilities, sampling
weights must be included in analysis
analysis is more complicated
requires special software
-
Example: sampling breast cancer
cases for the Women's CARE Study
Stratification variables
geographic site
race (2 races)
five-year age group
Over-sampled younger women
Over-sampled black women
-
Example: Sampling households
for a reproductive health survey in 11 refugee camps in Pakistan
Selected simple random sample of households from within each of the
11 camps
All households were selected with the same probability
-
Refugee camp sampling
Camp           Population   Sample   Completed
                            Size     Interviews
Lakhte Banda   12,943       64       61
Kotki 1        7,262        36       29
Kotki 2        5,781        29       21
Kata Kanra     8,437        42       38
Mohd Khoja     12,791       63       45
Doaba          13,584       67       25
Darsamand      17,797       88       53
Kahi           11,061       55       32
Naryab         5,543        28       19
Thal 1         11,087       55       44
Thal 2         17,130       85       60
Dallan         10,990       55       45
Total          134,406      667      472
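One quantity that can be read directly off this table is the response rate per camp (completed interviews divided by sample size). The sketch below computes it from the table's figures; the variable names are illustrative assumptions.

```python
# (sample size, completed interviews) per camp, copied from the table.
camps = {
    "Lakhte Banda": (64, 61), "Kotki 1": (36, 29), "Kotki 2": (29, 21),
    "Kata Kanra": (42, 38), "Mohd Khoja": (63, 45), "Doaba": (67, 25),
    "Darsamand": (88, 53), "Kahi": (55, 32), "Naryab": (28, 19),
    "Thal 1": (55, 44), "Thal 2": (85, 60), "Dallan": (55, 45),
}

response_rates = {camp: done / sampled for camp, (sampled, done) in camps.items()}

total_sampled = sum(sampled for sampled, _ in camps.values())
total_completed = sum(done for _, done in camps.values())
overall_rate = total_completed / total_sampled

lowest = min(response_rates, key=response_rates.get)
print(f"overall response rate: {overall_rate:.3f}; lowest: {lowest}")
```

Response rates that differ this much across camps are one reason weights are later adjusted for non-response.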
-
The sampling operation
Must be carefully controlled
don't leave it to discretion in the field
use a carefully defined procedure
Document what you did
for reference during analysis
to defend your study
-
Sampling frames
A list containing all listing units is
great if you can get it
ok if it includes some ineligibles
Problems associated with geographic
location-based sampling
map-based sampling
EPI sampling
-
Sampling weights
Inverse of the net sampling
probability
Interpretation: the sampling weight
for a sampled individual is the
number of individuals his/her data
represent
-
Example--sampling weights
There are 150 employees in a firm
stratum 1: 50 employees aged 18-29
stratum 2: 100 employees aged 30-69
We sample 10 from each stratum
Sampling probabilities are
stratum 1: 10/50 = 0.20
stratum 2: 10/100 = 0.10
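Applying the definition of a sampling weight (the inverse of the sampling probability) to these two strata gives the weights directly. A minimal sketch, with illustrative variable names:

```python
# (stratum size, number sampled) from the example.
strata = {
    "stratum 1 (aged 18-29)": (50, 10),
    "stratum 2 (aged 30-69)": (100, 10),
}

weights = {}
for name, (stratum_size, n_sampled) in strata.items():
    probability = n_sampled / stratum_size
    weights[name] = 1 / probability  # each respondent "represents" this many employees

# The weights summed over all sampled employees recover the firm's headcount.
represented = sum(weights[name] * n for name, (_, n) in strata.items())
print(weights, represented)
```

Each sampled employee in stratum 1 represents 5 employees, and each in stratum 2 represents 10.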
-
What about non-response?
1 employee in the stratum 1 sample
and 3 employees in the stratum 2
sample refuse to participate in the
survey
Net sampling probabilities
stratum 1: 9/50 = 0.18
stratum 2: 7/100 = 0.07
-
Revised sampling weights
Sampling weights revised for non-
response
stratum 1: 1/0.18 = 5.56
stratum 2: 1/0.07 = 14.29
This computation is often done by
multiplying the original sampling weights by adjustment factors to
account for non-response rates
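The adjustment-factor route gives the same revised weights as inverting the net sampling probabilities. In this sketch the adjustment factor is taken to be (number sampled) / (number responding) within each stratum, which is one common choice; the variable names are illustrative.

```python
# (stratum size, number sampled, number responding) from the example.
strata = {
    "stratum 1": (50, 10, 9),
    "stratum 2": (100, 10, 7),
}

revised = {}
for name, (stratum_size, n_sampled, n_responded) in strata.items():
    base_weight = stratum_size / n_sampled   # 5 and 10
    adjustment = n_sampled / n_responded     # inverse of the stratum response rate
    revised[name] = base_weight * adjustment # equals stratum_size / n_responded

print({name: round(w, 2) for name, w in revised.items()})
```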
-
Post-stratification weighting
Define strata, which may or may not have
been used as strata in the sampling design
Compute sampling probabilities = proportion
of each stratum that was actually sampled
Compute sampling weights from these
sampling probabilities
Allows post-hoc treatment of unequal representation of population segments in
the sample
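The steps above might look like this in Python. All counts here are hypothetical, invented only to illustrate the arithmetic; in practice the population counts would come from a census or similar external source.

```python
# Hypothetical post-strata: known population counts and achieved sample counts.
population_counts = {"men": 600, "women": 400}
sample_counts = {"men": 30, "women": 50}

post_weights = {}
for stratum, pop in population_counts.items():
    probability = sample_counts[stratum] / pop   # proportion of the stratum actually sampled
    post_weights[stratum] = 1 / probability

# Weighted sample shares now match the population shares.
weighted_total = sum(post_weights[s] * sample_counts[s] for s in sample_counts)
weighted_share_women = post_weights["women"] * sample_counts["women"] / weighted_total
print(post_weights, weighted_share_women)
```

In this made-up sample women are 62.5% of respondents but 40% of the population; the weights restore the 40% share.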
-
Discussion topics
What is the population of interest?
Infinite populations
Selecting random numbers
Selecting simple random samples
from finite populations
from infinite populations
Analysis software for complex
surveys