Slide 1 Georgia Kayser, PhD - WaSH MEL Center · Approaches to Sampling Georgia Kayser, PhD ......
Slide 1
Module 4: Approaches to Sampling
Georgia Kayser, PhD
©2014 The Water Institute
Hello and welcome to Monitoring, Evaluation and Learning: Approaches to Sampling.
Slide 2
Objectives
• To understand the reasons for sampling populations
• To understand the basic questions and issues in selecting a sample. These include sample size and logistic cost, bias, representativeness, and external validity
• To understand the advantages and disadvantages of different sampling methods
The purpose of this module is to understand the different types of sampling methods, and the advantages and disadvantages of each. We will also discuss external validity and sources of bias.
Slide 3
Sampling
Our understanding of the sampling method is critical to our understanding of the generalizability of the sample to the larger population.
Slide 4
Purpose
Most often, it is not practical or cost effective to study the whole population of interest
A sample
– Is a smaller collection of units from the population
– Is used to determine truths about the population
– Allows you to generalize to the larger population
– Reduces costs
– Saves time for data collection
– Gives accurate results that can be calculated mathematically
(Field, 2005)
Sampling is the process of selecting units from a population of interest.
Slide 5
Sample
A sample is a smaller collection of units from the larger population
[Figure: a sample drawn from within the larger population]
Slide 6
Definitions
• Population: the group to which you want to generalize
• Sample: collection of observations
• Probability sampling: random collection of observations
• Nonprobability sampling: nonrandom collection of observations
• Sample Frame: The list of units (subjects) of the population from which the sample is selected
Slide 7
Definitions
• External Validity: The degree to which you can generalize to the population of interest
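• Sampling Error: The difference between a sample estimate and the true population value that arises because only part of the population is observed; it determines the precision of statistical estimates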
Slide 8
Steps in Sampling
1. State the research or evaluation question(s) and variables of interest
2. Determine research design and methodology
3. Identify the population of interest – to whom do you want to generalize results
4. Determine the sample frame
These are the steps one will take in selecting the sampling method and conducting the sample. The next few slides will review the different types of sampling methods.
Slide 9
Steps in Sampling (cont'd)
5. Decide on sampling method
6. Determine the sample size (discussed in next module)
7. Implement the sample plan
8. Sample the population
9. Review the response rate and the sampling process
Slide 10
External Validity
The degree to which your sample is generalizable to other populations, settings, and contexts
Threats to External Validity
• Hawthorne effect: subjects know they are participating in an experiment and respond to expectations or perceived expectations
• Order Effects: When multiple treatments are studied one after the other, a benefit might accrue from the first treatments
External validity is the degree to which a sample is generalizable to other places, people and times. Threats to external validity explain how the results may not be generalizable.
Slide 11
External Validity
Improving External Validity
• Replicate study in many different places, with different people and at different times
• Include a control group or multiple groups
To improve external validity, one can replicate the study in other places, with other populations and in other times. If the results are similar, this improves the generalizability of the results.
Slide 12
Types of Sampling
• Probability Sampling: a sample in which each element has an equal chance of selection independent of any other
• Nonprobability Sampling: a sample where each element does not have an equal chance of selection and some elements have no chance of being selected
(Babbie, 2013)
The two types of sampling are probability and nonprobability sampling. Probability sampling uses random selection from the population of interest and, because of this random selection, is generalizable to that population. Nonprobability sampling does not use a random sample of the population and is not generalizable to the population, but it is often less costly than a random sample.
Slide 13
Types of Sampling
Probability Sampling
• Simple random
• Systematic
• Stratified
• Clustered
– Multistage
– Probability Proportionate to Size Sampling
Nonprobability Sampling
• Convenience
• Purposive
– Snowball
– Quota
– Expert Sampling
There are a variety of types of sampling that are characterized as probability sampling and other types of sampling that are characterized as nonprobability sampling.
Slide 14
Nonprobability Sampling
Convenience Sampling
– Nonrandom selection
– Easiest method of collecting a sample
– Participants selected in the most convenient possible way
The first type of nonprobability sampling is convenience sampling whereby the sample is selected because it is convenient – close and easy to study. The selection is nonrandom and it is, therefore, not possible to generalize to the larger population.
Slide 15
Convenience Sampling
Advantages
• Convenient
Disadvantages
• Cannot generalize to the population
• Is only representative of the units (subjects) selected
• The degree to which the sample is similar or different from the population is unknown
– Low external validity
• Sample bias is introduced
• Need to describe limitations of the sample
There are advantages and disadvantages to convenience sampling.
Slide 16
Example of Convenience Sampling
Ex. 1 A questionnaire is being piloted in the population where it will be used and the closest group is selected
Ex. 2 A water-point survey where only those who visit between 9 and 9:30 AM are interviewed.
Slide 17
Convenience Sampling
Source: NewsAsiaOne
Slide 18
Nonprobability Sampling
Purposive Sampling
– Nonrandom selection
– Units selected based on the researcher's judgment
– Units selected because the researcher thinks they will be most useful
Another type of nonprobability sampling is purposive sampling, whereby the sample is selected based on a particular purpose and the researcher's judgment. There are three types of purposive sampling that we will discuss here: snowball sampling, quota sampling and expert sampling.
Slide 19
Purposive Sampling
Advantages
• Judgment based sample
Disadvantages
• Cannot generalize to the population
• Is only representative of the units (subjects) selected
• The degree that the sample is similar or different from the population is unknown
There are advantages and disadvantages to purposive, non-probability sampling.
Slide 20
Example of Purposive Sampling
Someone is standing at a waterpoint talking to people who pass by and fit into a particular category (e.g., women 20-30 years old).
– Identify people passing by, and ask them to participate
Purposive sampling may be useful if a quick sample is needed and generalizability is not a primary concern. You are likely to underweight certain subgroups in the population.
Slide 21
Nonprobability Sampling
Snowball Sampling
– Nonrandom sample
– Used in hard to reach groups
– Units (subjects) selected are asked to nominate other units (subjects)
– Sample increases in size like a rolling snowball
Snowball sampling is a type of nonprobability sampling whereby target person(s) are identified and then those people recommend others for inclusion in the study. This type of sampling is used primarily in hard to reach groups – homeless people, undocumented workers, gang members. In particular, if social differences mean that it’s harder to contact water system users of a certain social group, snowballing may help.
Slide 22
Snowball Sampling
[Figure: participants in the sample each recommend further possible participants]
Source: http://faculty.elgin.edu/dkernler/statistics/
Slide 23
Snowball Sampling
Advantages
• Can investigate hard to reach groups
Disadvantages
• Cannot generalize to the population
• Is only representative of the units (subjects) selected
• The degree that the sample is similar or different from the population is unknown
There are advantages and disadvantages to snowball sampling.
Slide 24
Example of Snowball Sampling
Ex. 1 Want to study the water, sanitation and hygiene of the homeless in a specific geographic area and no lists of the homeless exist
– Go to the geographic area of interest
– Find one or two homeless people and interview them
– Ask them for recommendations of other homeless people at the end of the interview
– Interview the homeless people who are recommended to you
Here is an example of snowball sampling
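The interview-and-referral steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `snowball_sample`, the `referrals` dictionary and the people "A" to "E" are hypothetical and not part of the original module.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_size):
    """Interview the seed participants, then the people they nominate,
    until max_size interviews are done or no nominees remain."""
    selected = []
    seen = set(seeds)          # everyone already nominated or interviewed
    queue = deque(seeds)       # people waiting to be interviewed
    while queue and len(selected) < max_size:
        person = queue.popleft()
        selected.append(person)                    # interview this person
        for nominee in referrals.get(person, []):  # ask for recommendations
            if nominee not in seen:
                seen.add(nominee)
                queue.append(nominee)
    return selected

# Hypothetical referral data: who each interviewed person recommends.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"]}
print(snowball_sample(["A"], referrals, max_size=4))
```

Note that the result depends entirely on who the first participants are and whom they choose to nominate, which is why the sample cannot be generalized to the population.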
Slide 25
Nonprobability Sampling
Quota
– Nonrandom
– Population divided into specified number and type of units (subjects)
– A predetermined number of units (subjects) are selected from each group
Quota sampling is another example of nonprobability sampling whereby subjects are selected nonrandomly according to a fixed proportion or quota.
Slide 26
Quota Sampling
Advantages
• Can divide the population into groups and select a certain number from each group
Disadvantages
• Cannot generalize to the population
• Is only representative of the units (subjects) selected
• The degree that the sample is similar or different from the population is unknown
There are advantages and disadvantages to quota sampling. The main disadvantage is that you cannot generalize to the population because the sample was not randomly selected.
Slide 27
Example of Quota Sampling
Collect proportional information from 40% female and 60% male high school students about their WaSH services because the population of students is 40% female and 60% male.
For example, if you knew the population of interest was 40% women and 60% men and you wanted a total sample size of 100, you could sample 40 women and 60 men. The sample is selected nonrandomly.
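The 40/60 example above can be sketched as follows. This is a minimal illustration, assuming people are approached in the order they happen to arrive; the function name `quota_sample` and the `arrivals` list are hypothetical.

```python
def quota_sample(arrivals, quotas, group_of):
    """Take people in arrival order (nonrandom) until every group's quota is filled."""
    remaining = dict(quotas)   # how many more are needed from each group
    selected = []
    for person in arrivals:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            selected.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break              # all quotas are filled
    return selected

# Hypothetical stream of students, alternating female and male.
arrivals = [("female" if i % 2 == 0 else "male", i) for i in range(200)]
quotas = {"female": 40, "male": 60}   # total sample size of 100
sample = quota_sample(arrivals, quotas, group_of=lambda p: p[0])
print(len(sample))
```

The quotas match the population proportions, but who fills each quota is decided by convenience, not by chance.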
Slide 28
Nonprobability Sampling
Expert Sampling
– Nonrandom sample
– Used to reach experts
Expert sampling is a type of nonprobability sampling whereby people with known expertise in a subject area are identified and selected for inclusion in the study. The selection is nonrandom, so the results cannot be generalized to the larger population.
Slide 29
Expert Sampling
Advantages
• Can investigate a group of highly knowledgeable experts about a subject area
• May provide validity for a subsequent sampling approach
Disadvantages
• Cannot generalize to the population
• Is only representative of the units (subjects) selected
• The degree that the sample is similar or different from the population is unknown
• Experts could be wrong
There are advantages and disadvantages to expert sampling.
Slide 30
Ex. Expert Sampling
• Want to understand the main challenges in the WaSH sector in country X, a country where you will be working in the coming year
– Interview experts in the WaSH sector in country X
• Note that experts come in all forms, e.g., handpump mechanics are experts in their business!
Slide 31
Probability Sampling
Simple Random Sample
– Random
– Each unit (subject) has an equal probability of selection
– Each unit is numbered and a predetermined number of units are sampled, randomly
• Here is a link to a video that explains how to create a random sample in excel.
A simple random sample uses random selection so that each unit has an equal chance of selection. Today, computers are often used to generate the random selection. There is often a listing of the subjects or a list needs to be created before the random sample can be selected.
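As a rough sketch of how a computer can generate the random selection, the Python below draws a simple random sample from a numbered list. The function name `simple_random_sample` and the frame of ten households are hypothetical examples, not part of the original module.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sample frame without replacement,
    giving every unit the same chance of selection."""
    rng = random.Random(seed)   # a fixed seed makes the draw reproducible
    return rng.sample(list(frame), n)

# Hypothetical sample frame: households numbered 1 to 10.
frame = range(1, 11)
sample = simple_random_sample(frame, 4, seed=2014)
print(sorted(sample))
```

Recording the seed is one way to document the draw so that the selection can be checked later.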
Slide 32
Simple Random Sampling
Source: http://faculty.elgin.edu/dkernler/statistics/
The above is “random” if the numbers 2, 5, 8, and 10 were selected using a random number generation scheme or a random number table. There must be an explicit approach to generating random numbers, and not just picking numbers out of your head, because these tend to be non-random!
Slide 33
Simple Random Sampling
Advantages
• Can generalize to the population
• Representative of the population
• The degree to which the sample is similar or different from the population is known by calculating sampling error
Disadvantages
• More expensive and time intensive than a nonprobability sample
• May not be practical if sample frame is large
There are advantages and disadvantages to simple random sampling. The main advantage is that one can generalize to the population. The main disadvantages are the expense and time necessary to complete the sample.
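To illustrate what "calculating sampling error" can look like, here is a minimal sketch for an estimated proportion. It assumes a simple random sample, a sample that is small relative to the population, and a 95% confidence level (z = 1.96); the formula is the standard one and is not taken from the slides, and the function name is hypothetical.

```python
import math

def proportion_margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a proportion estimated
    from a simple random sample of size n."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return z * standard_error

# Hypothetical result: 50% of 400 sampled households use an improved water source.
print(round(proportion_margin_of_error(0.5, 400), 3))
```

Because the margin shrinks with the square root of n, halving it requires roughly four times as many sampled units.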
Slide 34
Probability Sampling
Systematic Random Sample
– Random
– Individuals are selected at regular intervals from a list of the whole population
– The intervals are selected to ensure an adequate sample size
• It is important that the start point is not automatically the first in the list
A systematic random sample picks a regular interval by which to select the sample. It requires that the subjects or units in the population are randomly ordered with respect to the characteristics being measured.
Slide 35
Systematic Random Sampling
Source: http://faculty.elgin.edu/dkernler/statistics/
Here, the interval is “every third person, after a random start”. Again, the population must be fully mixed (randomly ordered) if this is going to work.
Slide 36
Systematic Random Sampling
Advantages
• Can generalize to the population
• Representative of the population
• Easy to select and evenly spread over population
• The degree that the sample is similar or different from the population is known
Disadvantages
• Expensive and time consuming if the population of interest is large
• Is only as random as the mix of the population sampled…
There are advantages and disadvantages to a systematic random sample.
Slide 37
Steps in a Systematic Random Sample
• Number the units of the population 1 to N
• Calculate the sample size you need
• Decide on the interval size = k
• Take a random start
• Sample every kth unit
Here is a list of steps in a systematic random sample.
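The steps above can be sketched in Python as follows. This is an illustration only: it assumes the interval is taken as k = N // n (whole-number division), and the function name `systematic_sample` and the list of 30 units are hypothetical.

```python
import random

def systematic_sample(units, n, seed=None):
    """Select every kth unit after a random start, with k = N // n."""
    N = len(units)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and N")
    k = N // n                     # interval size
    rng = random.Random(seed)
    start = rng.randrange(k)       # random start among the first k units
    # If N is not a multiple of n, the last few units on the list are never reached.
    return [units[start + i * k] for i in range(n)]

# Hypothetical population of 30 numbered units, sample size 10, so k = 3.
units = list(range(1, 31))
print(systematic_sample(units, 10, seed=1))
```

The random start is what keeps the first unit on the list from being selected automatically.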
Slide 38
Probability Sampling
Stratified Sampling
– Population divided into subgroups (strata) based on a characteristic(s)
– Sample is obtained by taking samples from each stratum
– Probability of inclusion varies according to known characteristics
– Is taken into account in analysis
– Examples of strata include: male/female, smoking/nonsmoking, rural/urban, program area/non-program area
Another form of probability sampling is stratified sampling, whereby the population is divided into groups or strata and a random sample is taken from each stratum. When certain subgroups within a population vary considerably, the population can be divided into strata and the random sample can be selected according to the strata.
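A minimal sketch of stratified sampling is shown below. It assumes proportional allocation, meaning the same sampling fraction is applied in every stratum, which is only one of several possible allocation rules; the function name and the rural/urban household list are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Group units into strata, then draw a simple random sample
    of the same fraction from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for name, members in strata.items():
        n = max(1, round(len(members) * fraction))   # at least one unit per stratum
        sample[name] = rng.sample(members, n)
    return sample

# Hypothetical frame: 60 rural and 40 urban households, 10% sampled from each.
households = [("rural", i) for i in range(60)] + [("urban", i) for i in range(40)]
sample = stratified_sample(households, lambda h: h[0], 0.10, seed=7)
print({name: len(chosen) for name, chosen in sample.items()})
```

If different fractions are used per stratum, the differing probabilities of inclusion have to be taken into account in the analysis.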
Slide 39
Stratified Sampling
Source: http://faculty.elgin.edu/dkernler/statistics/
Slide 40
Stratified Sampling
Advantages
• Can generalize to the population
• Can improve the representativeness of the sample
• The degree that the sample is similar or different from the population is known
• Can analyze data according to strata
• Different sampling approaches can be applied to each stratum
• Less expensive than a simple random sample
• Less time intensive than a simple random sample
Disadvantages
• If there are a lot of strata, there may be relationships between the strata
– if these are not considered in the design, it may bias the results
Slide 41
Example of a Stratified Sample
You are an organization that mainly works with rural and urban communities and want to understand the WaSH services of the population you serve compared to a comparison group that you do not serve.
– Population can be divided into rural and urban (1st strata)
– Rural and urban population can be divided into the group you serve and the group you do not serve (2nd strata)
– You can then sample the strata
Slide 42
Probability Sampling
Cluster Sampling
– A multistage sampling in which groups (clusters) are sampled in the first stage with each selected group (cluster) sampled in the second stage
– Population divided into clusters based on geography
• Stage 1: A sample of clusters is taken
• Stage 2: Everyone in the cluster is sampled
Another type of probability sampling is cluster sampling, whereby a sample can be taken of a population that is widely dispersed across a large geographic area. This is logistically and economically more feasible than a random sample of a large geographic area, as it reduces the transportation and logistic costs of traveling great distances to interview a subject. Clusters of subjects are randomly selected and then the units are interviewed.
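The two stages described on this slide can be sketched as follows: clusters are selected at random, and then everyone in each selected cluster is included. The function name `cluster_sample` and the eight clusters of five households are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Stage 1: randomly select clusters.
    Stage 2: include every unit in each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical frame: 8 geographic clusters with 5 households each.
clusters = {f"cluster_{i}": [f"hh_{i}_{j}" for j in range(5)] for i in range(8)}
sampled = cluster_sample(clusters, 3, seed=3)
print(sorted(sampled))
```

Only the selected clusters need to be visited, which is where the savings in travel come from.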
Slide 43
Cluster Sampling
Source: http://faculty.elgin.edu/dkernler/statistics/
Slide 44
Cluster Sampling
Advantages
• Can generalize to a large population
• Representative of population
• Lower costs than a simple random sample
– Less travel
Disadvantages
• Complex design
• Requires geographic division of sample frame into small clusters
– Enumeration Areas
• Sample size can be larger than a simple random sample for the same precision
• Greater sampling error than a simple random sample
– To reduce the sampling error, a large number of clusters must be sampled
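One common way to put a number on the loss of precision, which is not stated in the slides, is the design effect: deff = 1 + (m - 1) * rho, where m is the average number of units sampled per cluster and rho is the intra-cluster correlation. The sketch below assumes equal cluster sizes; the function names and the example values are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Approximate design effect for a cluster sample with equal cluster sizes."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n, cluster_size, icc):
    """Size of the simple random sample that would give about the same precision."""
    return n / design_effect(cluster_size, icc)

# Hypothetical survey: 1000 households, 20 per cluster, intra-cluster correlation 0.05.
print(round(design_effect(20, 0.05), 2))
print(round(effective_sample_size(1000, 20, 0.05)))
```

For a fixed total sample size, sampling more clusters with fewer units in each lowers the design effect, which matches the point that a large number of clusters must be sampled.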
Slide 45
Ex. of Cluster Sampling
Want to understand the proportion of people in the USA with access to treated piped water and sanitation services in the household
– Divide the USA into clusters or census tracts
– Randomly sample clusters
– Interview households within each selected cluster
Map Source : Trochim, 2006
Slide 46
Probability Sampling
Multi-stage Cluster Sampling
– A multistage sampling in which groups (clusters) are sampled in the first stage with each selected group (cluster) sampled in the second stage and then those groups are sampled again in a third stage
– Population divided into clusters based on geography
A multi-stage cluster sampling combines some of the methods covered so far. When sampling methods are combined this is called multi-stage sampling.
Slide 47
Probability Sampling
• Stage 1: A sample of clusters is taken if relatively equal in size
– Ex. A sample of villages taken
– If not equal in size, the clusters can be selected probability proportionate to size
• Stage 2: A sample of each selected cluster in stage 1 is taken
– Ex. Villages are divided into smaller subgroups and a sample is taken
• Stage 3: A sample of the cluster selected in stage 2 is taken
– Ex. Households in the sample taken in stage 2 are listed and a sample taken
For example, a multistage cluster sample might sample clusters in the first stage, divide the clusters into smaller neighborhoods and take a sample in the second stage, and then take a sample of the households in the neighborhood in the third stage.
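The three stages can be sketched as nested random draws. This illustration assumes villages of roughly equal size, so each stage uses a simple random sample; the function name, the nested dictionary layout and the village data are all hypothetical.

```python
import random

def multistage_sample(villages, n_villages, n_neighborhoods, n_households, seed=None):
    """Stage 1: sample villages. Stage 2: sample neighborhoods within each
    selected village. Stage 3: sample households within each selected neighborhood."""
    rng = random.Random(seed)
    selected = []
    for village in rng.sample(sorted(villages), n_villages):               # stage 1
        neighborhoods = villages[village]
        for nbhd in rng.sample(sorted(neighborhoods), n_neighborhoods):    # stage 2
            for household in rng.sample(neighborhoods[nbhd], n_households):  # stage 3
                selected.append((village, nbhd, household))
    return selected

# Hypothetical frame: 10 villages, 5 neighborhoods each, 20 listed households each.
villages = {
    f"village_{i}": {
        f"nbhd_{j}": [f"hh_{i}_{j}_{k}" for k in range(20)] for j in range(5)
    }
    for i in range(10)
}
sample = multistage_sample(villages, 3, 2, 4, seed=11)
print(len(sample))
```

Households only need to be listed in the neighborhoods that were actually selected in stage 2.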
Slide 48
Multi-stage Cluster Sampling
Advantages
• Can generalize to the population
• Representative of population
• Fewer costs than a simple random sample
– Less travel
• Sample size is larger than simple random sample for the same cost
Disadvantages
• Complex design
• Requires geographic division of sample frame into small clusters
– Enumeration Areas
There are advantages and disadvantages to this method.
Slide 49
Multi-stage Cluster Sampling
Ex. WHO wants to facilitate a survey on WaSH at the household level in country x.
– Population divided into enumeration areas (EAs) based on geography
http://www.unicef.org/statistics/index_24302.html
Slide 50
Multi-stage Cluster Sampling
• Stage 1: A sample of EAs is taken
– If not equal in size, the EAs can be selected probability proportionate to size
• Stage 2: If EAs are too large, a sample can be taken of the EAs.
• Stage 3: A sample of households in the selected EA from stage 2 is taken and the subjects are interviewed about WASH in the household
– Households in the selected EA are listed before the sample is taken
http://www.unicef.org/statistics/index_24302.html
Slide 51
Multistage Cluster Sampling:
• A more sophisticated form of cluster sampling
• Goal: each unit (household) has an equal chance of selection
• If clusters are of differing sizes, give each cluster a chance of selection that is proportionate to its size (number of households)
Probability proportionate to size sampling is a more sophisticated form of cluster sampling, used when the clusters differ in size, whereby clusters are selected with a chance proportionate to their size.
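One widely used way to select clusters proportionate to size is systematic selection along the cumulative list of cluster sizes. The sketch below assumes that approach, which the slides do not specify; the function name `pps_systematic` and the three clusters are hypothetical.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def pps_systematic(sizes, n, seed=None):
    """Select n clusters with probability proportionate to size by stepping
    through the cumulative household counts at a fixed interval."""
    names = list(sizes)
    cumulative = list(accumulate(sizes[name] for name in names))
    interval = cumulative[-1] / n            # total households / number of selections
    rng = random.Random(seed)
    start = rng.random() * interval          # random start within the first interval
    chosen = []
    for i in range(n):
        point = start + i * interval
        index = min(bisect_left(cumulative, point), len(names) - 1)
        chosen.append(names[index])          # cluster whose range contains the point
    return chosen

# Hypothetical clusters with their number of households.
sizes = {"A": 100, "B": 300, "C": 600}
print(pps_systematic(sizes, 2, seed=4))
```

A cluster larger than the interval can be selected more than once; in practice such clusters are often split or handled separately. If the same number of households is then sampled in every selected cluster, each household ends up with roughly the same overall chance of selection.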
Slide 52
Sampling methods can be combined
Stratification in Multistage Cluster Sampling
• Step 1: Stratify your sample
– Ex. Stratify by geography (interested in the rural and urban portions of the population)
• Step 2: A sample of clusters is taken in each stratum
• Step 3: A sample of the households in the clusters selected in step 2 is taken
– Ex. Households in the clusters selected in step 2 are listed before the sample is taken
Sampling methods can be combined. If you want to understand differences between groups within your population, you can divide the population into strata and sample each stratum.
Slide 53
Bias
Source: Dogbert.com
Slide 54
Sources of Bias
• Change from sample plan
• Hard to reach units (subjects) are eliminated
• Replacement of units (subjects) with others
• Response rate is lower than calculated
• Sample frame is out of date or does not include all units (subjects)
Every sampling method has potential sources of bias. These need to be considered and documented when conducting the sample, as they can bias the results.
Slide 55
Resources
• Babbie, E. (2013). The Practice of Social Research. 13th edition. Wadsworth Press.
• Estrella M. and Gaventa J. Who Counts Reality? Participatory Monitoring and Evaluation: A Literature Review. IDS Working Paper 70. International Workshop on Participatory Monitoring and Evaluation. International Institute for Rural Reconstruction. 1997. Pp1-27
Slide 56
Resources
• Fitzpatrick JL, Sanders JR, Worthen, BR. (2011). Program Evaluation: Alternative Approaches and Practical Guidelines. 4th Edition. Pearson, Allyn & Bacon. 2011. ISBN 10: 0205579353.
• Glanz, K., Rimer, B., Viswanath, K. (2008). Health Behavior and Health Education: Theory, Research and Practice. (4th Edition). John Wiley and Sons, Inc.
Slide 57
Resources
• Rossi, P., Lipsey, M., Freeman, H. (2004). Evaluation: A Systematic Approach (7th edition). Sage Publications.
• Trochim WMK, Donnelly JP. The Research Methods Knowledge Base (3rd Edition). Cengage Learning, 2008.
• Trochim. Research Methods Knowledge Base. http://www.socialresearchmethods.net/kb/index.php