Quantitative Methods Topic 4 Sampling
Quantitative Methods
Topic 4
Sampling
2
Outline
Populations and samples
Sampling frames
Representativeness
Probability and non-probability samples
Example sampling methods
3
Reading on Sampling
IIEP Module 3 (Kenneth N. Ross)
TIMSS 2003 technical report, sampling chapter (Pierre Foy and Marc Joncas)
4
Populations and samples
Population: e.g. all students at a school
Sample: a smaller number of elements selected to represent the population
5
Why sampling
Study of part rather than the whole population
Advantages:
Reduced cost
Generalisations about the population
Estimates of population characteristics
6
Population and Units of Analysis
Defining the population
Without a definition, we don’t have a context for the results.
In an educational survey, the population will be defined by the units of analysis, which may be:
the student (e.g. studies of attainment)
the teacher (e.g. studies of teaching practice)
the school (e.g. studies of school environment)
Each unit of analysis may require a different sampling strategy.
7
Populations
Desired: population for which the results are ideally required
Defined: population which is actually studied
Excluded: the elements that are removed from the desired target population in order to form the defined target population
8
Sampling Frames
A listing of the elements in a population.
E.g., a school’s enrolment records can readily provide a sampling frame for the population of students: all students are listed, and each student is listed only once.
9
Representativeness
A sample is considered representative if the percentage frequency distributions of certain characteristics within the sample data are similar to those within the whole population.
The population characteristics selected for comparison are called “marker variables”.
In education, common marker variables are sex, age, SES, school type, location, school size and ethnicity.
10
Sample types
Probability:
Each member of the defined target population has a known and non-zero chance of being selected into the sample.
Allows estimating the values of population parameters from sample statistics.
Allows testing statistical hypotheses about the population from samples.
Non-probability:
It is not possible to determine whether a non-probability sample is likely to provide very accurate or inaccurate estimates of population parameters.
11
Types of non-probability samples
Judgement sampling
Based on the researcher’s judgement
Convenience sampling
Subjects or elements of a sample are selected based on their accessibility to the researcher
Quota sampling
Numbers of elements (subjects) are drawn from various target population strata in proportion to the size of these strata
Little or no control over the procedures used to select elements within these strata
There is no way of checking the accuracy of estimates
12
Types of probability samples
Simple random sampling
Systematic sampling
Stratified sampling
Multistage cluster sampling
13
Simple Random Sampling
There is a single sampling frame or list of names.
A sample is selected from the list in a single operation, e.g. a list of students in a faculty used to select a sample for course evaluation.
14
Golden Rule of Simple Random Sampling
• Each member of the population shall have an equal chance of selection.
15
Class activity 1: Using SPSS to take a random sample
The data file VNsample.sav contains the IDs of all students in a district. Draw a simple random sample of 10% of the population as follows:
Click DATA -> SELECT CASES -> RANDOM SAMPLE -> SAMPLE -> APPROXIMATELY 10% -> CONTINUE.
Click on ‘copy selected cases to a new data set’. In the box, type the new data set name. Click OK.
16
Class activity 1
Examine the new data set to see how many students were randomly selected.
Calculate frequencies of girls and boys. Compare with the main sample.
Calculate mean mathematics achievement (variable pma500). Compare with the results of the main sample.
Repeat the selection three times to draw different samples and check how results vary.
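For readers without SPSS, the activity can be approximated in Python. This is a minimal sketch, not the SPSS procedure itself: the population below is synthetic (a hypothetical stand-in for VNsample.sav, with invented sex and score values), and the function draws exactly 10% of cases, whereas an "approximately 10%" selection can give a sample size that varies from draw to draw.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the illustration is repeatable

# Hypothetical stand-in for VNsample.sav: (student id, sex, maths score)
population = [
    (i, random.choice(["girl", "boy"]), random.gauss(500, 100))
    for i in range(1, 2001)
]

def draw_srs(units, fraction):
    """Simple random sample without replacement: every unit has the same chance."""
    n = round(len(units) * fraction)
    return random.sample(units, n)

def summarise(units):
    """Return (proportion of girls, mean maths score) for a list of units."""
    girls = sum(1 for _, sex, _ in units if sex == "girl")
    return girls / len(units), mean(score for _, _, score in units)

print("population:", summarise(population))
for attempt in range(3):
    sample = draw_srs(population, 0.10)
    print("sample", attempt + 1, "n =", len(sample), summarise(sample))
```

Each of the three samples should give a slightly different proportion of girls and mean score, all scattered around the population values.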
17
Systematic Random Sampling Example
To draw a systematic random sample of size 16 from our list of Metropolitan schools (160 schools), ordered by school number, we would:
Calculate the sampling interval (160/16 = 10)
Draw a random number between 1 and 10 (say it is 7)
The sample will then consist of the following 16 schools from the list: the 7th, the (7 + 10 = 17)th, the (7 + 2×10 = 27)th, and so on to the (7 + 15×10 = 157)th
Note that the number of different samples that can be drawn by systematic sampling is typically quite small (10 in this example).
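The selection rule in this example can be sketched in a few lines of Python. The function name and the list of school numbers are illustrative assumptions, and the sketch assumes the list length is an exact multiple of the sample size, as it is here (160/16).

```python
import random

def systematic_sample(units, n, start=None):
    """Random start, fixed interval: take every interval-th unit from the start."""
    interval = len(units) // n          # sampling interval, 160 // 16 = 10
    if start is None:
        start = random.randint(1, interval)  # random start between 1 and interval
    # Positions are 1-based in the slide, so subtract 1 for list indexing
    return [units[start - 1 + k * interval] for k in range(n)]

schools = list(range(1, 161))           # hypothetical school numbers 1..160
picked = systematic_sample(schools, 16, start=7)
print(picked)                           # the 7th, 17th, 27th, ... 157th schools
```

Because the start is the only random choice, there are only as many distinct samples as there are possible starts (10 here).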
18
Systematic Random Sampling (Random Start, Sampling Interval; RSSI)
Work out the sampling interval.
Select a random start.
Every qth element in the register is selected from the random start, where q is the sampling interval.
May be more efficient than Simple Random Sampling (i.e. give estimates of greater precision than an SRS of the same size), e.g. when there is a systematic relation between the population order and the response variable(s).
May result in a biased sample if there is a pattern in the list.
19
The list of schools that we have been working with is largely, but not completely, arranged in alphabetical order of the school name.
It is unlikely that, here, the order would be related to the N of students studying Psychology. Hence, it is unlikely that the precision of the sample would be superior to that obtained from a simple random sample
The list could, however, be sorted by a variable that we might expect was related to the N of Psychology students (e.g. school or Year 12 cohort size)
In this case, we would expect the precision of estimates from the sample to improve for any characteristic related to N of Psychology students.
20
Class Activity 2: Draw RSSI sample
Open METROSCHOOLS.SAV in SPSS.
In the DATA EDITOR window: DATA -> SORT CASES; put yr12en into the ‘sort by’ box, then click OK.
In the DATA EDITOR window, examine the variable yr12en across a few cases: comment.
Draw an RSSI 10% sample.
Does the sample represent the full range of school sizes (Year 12 enrolment)? Why?
21
Class Activity 2: Dangers of RSSI
A. British electoral registers are lists of street addresses in street number order. Even numbers are on one side of the street, odd numbers on the other.
With RSSI and an even sampling interval (e.g. 20, 22 or 24), how many sides of any street will you sample?
B. You are using RSSI to draw a sample from a list of club members, in alphabetical order, with many married couples listed in male-female order. Any problems?
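Question A can be explored with a small simulation. This is a sketch under simplifying assumptions: a single hypothetical street with houses numbered 1 to 200, listed in number order, and a sampling interval of 20.

```python
def systematic_positions(n_units, interval, start):
    """1-based list positions selected by a random-start, fixed-interval sample."""
    return list(range(start, n_units + 1, interval))

house_numbers = list(range(1, 201))     # hypothetical street, in number order

for start in (7, 8):
    picked = [house_numbers[p - 1] for p in systematic_positions(200, 20, start)]
    sides = {"even" if h % 2 == 0 else "odd" for h in picked}
    print("start", start, "->", picked[:4], "... sides sampled:", sides)
```

With an even interval, every selected house has the same parity as the random start, so only one side of the street is ever sampled. The same periodicity problem arises in question B whenever the interval lines up with the male-female ordering.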
22
Stratified Sampling (1)
The target population is divided into non-overlapping sub-populations called strata
Sampling is performed independently within strata
23
Types of stratified sampling
Proportionate
The within-stratum sample size is calculated such that it is proportional to the size of the sub-population
Disproportionate
Uses different sampling fractions within the various strata
Is used in order to ensure that the accuracy of sample estimates obtained for stratum parameters is sufficiently high to be able to make meaningful comparisons between strata
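The difference between the two allocations can be illustrated numerically. The stratum names and sizes below are invented for the illustration, and equal allocation is used as just one possible disproportionate design.

```python
def proportionate_allocation(stratum_sizes, n):
    """Sample size per stratum proportional to the stratum's population size."""
    total = sum(stratum_sizes.values())
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}

def equal_allocation(stratum_sizes, n):
    """One disproportionate design: the same sample size in every stratum."""
    per_stratum = n // len(stratum_sizes)
    return {name: per_stratum for name in stratum_sizes}

# Hypothetical numbers of schools in each stratum
sizes = {"government": 1000, "catholic": 300, "private": 200}

for label, allocation in (("proportionate", proportionate_allocation(sizes, 150)),
                          ("equal", equal_allocation(sizes, 150))):
    fractions = {s: round(allocation[s] / sizes[s], 3) for s in sizes}
    print(label, allocation, "sampling fractions:", fractions)
```

Proportionate allocation gives the same sampling fraction in every stratum, while equal allocation samples the small strata much more heavily, which is what makes between-stratum comparisons more precise. With other totals the rounded proportionate sizes may not sum exactly to n.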
24
Analysis of Disproportionate Stratified Sample
Weighting is required to analyze the full sample.
Weighting is not required to analyze strata separately.
Post-stratification can be used to weight a sample to known population characteristics after selection and/or after data have been collected, e.g. post-stratify by age or ethnic background.
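A small worked example shows why weighting matters for the full sample. The strata, population sizes and scores are invented; each sampled unit is given the weight N/n for its stratum (the number of population units it represents), which is one common choice rather than the only one.

```python
# Hypothetical disproportionate sample: two units drawn from each stratum
strata = {
    "government": {"N": 1000, "scores": [500, 520]},
    "private": {"N": 200, "scores": [600, 620]},
}

def unweighted_mean(strata):
    """Naive mean that treats every sampled unit alike."""
    scores = [y for info in strata.values() for y in info["scores"]]
    return sum(scores) / len(scores)

def weighted_mean(strata):
    """Mean with each unit weighted by N/n for its stratum."""
    numerator = 0.0
    denominator = 0.0
    for info in strata.values():
        weight = info["N"] / len(info["scores"])   # population units per sampled unit
        numerator += weight * sum(info["scores"])
        denominator += weight * len(info["scores"])
    return numerator / denominator

print("unweighted:", unweighted_mean(strata))
print("weighted:  ", round(weighted_mean(strata), 2))
```

The unweighted mean is pulled towards the over-sampled private stratum; the weighted mean restores each stratum to its share of the population. Within a single stratum all weights are equal, so they cancel and no weighting is needed.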
25
Stratified Sampling (2)
Provides increased accuracy in sample estimates without leading to increases in costs
Can guarantee representation of small sub-populations in the sample
Many population frames are readily divided into sub-populations, e.g. into States and Systems (government, Catholic, private) in national education surveys; into States and residential location (rural/urban) in health or employment surveys
In some studies stratification is used for reasons other than obtaining gains in accuracy:
Strata may be formed in order to employ different sample designs within strata.
Subpopulations defined by the strata are designated as separate domains of study.
26
Multi-Stage Cluster Sampling
Used where there are naturally formed groups of population elements (e.g. schools, households, community health centres etc.)
Frequently used when a full population frame is not available (e.g. all students in all Government schools in Australia, all patients seen by medical staff in all community health centres in Victoria)
Used in face-to-face interview studies when the sample is geographically dispersed and the costs of travel would otherwise be prohibitive.
Enables the researcher to gather data from within the sampled clusters only, and thus lowers the cost of a survey.
27
An Example of Cluster Sampling
Primary Sampling Stage: Select a number of schools in Victoria (this can be done using simple random sampling, or stratified sampling of Government, Roman Catholic, and ‘Private’ schools in Victoria)
Stage 2: A sample of Year 12 students is then drawn from the enrolment records of each of the sampled schools
Note that a full list of all Year 12 students in all Victorian schools is not needed. All that is required is a list of students for each of the sampled schools
28
At least TWO stages in MS sampling
Stage One: select primary sampling units, e.g. schools, electorates, local authority areas, community health centres
Stage Two: select secondary sampling units, e.g. pupils within schools, patients within community health centres
Note that each stage is a separate sampling operation, and these operations need not be uniform:
may use stratification
may use RSSI sampling
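The two stages can be sketched as two separate sampling operations. The frame below is synthetic (50 hypothetical schools with invented enrolment lists), and both stages use simple random sampling for brevity; either stage could instead use stratification or RSSI.

```python
import random

random.seed(2)  # fixed seed so the illustration is repeatable

# Hypothetical frame: school name -> enrolment list for that school
frame = {
    f"school_{s:02d}": [f"s{s:02d}_{i:03d}" for i in range(random.randint(40, 120))]
    for s in range(50)
}

def two_stage_sample(frame, n_clusters, n_per_cluster):
    """Stage one: sample schools. Stage two: sample students within each school."""
    chosen = random.sample(sorted(frame), n_clusters)
    return {
        school: random.sample(frame[school], min(n_per_cluster, len(frame[school])))
        for school in chosen
    }

sample = two_stage_sample(frame, n_clusters=10, n_per_cluster=20)
print(len(sample), "schools,", sum(len(v) for v in sample.values()), "students")
```

Only the enrolment lists of the 10 sampled schools are consulted at stage two, so no list of all students in all schools is needed. In this simple design students in small schools have a higher chance of selection than students in large ones, so a real analysis would need weights.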
29
Accuracy of estimates from samples
The degree of accuracy of sample estimates may be judged by the difference between the sample estimates and the value of the population parameters
In most situations, the values of population parameters are unknown.
It is possible to estimate the probable accuracy of the obtained sample through a knowledge of the behaviour of estimates derived from all possible samples.
30
A simple example
Students are given two packs of cards, combined into one deck. They are asked to guess the proportion of cards that are red.
Students then draw a sample of 10 cards, and use it to make the estimate. This is repeated several times to draw several samples of 10.
Similarly, students draw repeated samples of 50.
Results are graphed to form two sampling distributions: one for samples of 10, the other for samples of 50 (example on next slide).
Which sample size, 10 or 50, will give the more accurate result?
31
Class Activity 3 : Sampling Task
Use the EXCEL procedure as for Week 5 for simulating drawing of cards.
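As an alternative to the spreadsheet, the card task can be simulated in Python. This is a sketch under stated assumptions: two standard 52-card packs with no jokers (so exactly half the cards are red), 25 samples of each size, and cards drawn without replacement within a sample.

```python
import random
from statistics import mean, pstdev

random.seed(3)  # fixed seed so the illustration is repeatable

deck = ["red"] * 52 + ["black"] * 52    # two packs combined, jokers removed

def sample_percent_red(deck, n):
    """Draw n cards without replacement and return the percentage that are red."""
    hand = random.sample(deck, n)
    return 100 * hand.count("red") / n

def sampling_distribution(deck, n, repeats=25):
    """Estimates of % red from repeated samples of size n."""
    return [sample_percent_red(deck, n) for _ in range(repeats)]

for n in (10, 50):
    estimates = sampling_distribution(deck, n)
    print("n =", n, "mean estimate:", round(mean(estimates), 1),
          "SD of estimates:", round(pstdev(estimates), 1))
```

Both sets of estimates centre near 50%, but the estimates from samples of 50 should be much less spread out than those from samples of 10.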
32
Sampling Distribution: estimates of % red cards from several samples
[Figure: distribution of sample estimates. Horizontal axis: percent of red cards in sample (10% to 100%); vertical axis: 0 to 10.]
Number in each sample: 10
Number of samples drawn: 25
Estimate from first sample: 50%
Average for all samples: 47.6%
33
Sampling Distribution: estimates of % red cards from several samples
[Figure: distribution of sample estimates. Horizontal axis: percent of red cards in sample (10% to 100%); vertical axis: 0 to 10.]
Number in each sample: 50
Number of samples drawn: 25
Estimate from first sample / Average for all samples: 52..850%
34
Difference in the distributions
Samples of 10:
have a higher Standard Deviation (SD)
have a more dispersed distribution
the estimates from individual samples vary greatly
Samples of 50:
have a lower SD
have a less dispersed distribution
the estimates from individual samples vary less
35
Factors affecting accuracy of estimates from samples
Sample size, as seen from the above example.
Other factors:
Sampling design
Stratified and systematic sampling may increase accuracy.
Cluster sampling may reduce accuracy.
36
Cluster sampling in education
Schools are selected first.
Then, students are selected from the selected schools.
If there is a large “intraclass correlation”, the precision of estimates will be reduced.
37
Intra-class correlation
In the context of schools/students: the degree to which students are similar within schools.
Large intra-class correlation:
Schools are highly tracked. High ability students are in the same schools. Low ability students are in other schools.
Low intra-class correlation:
The range of abilities of students is about the same in all schools.
38
Effect of cluster sampling
If the intra-class correlation is high, then we need to select more schools to capture the variation in student abilities.
In the extreme case, if all students within each school have the same ability, then sampling all students from one school is equivalent to sampling just one student.
Our estimate from the sample will be quite imprecise as an estimate of the population parameter (“loss of sampling efficiency”).
39
(Sampling) Design Effect
Defined as the loss of sampling efficiency relative to simple random sampling.
If n1 students from a cluster (or other complex) sample are required to achieve the same precision as n2 students from a simple random sample, then the design effect is n1/n2.
40
Table of intra-class correlation - 1
See Table 1.1 in the reading ‘IIEPModule3.pdf’.
For example, if we sample 20 students from each school (cluster size of 20), and the intra-class correlation is around 0.2, then the design effect is 4.8.
This means that, if cluster sampling is used, we need a sample size 4.8 times larger than the sample size for a simple random sample to achieve the same precision.
In Australia, the intra-class correlation is around 0.2, or a little higher.
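The figure of 4.8 is consistent with the standard approximation for equal-sized clusters, design effect = 1 + (b - 1) × roh, where b is the cluster size and roh the intra-class correlation. The sketch below applies that formula; the function names and the target simple random sample size of 400 are illustrative assumptions.

```python
import math

def design_effect(cluster_size, icc):
    """Approximate design effect for equal-sized clusters: 1 + (b - 1) * roh."""
    return 1 + (cluster_size - 1) * icc

def required_cluster_sample(n_srs, cluster_size, icc):
    """Students needed under cluster sampling to match an SRS of n_srs students."""
    n = n_srs * design_effect(cluster_size, icc)
    return math.ceil(round(n, 6))       # round first to absorb floating-point noise

deff = design_effect(20, 0.2)
students = required_cluster_sample(400, 20, 0.2)
print("design effect:", round(deff, 2))
print("students needed:", students, "in", math.ceil(students / 20), "schools")
```

Setting the intra-class correlation to 1 gives a design effect equal to the cluster size, which is the extreme case where a whole school's sample carries the information of a single student.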
41
Table of design effects
See document ‘DesignEffectPISA2003.xls’ (data from PISA 2003 technical report)
42
Computation of sampling error
More complicated when multi-stage cluster sampling is used.
Can be estimated once the intra-class correlation is known (say, from previous studies)
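One common way to use a known intra-class correlation is to inflate the simple-random-sampling standard error by the square root of the design effect. The sketch below does this for a mean; it assumes equal cluster sizes, and the standard deviation of 100 and sample of 1920 students are invented values.

```python
import math

def srs_standard_error(sd, n):
    """Standard error of a mean under simple random sampling."""
    return sd / math.sqrt(n)

def cluster_standard_error(sd, n, cluster_size, icc):
    """Approximate standard error of a mean under cluster sampling."""
    deff = 1 + (cluster_size - 1) * icc     # design effect for equal-sized clusters
    return math.sqrt(deff) * srs_standard_error(sd, n)

se_naive = srs_standard_error(100, 1920)
se_cluster = cluster_standard_error(100, 1920, cluster_size=20, icc=0.2)
print("SE treating the sample as SRS:", round(se_naive, 2))
print("SE allowing for clustering:   ", round(se_cluster, 2))
```

Under these assumptions, 1920 students in clusters of 20 give the same standard error as a simple random sample of 1920/4.8 = 400 students, so ignoring the clustering would overstate the precision.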