HR: Samples, Sampling, and Sample size
A practical guide
Samples, Sampling, and Sample Size
Samples: Used in research (i.e. for estimation and hypothesis testing); concerns theories around sampling and why we sample (i.e. sampling distributions).
Sampling: The process of taking samples; must guard against bias and threats to validity.
Sample size: The practical issue of how many subjects or units are needed for valid estimation or inference.
Process of Sampling
Involves:
1. Identification of study population (target)
2. Determination of sampling population (sampling frame)
3. Definition of the sampling unit (individual, family, etc.)
4. Choice of sampling method (what is possible, what is optimal)
5. Estimation of the sample size (depends on study question and study design)
Basic Questions about Sampling
Why sample? Efficiency and quality.
Who to sample? Usually a representative subset of the population of interest.
How to sample? Use the sampling method most appropriate to the study.
Number to sample? As many as required to keep potential sampling error within acceptable limits.
Why sample?
To acquire information about larger populations
Lower cost
Less field time
When it is impossible to study the whole population
More accuracy: a better job of data collection (more time per sample unit, higher quality data)
Who to sample?
Identification of study population
Sampling is the process of selecting a number of units from a defined study population.
The study or target population is the one to which the results of the study will be generalized.
It is crucial that the study population is clearly defined, since it is the most important determinant of the sampling population.
The Sampling Frame
The sampling frame is the one from which the sample is drawn.
The definition of the sampling frame by the investigator is governed by two factors:
Feasibility: a reachable sampling population
External validity: the ability to generalize from the study results to the target population
The Sampling Unit
To define the sampling unit, set:
Inclusion criteria
Exclusion criteria
May sample individuals, households, or larger units.
Consider the unit of analysis: individual income, household income, city median income.
How to sample?
Choices in sampling method:
Non-probability sampling
Probability sampling
Non-probability sampling
Types of non-probability sampling:
Convenience sampling (units selected because they are easily accessible)
Quota sampling (a set number per type)
Purposeful sampling (you choose who you think should be in the study)
Snowball sampling (friend of a friend, etc.)
Not recommended in health research if generalization or statistical analysis is intended:
By far the most biased sampling procedures, as selection is not random (units in the population do not have a known, non-zero chance of being selected to participate in the study).
Analytical/statistical procedures usually assume that the sampled units were drawn at random from the assumed statistical distribution.
Probability sampling
“There is a known non-zero probability of selection for each sampling unit”
Types:
Simple random sampling
Systematic random sampling
Stratified random sampling
Cluster sampling
Others:
Multi-stage random sampling
Multi-phase sampling
Simple random sample
In this method, all subjects or elements have an equal probability of being selected.
There are two major ways of drawing a random sample. The first is to consult a random number table, and the second is to have the computer select a random sample.
Enumeration of the population (a complete list) is required/assumed.
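As a minimal sketch of the "have the computer select" option, the following draws a simple random sample from an enumerated list. The function name `simple_random_sample`, the numbered frame of 1,000 units, and the seed are illustrative assumptions, not part of the original slides.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has an equal chance."""
    rng = random.Random(seed)      # a seed makes the draw reproducible
    return rng.sample(frame, n)    # sampling without replacement

# Hypothetical enumerated population: units numbered 1..1000
frame = list(range(1, 1001))
srs = simple_random_sample(frame, 50, seed=42)
print(len(srs), "units selected")
```

The draw requires the complete list (`frame`), which is the enumeration requirement noted above.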
Systematic random sample
A systematic sample is conducted by randomly selecting a first case on a list of the population and then selecting every Nth case after it until your sample is complete. This is particularly useful if your list of the population is long.
For example, if your list was the phone book, it would be easiest to start at perhaps the 17th person, and then select every 50th person from that point on.
Sampling fraction: ratio between sample size and population size (about 1/50 in this example).
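The procedure can be sketched in a few lines, assuming the population is available as an ordered list. The function name `systematic_sample` and the 1,000-entry frame are hypothetical; the interval of 50 follows the phone book example.

```python
import random

def systematic_sample(frame, interval, seed=None):
    """Pick a random start within the first interval, then every interval-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(interval)   # random first case (0-based index)
    return frame[start::interval]

# Hypothetical list of 1000 entries, sampled every 50th entry
frame = list(range(1, 1001))
sys_sample = systematic_sample(frame, 50, seed=7)
sampling_fraction = len(sys_sample) / len(frame)
print(len(sys_sample), "units, sampling fraction", sampling_fraction)
```

With 1,000 entries and an interval of 50, the sample always contains 20 units, so the sampling fraction is 0.02.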
Stratified sample
In a stratified sample, we sample either proportionately or equally from each stratum so that the various strata or subpopulations are represented.
For example, if our strata were cities in a country, we would make sure to sample from each of the cities. If our strata were gender, we would sample both men and women.
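A sketch of the proportionate variant, assuming each unit carries a stratum label. The helper `stratified_sample`, the 600/400 split between two strata, and the 10% sampling fraction are made-up values for illustration only.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from every stratum (proportionate allocation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)      # group units by stratum
    sample = []
    for name in sorted(strata):
        members = strata[name]
        k = max(1, round(fraction * len(members)))  # at least one unit per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 600 women ("F") and 400 men ("M")
units = [("F", i) for i in range(600)] + [("M", i) for i in range(400)]
strat = stratified_sample(units, stratum_of=lambda u: u[0], fraction=0.1, seed=1)
n_f = sum(1 for u in strat if u[0] == "F")
n_m = sum(1 for u in strat if u[0] == "M")
print(n_f, "women and", n_m, "men sampled")
```

For equal allocation, the same structure would apply with a fixed `k` per stratum instead of a fraction.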
Cluster sampling
Cluster: a group of sampling units close to each other, i.e. crowding together in the same area or neighborhood.
In cluster sampling we take a random sample of clusters and then survey every member of each selected cluster.
For example, if our clusters were individual schools in a city, we would randomly select a number of schools and then test all of the students within those schools.
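The schools example can be sketched as follows: whole clusters are drawn at random, and every member of a drawn cluster enters the sample. The function name `cluster_sample` and the ten schools of 30 students each are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # sample cluster names
    return {name: clusters[name] for name in chosen}

# Hypothetical city: 10 schools with 30 students each
schools = {f"school_{i}": [f"s{i}_{j}" for j in range(30)] for i in range(10)}
selected = cluster_sample(schools, 3, seed=3)
total_students = sum(len(students) for students in selected.values())
print(sorted(selected), total_students, "students surveyed")
```

Only the list of schools needs to be enumerated in advance, not the list of all students.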
[Figure: Cluster Samples of Households, showing an area divided into Sections 1 to 5]
Credit: Dr. Moataza Mahmoud Abdel Wahab, Lecturer of Biostatistics, High Institute of Public Health, University of Alexandria
More Complex Sampling Methods
Multi-stage sampling
[Figure: sampling stages: State, County, Town, Households, Person]
Multi-phase sampling
[Figure: sampling phases: Population; Sample T1 (Test 1); Sample T2 (Test 2)]
Number to sample?
Estimation of the sample size
“How many subjects should be studied?”
The sample size depends on the following factors:
I. Difference to be found
II. Variability of the measurement
III. Level of significance
IV. Power of the study
Difference to detect
“The magnitude of the difference to be detected”
A large sample size is needed to detect a small difference.
Thus, the sample size is inversely related to the size of the difference to be detected.
Variability of the measurement
The variability of measurements is reflected by the standard deviation or the variance.
The higher the standard deviation, the larger the sample size required.
Thus, sample size is directly related to the SD.
Level of significance
Relies on α error or type I error. The usual level of α has been arbitrarily set to 5% or 0.05.
Alpha error can be reduced to 0.01 or even 0.001, but this consequently increases the sample size.
Thus, sample size is inversely related to the level of α error.
Alpha error is considered before the study begins, but is only important when a significant difference or association is found.
Power of the study
The power of the study is the probability that it will yield a statistically significant result when a true difference or association exists.
It is related to β or type II error. Power is equal to (1 - β); consequently, the power of the study is increased by decreasing the beta error.
Thus, sample size is inversely related to the level of β error or directly related to the power of the study.
Beta error is considered before the study begins, but is only of consequence when no difference or association is found (in hypothesis testing studies).
Beta error is not a consideration in surveys that are only estimating parameters (descriptive studies). Estimations are only concerned with confidence (i.e. confidence level) in the estimate.
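To make the direction of these relationships concrete, here is a small sketch that computes the factor (z for alpha plus z for power), squared, to which sample size is proportional in the hypothesis testing formulas from Sullivan (2008) used later in this deck. The function name `z_factor` is an assumption; `statistics.NormalDist` is part of the Python standard library.

```python
from statistics import NormalDist

def z_factor(alpha, power):
    """(z_{1-alpha/2} + z_{1-beta})^2 for a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for power = .8
    return (z_alpha + z_beta) ** 2

for alpha in (0.05, 0.01, 0.001):
    for power in (0.8, 0.9):
        print(f"alpha={alpha}, power={power}: factor={z_factor(alpha, power):.2f}")
```

The factor grows when alpha is lowered or power is raised, so the required sample size grows with it.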
Sample Size related to the Research Question, Design, and Analysis
The research question usually informs on the variables to be considered and the level of measurement to be used. It also points to the design type and analysis to be used.
Research type/design may address:
Exploration, description, estimation (Descriptive Studies)
Hypothesis testing of differences or relationships (Analytic Studies)
Modeling of variables for relationships or survival (Multivariable Studies)
Sample size must consider the type/design plus the measurement level of the variables.
Descriptive studies only ask how good the estimate is (an alpha error question).
Analytic studies must also consider power (a beta error question).
Additional variables (three or more) normally require larger sample sizes to maintain power in subgroups.
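Sample Size Determination: Calculations^

For Confidence in an Estimation (beta error not considered)
Example: survey data (descriptive)
1. Interval level variable
a. 1 sample: $n = \left(\frac{z\sigma}{E}\right)^2$
b. 2 samples: $n_i = 2\left(\frac{z\sigma}{E}\right)^2$
Where $\sigma$ is estimated by: $s_{pooled} = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$
2. Nominal level variable
a. 1 sample: $n = p(1-p)\left(\frac{z}{E}\right)^2$
b. 2 samples: $n_i = [p_1(1-p_1) + p_2(1-p_2)]\left(\frac{z}{E}\right)^2$

For Hypothesis Testing
Example: analytic studies
3. Interval level variable
a. 1 sample: $n = \left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{ES}\right)^2$
Where: $ES = \frac{|\mu_1 - \mu_0|}{\sigma}$
b. 2 samples: $n_i = 2\left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{ES}\right)^2$
Where: $ES = \frac{|\mu_1 - \mu_2|}{\sigma}$
4. Nominal level variable
a. 1 sample: $n = \left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{ES}\right)^2$
Where: $ES = \frac{|p_1 - p_0|}{\sqrt{p_0(1-p_0)}}$
b. 2 samples: $n_i = 2\left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{ES}\right)^2$
Where: $ES = \frac{|p_1 - p_2|}{\sqrt{p(1-p)}}$, with $p$ the mean of $p_1$ and $p_2$

^SEE: Sullivan, Lisa M. (2008). Essentials of Biostatistics in Public Health. Jones and Bartlett, Sudbury, MA.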
Four Research Questions
1. What is the blood sugar level in college students?
2. What proportion of male and female college students smoke?
3. Are smoking levels in college students different from the overall population?
4. Are blood sugar levels in college students different between males and females?
Question 1: What is the blood sugar level in college students?
Estimation/interval data/1 sample
If 95% confidence needed, z = 1.96
Pilot survey estimate of standard deviation is 25 mg/dl
And, E (margin of error) is not to exceed 5 mg/dl
Then:
$n = \left(\frac{z\sigma}{E}\right)^2 = \left(\frac{1.96 \times 25}{5}\right)^2 \approx 96$
n = 96
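The arithmetic for Question 1 can be checked with a short sketch of the one-sample estimation formula, n = (z × SD / E)². The function name `n_mean_one_sample` is hypothetical; the inputs are those stated above.

```python
def n_mean_one_sample(z, sd, margin):
    """Sample size to estimate a mean within +/- margin: (z * sd / margin)^2."""
    return (z * sd / margin) ** 2

# Inputs from Question 1: z = 1.96, pilot SD = 25, E = 5 mg/dl
n_q1 = n_mean_one_sample(1.96, 25, 5)
print(f"n = {n_q1:.2f}")   # about 96.04, reported on the slide as n = 96
```

Halving the margin of error to 2.5 mg/dl would quadruple the result, since E enters the formula squared.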
Question 2: What proportion of male and female college students smoke?
Estimation/nominal data/2 samples
If 95% confidence needed, z = 1.96
Pilot survey estimate of male p = .25; female p = .2
And, E (margin of error) is not to exceed 10% (.1)
Then:
$n_i = [p_1(1-p_1) + p_2(1-p_2)]\left(\frac{z}{E}\right)^2 = [.25(1-.25) + .2(1-.2)]\left(\frac{1.96}{.1}\right)^2 \approx 133.5$
n = 134 (per group)
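A sketch of the two-sample proportion formula, n per group = [p1(1-p1) + p2(1-p2)] × (z/E)², evaluated with the inputs stated above (z = 1.96, p = .25 and .2, E = .1). The function name `n_two_proportions` is hypothetical. With these inputs the expression evaluates to roughly 133.5, which rounds up to 134 per group.

```python
import math

def n_two_proportions(z, p1, p2, margin):
    """Per-group sample size to estimate a difference in proportions within +/- margin."""
    return (p1 * (1 - p1) + p2 * (1 - p2)) * (z / margin) ** 2

# Inputs from Question 2: z = 1.96, male p = .25, female p = .2, E = .1
n_q2_raw = n_two_proportions(1.96, 0.25, 0.20, 0.10)
n_q2 = math.ceil(n_q2_raw)   # round up to whole subjects
print(f"raw = {n_q2_raw:.2f}, per group = {n_q2}")
```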
Question 3: Are smoking levels in college students different from the overall population?
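Hypothesis testing/nominal data/1 sample
Set acceptable alpha at .05 ($z_{1-\alpha/2}$ = 1.96); Power (1-β) at .8 ($z_{1-\beta}$ = .84)
Pilot survey estimate of college students p = .22; National average p = .3
Then:
$ES = \frac{|p_1 - p_0|}{\sqrt{p_0(1-p_0)}} = \frac{|.22 - .3|}{\sqrt{.3(1-.3)}} \approx .17$
$n = \left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{ES}\right)^2 = \left(\frac{1.96 + .84}{.17}\right)^2 \approx 271.3$
n = 272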
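The Question 3 calculation can be reproduced with the sketch below. The function names are hypothetical. Note that n = 272 relies on rounding the effect size to .17 before squaring; the code also shows the value obtained when the effect size is left unrounded.

```python
import math

def effect_size_one_proportion(p1, p0):
    """ES = |p1 - p0| / sqrt(p0 * (1 - p0))."""
    return abs(p1 - p0) / math.sqrt(p0 * (1 - p0))

def n_one_sample_test(z_alpha, z_beta, es):
    """n = ((z_alpha + z_beta) / ES)^2 for a one-sample test."""
    return ((z_alpha + z_beta) / es) ** 2

es_exact = effect_size_one_proportion(0.22, 0.30)      # about 0.1746
es_rounded = round(es_exact, 2)                        # 0.17, as used in the example
n_q3 = math.ceil(n_one_sample_test(1.96, 0.84, es_rounded))
n_q3_unrounded = n_one_sample_test(1.96, 0.84, es_exact)
print(n_q3, f"(with unrounded ES: {n_q3_unrounded:.2f})")
```

Because the effect size is squared in the denominator, small rounding differences in ES shift the result noticeably, so it is worth stating which value was used.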
Question 4: Are blood sugar levels in college students different between males and females?
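Hypothesis testing/interval data/2 samples
Set acceptable alpha at .05 ($z_{1-\alpha/2}$ = 1.96); Power (1-β) at .8 ($z_{1-\beta}$ = .84)
Pilot survey estimate of females: mean = 95 mg/dl, sd = 10; males: mean = 100 mg/dl, sd = 10
Then:
$ES = \frac{|\mu_1 - \mu_2|}{\sigma} = \frac{|95 - 100|}{10} = .5$
$n_i = 2\left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{ES}\right)^2 = 2\left(\frac{1.96 + .84}{.5}\right)^2 \approx 62.7$
n = 63 (per group)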
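A sketch of the Question 4 calculation for two independent groups compared on an interval-level outcome. The function names are hypothetical; the inputs are the pilot values stated above.

```python
import math

def effect_size_two_means(mean1, mean2, sd):
    """ES = |mean1 - mean2| / sd."""
    return abs(mean1 - mean2) / sd

def n_per_group_two_sample_test(z_alpha, z_beta, es):
    """n per group = 2 * ((z_alpha + z_beta) / ES)^2."""
    return 2 * ((z_alpha + z_beta) / es) ** 2

es_q4 = effect_size_two_means(95, 100, 10)                    # 0.5
n_q4_raw = n_per_group_two_sample_test(1.96, 0.84, es_q4)     # about 62.72
n_q4 = math.ceil(n_q4_raw)                                    # round up to whole subjects
print(f"ES = {es_q4}, n per group = {n_q4}")
```

The total study size is twice the per-group figure, here 126 subjects.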
Other sources on Samples and Sample Size:
Many statistical programs have sample size generators. Example: the “statcalc” utility in EpiInfo
http://www.cdc.gov/epiinfo/downloads.htm
Many web sites include sample size information:
http://www.stat.uiowa.edu/~rlenth/Power/
Additional lecture materials on sampling and sample size:
http://www.pitt.edu/~super1/lecture/lec19041/index.htm
http://www.pitt.edu/~super1/lecture/lec0542/index.htm