HR: Samples, Sampling, and Sample size


A practical guide

Samples, Sampling, and Sample Size

Samples: used in research (e.g. for estimation and hypothesis testing); this topic concerns the theory of sampling and why we sample (e.g. sampling distributions).

Sampling: the process of taking samples; it must guard against bias and threats to validity.

Sample size: the practical issue of how many subjects or units are needed for valid estimation or inference.

Process of Sampling

Involves:
1. Identification of the study population (target)
2. Determination of the sampling population (sampling frame)
3. Definition of the sampling unit (individual, family, etc.)
4. Choice of sampling method (what is possible, what is optimal)
5. Estimation of the sample size (depends on study question and study design)

Basic Questions about Sampling

Why sample? Efficiency and quality.
Who to sample? Usually a representation of the population of interest.
How to sample? Use the most appropriate sampling method.
Number to sample? As many as required so that potential sampling error is limited.

Why sample?
1. To acquire information about larger populations
2. Lower costs
3. Less field time
4. When it is impossible to study the whole population
5. More accuracy: a better job of data collection (more time per sample unit means higher quality data)

Who to sample? Identification of the study population

Sampling is the process of selecting a number of units from a defined study population.

The study (or target) population is the population to which the results of the study will be generalized.

It is crucial that the study population is clearly defined, since it is the most important determinant of the sampling population.

The Sampling Frame

The sampling frame is the list or population from which the sample is actually drawn. The investigator's definition of the sampling frame is governed by two factors:
1. Feasibility: a reachable sampling population
2. External validity: the ability to generalize from the study results to the target population

The Sampling Unit

To define the sampling unit, set inclusion criteria and exclusion criteria.

We may sample individuals, households, or larger units.

Consider the unit of analysis: e.g. individual income, household income, or city median income.

How to sample? Choices in sampling method

There are two broad approaches: non-probability sampling and probability sampling.

Non-probability sampling

Types of non-probability sampling:
1. Convenience sampling (units selected from elements of the population that are easily accessible)
2. Quota sampling (a set number of units of each type)
3. Purposeful sampling (you choose who you think should be in the study)
4. Snowball sampling (friend of a friend, etc.)

Non-probability sampling is not recommended in health research if generalization or statistical analysis is intended. It is by far the most biased sampling procedure because it is not random: not everyone in the population has an equal chance of being selected to participate in the study. Analytical and statistical procedures usually assume that the sampled units were drawn randomly from the assumed statistical distribution.

Probability sampling

“There is a known non-zero probability of selection for each sampling unit.”

Types:
1. Simple random sampling
2. Systematic random sampling
3. Stratified random sampling
4. Cluster sampling

Others: multi-stage random sampling, multi-phase sampling.

Simple random sample

In this method, all subjects or elements have an equal probability of being selected. There are two major ways of drawing a simple random sample: the first is to consult a random number table, and the second is to have a computer select the random sample.

A complete enumeration (list) of the population is required, or assumed.
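Having the computer select the sample takes only a few lines. A minimal sketch in Python, using the standard library's `random.Random.sample`, which draws without replacement so every element of the enumerated frame has the same chance of selection; the frame of 500 student IDs, the sample size, and the seed are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from an enumerated sampling frame without replacement.

    Every unit has the same probability of selection, n / len(frame).
    """
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(frame, n)

# Hypothetical sampling frame: 500 enumerated student IDs
frame = list(range(1, 501))
sample = simple_random_sample(frame, 20, seed=42)
print(sorted(sample))
```

Recording the seed lets another investigator reproduce exactly the same selection.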

Systematic random sample

A systematic sample is drawn by randomly selecting a first case on a list of the population and then taking every kth case until the sample is complete. This is particularly useful if the list of the population is long.

For example, if your list were the phone book, you might start at the 17th person (chosen at random) and then select every 50th person from that point on.

Sampling fraction: the ratio of the sample size to the population size (n/N). The sampling interval k (the 50 in the example) is its reciprocal, N/n.
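The random start plus fixed interval can be sketched as follows. This is a minimal sketch, assuming a list-based frame; the frame of 5,000 hypothetical phone-book entries and the seed are illustrative. The interval k is rounded down, so when N/n is not a whole number a few entries at the end of the list cannot be selected, a simplification a real design might handle differently.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Systematic random sample: a random start, then every k-th unit on the list."""
    N = len(frame)
    k = N // n                    # sampling interval (about N/n, the reciprocal of the sampling fraction)
    rng = random.Random(seed)
    start = rng.randrange(k)      # random start within the first interval (0-based index)
    return [frame[i] for i in range(start, start + n * k, k)]

# Hypothetical list of 5,000 phone-book entries, sample of 100 (k = 50)
frame = [f"entry-{i}" for i in range(1, 5001)]
sample = systematic_sample(frame, 100, seed=7)
print("sampling fraction n/N =", 100 / len(frame))
print("first three selected:", sample[:3])
```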

Stratified sample

In a stratified sample, we sample from each stratum (subpopulation), either proportionately to its size or in equal numbers, so that the various strata are represented.

For example, if our strata were cities in a country, we would make sure to sample from each of the cities. If our strata were gender, we would sample both men and women.
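Both allocation options can be sketched with one helper. A minimal sketch, assuming a hypothetical frame of 600 women and 400 men and a total sample of 50: proportionate allocation gives each stratum n·N_h/N units (rounded, so totals can drift by one or two when the shares are not whole numbers), equal allocation gives each stratum the same number, and a simple random sample is drawn inside each stratum.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n, equal=False, seed=None):
    """Stratified random sample: a simple random sample drawn within every stratum.

    equal=False -> proportionate allocation (n_h = n * N_h / N, rounded)
    equal=True  -> equal allocation (n // number_of_strata per stratum)
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)        # group the frame by stratum
    N = len(units)
    sample = []
    for members in strata.values():
        if equal:
            n_h = n // len(strata)
        else:
            n_h = round(n * len(members) / N)
        sample.extend(rng.sample(members, n_h))  # SRS inside the stratum
    return sample

# Hypothetical frame: 600 women and 400 men, stratified by sex
population = [(i, "F") for i in range(600)] + [(i, "M") for i in range(600, 1000)]
prop = stratified_sample(population, lambda u: u[1], 50, seed=1)
eq = stratified_sample(population, lambda u: u[1], 50, equal=True, seed=1)
print(sum(u[1] == "F" for u in prop), sum(u[1] == "M" for u in prop))
print(sum(u[1] == "F" for u in eq), sum(u[1] == "M" for u in eq))
```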

Cluster sampling

Cluster: a group of sampling units close to each other, i.e. crowded together in the same area or neighborhood.

In cluster sampling, we take a random sample of clusters and then survey every member of each selected cluster.

For example, if our clusters were the individual schools in a city, we would randomly select a number of schools and then test all of the students within those schools.

[Figure: Cluster Samples of Households (Sections 1 to 5)]

Credit: Dr. Moataza Mahmoud Abdel Wahab, Lecturer of Biostatistics, High Institute of Public Health, University of Alexandria
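The school example can be sketched directly: the random draw happens at the cluster level, and every unit inside a chosen cluster is kept. A minimal sketch, assuming a hypothetical city of 10 schools with 30 students each; the names and seed are illustrative.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sample: randomly select whole clusters, then keep every member."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # simple random sample of cluster labels
    return {c: list(clusters[c]) for c in chosen}        # every unit in each chosen cluster

# Hypothetical city: 10 schools with 30 students each
schools = {f"school-{s}": [f"s{s}-student-{i}" for i in range(30)] for s in range(10)}
selected = cluster_sample(schools, 3, seed=5)
students = [st for group in selected.values() for st in group]
print(sorted(selected), len(students))
```

Only a list of clusters is needed to make the first draw, which is one practical reason cluster sampling is used when no list of individuals exists.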

More Complex Sampling Methods

Multi-stage sampling: [Figure: State → County → Town → Households → Person]

Multi-phase sampling: [Figure: Population → Sample T1 (Test 1) → Sample T2 (Test 2)]
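Multi-stage sampling repeats a random draw at each level of the hierarchy instead of keeping whole clusters. A minimal two-stage sketch, assuming a hypothetical frame of 8 towns with 50 households each: stage 1 randomly selects towns, stage 2 randomly selects households within each selected town. Deeper hierarchies (state, county, town, household, person) would apply the same step at every level.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Two-stage sample: SRS of clusters (stage 1), then SRS of units within each (stage 2)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)                   # stage 1: e.g. towns
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}  # stage 2: e.g. households

# Hypothetical frame: 8 towns with 50 households each
towns = {f"town-{t}": [f"t{t}-hh-{h}" for h in range(50)] for t in range(8)}
selected = two_stage_sample(towns, 4, 5, seed=11)
print({town: len(hhs) for town, hhs in selected.items()})
```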

Number to sample?

“How many subjects should be studied?”

The sample size depends on the following factors:
I. Difference to be found
II. Variability of the measurement
III. Level of significance
IV. Power of the study

Estimation of the sample size

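Difference to detect

“The magnitude of the difference to be detected”: a large sample size is needed to detect a small difference. Thus, the sample size is inversely related to the size of the difference to be detected (in the sample size formulas, n is proportional to 1/difference², so halving the difference quadruples the required n).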

Variability of the measurement

The variability of measurements is reflected by the standard deviation (SD) or the variance.

The higher the standard deviation, the larger the sample size required.

Thus, sample size is directly related to the SD (in the formulas, n is proportional to the variance, SD²).
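Both relationships can be seen numerically. A minimal sketch, assuming a two-sided α of .05 and 80% power (the z values 1.96 and .84 used in the worked examples of this guide) and the two-sample formula for interval data, n per group = 2((z_{1-α/2} + z_{1-β})·SD/difference)²; the differences and SDs looped over are hypothetical.

```python
import math

Z_ALPHA = 1.96   # two-sided alpha = .05
Z_BETA = 0.84    # power = .80

def n_two_means(diff, sd):
    """Per-group n for comparing two means, rounded up to a whole subject."""
    return math.ceil(2 * ((Z_ALPHA + Z_BETA) * sd / diff) ** 2)

# Smaller difference to detect -> larger n (SD fixed at 10)
for diff in (10, 5, 2.5):
    print("difference", diff, "-> n per group", n_two_means(diff, sd=10))

# Larger SD -> larger n (difference fixed at 5)
for sd in (5, 10, 20):
    print("SD", sd, "-> n per group", n_two_means(5, sd))
```

Because difference and SD enter only as their ratio, halving the difference has the same effect on n as doubling the SD.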

Level of significance

This relies on the α error (type I error). The usual level of α has been arbitrarily set at 5% (0.05).

The α error can be reduced to 0.01 or even 0.001, but this consequently increases the sample size.

Thus, sample size is inversely related to the level of α error.

Alpha error is considered before the study begins, but it is only important when a significant difference or association is found.

Power of the study

The power of the study is the probability that it will yield a statistically significant result when a true difference of the specified size exists. It is related to the β error (type II error): power equals 1 - β, so the power of the study is increased by decreasing the β error. Thus, sample size is inversely related to the level of β error, or directly related to the power of the study.

Beta error is considered before the study begins, but it is only of consequence when no difference or association is found (in hypothesis-testing studies).

Beta error is not a consideration in surveys that are only estimating parameters (descriptive studies). Estimations are only concerned with confidence (i.e. confidence level) in the estimate.
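The z values used with α and power (1.96 for a two-sided α of .05, .84 for 80% power) are quantiles of the standard normal distribution. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`, that derives them and shows how the per-group n for a two-sample comparison of means grows as α is reduced or power is raised; the effect size ES = difference/SD is fixed at a hypothetical .5.

```python
import math
from statistics import NormalDist

def z(p):
    """Standard normal quantile: the value z with P(Z <= z) = p."""
    return NormalDist().inv_cdf(p)

def n_per_group(alpha, power, es):
    """Per-group n for a two-sided, two-sample test of means: 2 * ((z_{1-a/2} + z_{1-b}) / ES)^2."""
    return math.ceil(2 * ((z(1 - alpha / 2) + z(power)) / es) ** 2)

print(f"z(0.975) = {z(0.975):.2f}, z(0.80) = {z(0.80):.2f}")
for alpha in (0.05, 0.01, 0.001):   # smaller alpha -> larger n
    print("alpha", alpha, "n per group", n_per_group(alpha, 0.80, 0.5))
for power in (0.80, 0.90):          # higher power (smaller beta) -> larger n
    print("power", power, "n per group", n_per_group(0.05, power, 0.5))
```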

Sample Size related to the Research Question, Design, and Analysis

The research question usually informs on the variables to be considered and the level of measurement to be used. It also points to the design type and the analysis to be used.

Research type/design may address:
1. Exploration, description, estimation (descriptive studies)
2. Hypothesis testing of differences or relationships (analytic studies)
3. Modeling of variables for relationships or survival (multivariable studies)

Sample size must consider the type/design plus the measurement level of the variables:
1. Descriptive studies only ask how good the estimate is (an alpha error question).
2. Analytic studies must also consider power (a beta error question).
3. Additional variables (three or more) normally require larger sample sizes to maintain power in subgroups.

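Sample Size Determination: Calculations^

For Confidence in an Estimation (beta error not considered). Example: survey data (descriptive studies).

Here $z$ is the standard normal value for the chosen confidence level (1.96 for 95%), $\sigma$ is the standard deviation, $p$ is the proportion, and $E$ is the margin of error.

1. Interval level variable
a. 1 sample:
$$n = \left(\frac{z\,\sigma}{E}\right)^2$$
b. 2 samples ($n_i$ per group):
$$n_i = 2\left(\frac{z\,\sigma}{E}\right)^2, \quad \text{where } \sigma \text{ may be estimated by } s_{pooled} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

2. Nominal level variable
a. 1 sample:
$$n = p(1 - p)\left(\frac{z}{E}\right)^2$$
b. 2 samples ($n_i$ per group):
$$n_i = \left[p_1(1 - p_1) + p_2(1 - p_2)\right]\left(\frac{z}{E}\right)^2$$

For Hypothesis Testing. Example: analytic studies.

Here $z_{1-\alpha/2}$ and $z_{1-\beta}$ are the standard normal values for the significance level and the power, $ES$ is the effect size, and $\mu_0$ and $p_0$ are the known (reference) population values.

3. Interval level variable
a. 1 sample:
$$n = \left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{ES}\right)^2, \quad \text{where } ES = \frac{|\mu_1 - \mu_0|}{\sigma}$$
b. 2 samples ($n_i$ per group):
$$n_i = 2\left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{ES}\right)^2, \quad \text{where } ES = \frac{|\mu_1 - \mu_2|}{\sigma}$$

4. Nominal level variable
a. 1 sample:
$$n = \left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{ES}\right)^2, \quad \text{where } ES = \frac{|p_1 - p_0|}{\sqrt{p_0(1 - p_0)}}$$
b. 2 samples ($n_i$ per group):
$$n_i = 2\left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{ES}\right)^2, \quad \text{where } ES = \frac{|p_1 - p_2|}{\sqrt{p(1 - p)}}$$
and $p$ is the overall (pooled) proportion of the two groups.

In all cases, round n up to the next whole number.

^SEE: Sullivan, Lisa M. (2008). Essentials of Biostatistics in Public Health. Jones and Bartlett, Sudbury, MA.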
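The formulas above translate directly into code. A minimal sketch in Python; the function names are hypothetical, chosen for this guide. Each function returns the unrounded n so the rounding step stays visible, and the caller rounds up with `math.ceil`.

```python
import math

# --- Confidence in an estimation (beta error not considered) ---
def n_mean(z, sd, E):
    """1 sample, interval variable: (z * sd / E)^2."""
    return (z * sd / E) ** 2

def n_two_means(z, sd, E):
    """2 samples, interval variable, per group: 2 * (z * sd / E)^2."""
    return 2 * (z * sd / E) ** 2

def pooled_sd(n1, s1, n2, s2):
    """Pooled standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def n_proportion(z, p, E):
    """1 sample, nominal variable: p(1 - p) * (z / E)^2."""
    return p * (1 - p) * (z / E) ** 2

def n_two_proportions(z, p1, p2, E):
    """2 samples, nominal variable, per group: [p1(1-p1) + p2(1-p2)] * (z / E)^2."""
    return (p1 * (1 - p1) + p2 * (1 - p2)) * (z / E) ** 2

# --- Hypothesis testing (alpha and beta) ---
def n_one_sample_test(z_alpha, z_beta, es):
    """1 sample: ((z_{1-a/2} + z_{1-b}) / ES)^2."""
    return ((z_alpha + z_beta) / es) ** 2

def n_two_sample_test(z_alpha, z_beta, es):
    """2 samples, per group: 2 * ((z_{1-a/2} + z_{1-b}) / ES)^2."""
    return 2 * ((z_alpha + z_beta) / es) ** 2

def es_means(m1, m2, sd):
    """Effect size for means: |m1 - m2| / sd (m2 may be a known mean mu_0)."""
    return abs(m1 - m2) / sd

def es_proportion_vs_known(p1, p0):
    """Effect size for one proportion against a known value p0."""
    return abs(p1 - p0) / math.sqrt(p0 * (1 - p0))

def es_two_proportions(p1, p2, p_pooled):
    """Effect size for two proportions, using the overall (pooled) proportion."""
    return abs(p1 - p2) / math.sqrt(p_pooled * (1 - p_pooled))

# Usage: two-sample comparison of means, ES = |95 - 100| / 10
print(math.ceil(n_two_sample_test(1.96, 0.84, es_means(95, 100, 10))))
```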

Four Research Questions1. What is the blood sugar level in college

students?2. What proportion of male and female college

students smoke?3. Are smoking levels in college students different

from the overall population?4. Are blood sugar levels in college students

different between males and females?

Question 1: What is the blood sugar level in college students?

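Estimation / interval data / 1 sample. If 95% confidence is needed, z = 1.96. The pilot survey estimate of the standard deviation is 25 mg/dl, and E (the margin of error) is not to exceed 5 mg/dl. Then:

$$n = \left(\frac{z\,\sigma}{E}\right)^2 = \left(\frac{1.96 \times 25}{5}\right)^2 = 96.04$$

so n ≈ 96 students; rounding up to 97 guarantees that the margin of error does not exceed 5 mg/dl.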
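The Question 1 arithmetic can be checked in a few lines. A minimal sketch using the values stated above; the loop computes the margin of error z·σ/√n actually achieved at n = 96 and n = 97, which shows why the result is rounded up rather than to the nearest whole number.

```python
import math

z = 1.96    # 95% confidence
sd = 25     # pilot estimate of the standard deviation (mg/dl)
E = 5       # maximum acceptable margin of error (mg/dl)

n = (z * sd / E) ** 2
print(f"n = {n:.2f}, rounded up: {math.ceil(n)}")

# Margin of error actually achieved by each candidate sample size
for candidate in (96, 97):
    print(candidate, round(z * sd / math.sqrt(candidate), 4))
```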

Question 2: What proportion of male and female college students smoke?

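Estimation / nominal data / 2 samples. If 95% confidence is needed, z = 1.96. Pilot survey estimates: male p = .25; female p = .2. E (the margin of error) is not to exceed 10% (.1). Then:

$$n_i = \left[p_1(1 - p_1) + p_2(1 - p_2)\right]\left(\frac{z}{E}\right)^2 = \left[.25(1 - .25) + .2(1 - .2)\right]\left(\frac{1.96}{.1}\right)^2 = .3475 \times 384.16 \approx 133.5$$

so n = 134 (per group).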
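A minimal sketch of the Question 2 arithmetic, using the pilot proportions and margin of error stated above; the variable names are illustrative.

```python
import math

z = 1.96                         # 95% confidence
p_male, p_female = 0.25, 0.20    # pilot proportions who smoke
E = 0.10                         # margin of error for the difference in proportions

n = (p_male * (1 - p_male) + p_female * (1 - p_female)) * (z / E) ** 2
print(f"n = {n:.1f} per group, rounded up: {math.ceil(n)}")
```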

Question 3: Are smoking levels in college students different from the overall population?

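Hypothesis testing / nominal data / 1 sample. Set acceptable alpha at .05 ($z_{1-\alpha/2}$ = 1.96) and power (1 - β) at .8 ($z_{1-\beta}$ = .84). Pilot survey estimate for college students: $p_1$ = .22; national average: $p_0$ = .3. Then:

$$ES = \frac{|p_1 - p_0|}{\sqrt{p_0(1 - p_0)}} = \frac{|.22 - .3|}{\sqrt{.3(1 - .3)}} \approx .17$$

$$n = \left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{ES}\right)^2 = \left(\frac{1.96 + .84}{.17}\right)^2 \approx 271.3$$

so n = 272.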
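A minimal sketch of the Question 3 arithmetic. Because ES is squared in the denominator, rounding it to .17 before the last step noticeably changes n, so the sketch computes both the rounded-ES result and the result with ES kept at full precision.

```python
import math

z_alpha, z_beta = 1.96, 0.84   # alpha = .05 (two-sided), power = .80
p1, p0 = 0.22, 0.30            # pilot estimate and national average

es = abs(p1 - p0) / math.sqrt(p0 * (1 - p0))
n_rounded_es = ((z_alpha + z_beta) / round(es, 2)) ** 2   # ES rounded to .17 first
n_exact_es = ((z_alpha + z_beta) / es) ** 2               # ES kept at full precision
print(f"ES = {es:.4f}")
print(f"n with ES = .17: {math.ceil(n_rounded_es)}; n with unrounded ES: {math.ceil(n_exact_es)}")
```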

Question 4: Are blood sugar levels in college students different between males and females?

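Hypothesis testing / interval data / 2 samples. Set acceptable alpha at .05 ($z_{1-\alpha/2}$ = 1.96) and power (1 - β) at .8 ($z_{1-\beta}$ = .84). Pilot survey estimates: females, mean = 95 mg/dl, SD = 10; males, mean = 100 mg/dl, SD = 10. Then:

$$ES = \frac{|\mu_1 - \mu_2|}{\sigma} = \frac{|95 - 100|}{10} = .5$$

$$n_i = 2\left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{ES}\right)^2 = 2\left(\frac{1.96 + .84}{.5}\right)^2 = 62.72$$

so n = 63 (per group).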
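A minimal sketch of the Question 4 arithmetic, using the pilot means and standard deviation stated above; the variable names are illustrative.

```python
import math

z_alpha, z_beta = 1.96, 0.84       # alpha = .05 (two-sided), power = .80
mean_f, mean_m, sd = 95, 100, 10   # pilot estimates (mg/dl); both groups have SD 10

es = abs(mean_f - mean_m) / sd
n_per_group = 2 * ((z_alpha + z_beta) / es) ** 2
print(f"ES = {es}, n = {n_per_group:.2f} per group, rounded up: {math.ceil(n_per_group)}")
```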

Other sources on Samples and Sample Size

Many statistical programs have sample size generators. Example: the “statcalc” utility in Epi Info: http://www.cdc.gov/epiinfo/downloads.htm

Many web sites include sample size information: http://www.stat.uiowa.edu/~rlenth/Power/

Additional lecture materials on sampling and sample size:
http://www.pitt.edu/~super1/lecture/lec19041/index.htm
http://www.pitt.edu/~super1/lecture/lec0542/index.htm
