On Samples And Sampling

Page 1: On Samples And Sampling

“On Samples And Sampling”

Title drawn from Elisabeth Kübler-Ross’s anagrammatic phraseologies: “On death and dying”; “On grief and grieving”; “Real taste of life (On life and living)”

Ehi Igumbor

School of Public Health

University of the Western Cape

Page 2: On Samples And Sampling

“On Samples”

What is research all about?
  Gathering data, information or evidence about a subject or topic
  E.g. What is the outcome of HIV-associated tuberculosis in the era of HAART?

But how much data, information or evidence?
  Should every HIV-associated TB case on HAART be used?
  Should only 2 be left out?
  Should 50% of them be used?

Page 3: On Samples And Sampling

Populations and Samples

Population
  Larger group to which research results are generalized
  Defined aggregate of persons, objects, or events that meet a specified set of criteria

Sample
  Sub-group of the population
  Serves as reference group for estimating characteristics of, or drawing conclusions about, the population

Page 4: On Samples And Sampling

Populations and Samples

Population (group about whom you wish to gather data, defined by person, place and time)

Sample (sub-group of the total study population)

Page 5: On Samples And Sampling

Why use a sample?
  Saves time
  Saves money
  Saves energy
  Not practical to get everyone
  Less data, so fewer opportunities to make mistakes and better data quality

Why not? Just as good!

Page 6: On Samples And Sampling

But… Sampling Bias!

Are responses of sample members representative of the population?

No way to guarantee, but good sampling procedures help

Not so much size as representativeness:
  Gallup and Harris polls predicted a Nixon win using 2,000 voters (43% predicted, 42.9% result)
  The 1936 Literary Digest poll predicted an Alf Landon win with 57%, based on 2 million voters drawn from lists of automobile owners and telephone directories; Landon lost to Roosevelt by a landslide

Page 7: On Samples And Sampling

Sampling Bias

Occurs when the individuals selected over- or under-represent certain population attributes that are related to the phenomenon under study

May be Conscious or Unconscious

Page 8: On Samples And Sampling

Learning Objectives

Understand strategies for selecting a sample

Understand how to determine the required size of a sample

Page 9: On Samples And Sampling

“On Sampling”: Determining Sampling Procedure

What do I want to know?

Does self-reported quality of life of patients with HIV-associated tuberculosis improve after HAART compared to before HAART?

Is the CD4 count in patients on HAART different from that in patients not on HAART?

May involve simply comparing 2 indicators, or a more rigorous analysis of changes in patients on HAART and not on HAART to estimate the strength of the impact of HAART

Page 10: On Samples And Sampling

Determining Sampling Procedure

What is my population?
  Need a good problem statement
  Everyone affected (may be defined geographically, demographically, economically, socially, or by other specific content of the study)
  Should not be too narrow
  Sometimes the source of data is different from the sampling unit, e.g. household surveys

Page 11: On Samples And Sampling

Determining Sampling Procedure

Remember: populations are not necessarily restricted to human subjects
  May include people, places, organizations, objects, animals, days or any unit of interest, e.g.
    Blood samples in an epidemiology study
    Housing units in a household survey
    Series of measurements in a test-retest reliability study
    Inventory of manufactured products in industrial quality control studies

Page 12: On Samples And Sampling

Target Population and Accessible Population

Study of motor skills

Target or reference population:
  “ALL children with learning disabilities in South Africa today”

Accessible or experimental population:
  “ALL children identified as having a learning disability in Cape Town’s school system”

Page 13: On Samples And Sampling

Inclusion and Exclusion Criteria

Inclusion Criteria: primary traits of the target and accessible populations that will qualify someone as a subject

Exclusion Criteria: factors that would preclude someone from being studied. (Are potentially confounding to the results)

Page 14: On Samples And Sampling

Determining Sampling Procedure

To sample or not to sample?
  Is it feasible to use the whole population? Cost? Time?
  Sometimes a “census” of all is needed:
    Small population size
    Useful to know information on every individual
  Scope of study: rapid assessment or in-depth investigation

Page 15: On Samples And Sampling

Types of Samples

Page 16: On Samples And Sampling

Sampling Procedure: Non-probability

Selection of samples is made by nonrandom methods, i.e. not based on chance

No way to accurately estimate the chance of inclusion or the degree of sampling error

Is convenient and economical

Quality depends on the knowledge, judgment and expertise of the researcher

Page 17: On Samples And Sampling

Non-Probability Samples

Haphazard Sampling

No conscious planning or consistent procedures are employed to select the sample units

Page 18: On Samples And Sampling

Non-Probability Samples

Convenience or “accidental” Sampling

A unit is self-selected (e.g. volunteers) or easily accessible/available

E.g. consecutive sampling of patients

Although it may yield useful information, be cautious about making inferences!

Page 19: On Samples And Sampling

Non-Probability Samples

Quota Sampling

A pre-determined number of units which have certain characteristics are selected

Controls for confounding effect of known characteristics of a population by selecting adequate numbers from each stratum

E.g “50 men and 50 women to be interviewed on a busy street”

Page 20: On Samples And Sampling

Non-Probability Samples

Snowball Samples

Useful if it is hard to locate subjects with specific characteristics

Carried out in stages:
  Select a few subjects who meet the selection criteria
  Ask selected subjects to identify others who have the requisite characteristics
  Repeat the process of “chain referral” or “snowballing” until an adequate sample size is obtained

Page 21: On Samples And Sampling

Non-Probability Samples

Purposive or Judgment Sampling

Researcher handpicks subjects on the basis of specific characteristics or attributes that are important to the research study

Units used are sometimes EXTREME or CRITICAL units

May be most useful to pre-test an instrument for a larger study, or in qualitative studies to ensure subjects have appropriate knowledge and will be good informants for the study

Page 22: On Samples And Sampling

Probability Samples

Every element in the population has a known, nonzero probability of selection

Because the probability is known, results can be generalized (at least within a given level of precision) to the larger population

Risk of incorrectly generalizing to the larger population is lower, so probability samples are preferred over non-probability samples

Page 23: On Samples And Sampling

Sampling Frame

A list of units or elements from which the sample is to be selected

Should list every element separately, once and only once, with nothing else appearing on the list

Common problems:
  Missing elements, non-coverage or incomplete frame
  Blanks or foreign elements
  Duplicate listings
  Clusters of elements combined into one listing

Page 24: On Samples And Sampling

Sampling Frame

Page 25: On Samples And Sampling

What do you do if you have a “poor” sampling frame?

BEFORE SELECTING SAMPLE:

Ignore or disregard the problem

Redefine population to fit sampling frame

Spend time and effort to fix the frame

Page 26: On Samples And Sampling

What do you do if you have a “poor” sampling frame?

Missing elements: Use supplementary methods, e.g. active fieldwork to reach homeless individuals in a household-based survey

Foreign elements: Omit if identified

Duplicate elements: Select the first, last, or current listing. Is there any unique feature to choose by?

Clusters: Use all, or randomly select one
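The fixes for foreign and duplicate elements can be sketched in a few lines of Python. This is only an illustration: the household IDs, the `eligible` rule and the `clean_frame` helper are hypothetical, and the sketch keeps the first listing of each duplicate, which is one of the options named above.

```python
def clean_frame(listings, eligible):
    """Drop foreign elements and keep only the first listing of each duplicate."""
    seen = set()
    cleaned = []
    for unit in listings:
        if not eligible(unit):      # foreign element: omit once identified
            continue
        if unit in seen:            # duplicate: the first listing was already kept
            continue
        seen.add(unit)
        cleaned.append(unit)
    return cleaned

# Hypothetical raw frame: household IDs start with "H"; "SHOP9" is a foreign element
raw = ["H001", "H002", "SHOP9", "H002", "H003", "H001"]
frame = clean_frame(raw, eligible=lambda u: u.startswith("H"))
print(frame)   # ['H001', 'H002', 'H003']
```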

Page 27: On Samples And Sampling

Probability Samples- Simple Random

Easiest and least complex

Equal chance for each element

Using a table of random numbers:
  Assign a number to each element in the list
  Select a starting point
  Determine the number of columns to use
  Select numbers from the table
  Discard any duplicate you select
  Select numbers until you obtain the desired sample size
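The random-number-table procedure can be mimicked in software. The sketch below is a minimal illustration in Python, assuming a hypothetical frame of 500 numbered elements and a desired sample of 10. `random.sample` draws without replacement, so duplicates never need to be discarded by hand.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the frame, each with an equal chance."""
    rng = random.Random(seed)      # a fixed seed makes the draw reproducible
    return rng.sample(frame, n)    # sampling without replacement

# Hypothetical frame: elements numbered 1..500 (a number assigned to each element)
frame = list(range(1, 501))
sample = simple_random_sample(frame, 10, seed=2009)
print(sorted(sample))
```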

Page 28: On Samples And Sampling

Probability Samples- Simple Random

Page 29: On Samples And Sampling

Probability Samples- Stratified Random

Improves on the estimates of simple random sampling by randomly sampling the population within strata

3 types:
  Proportionate
  Disproportionate or optimal
  Equal size
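As a rough sketch of the proportionate type, the Python below draws a simple random sample inside each stratum, with stratum sample sizes proportional to stratum sizes. The strata (600 urban and 400 rural households) and the function name are hypothetical.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Proportionate stratified sampling: a simple random sample within each stratum."""
    rng = random.Random(seed)
    population = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Allocate in proportion to stratum size (rounding can shift the total slightly)
        k = round(total_n * len(units) / population)
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical strata: 600 urban and 400 rural households, identified by label
strata = {
    "urban": [f"U{i}" for i in range(600)],
    "rural": [f"R{i}" for i in range(400)],
}
picked = stratified_sample(strata, 50, seed=1)
print({name: len(units) for name, units in picked.items()})
```

For an equal-size design, `k` would instead be the same number in every stratum. A disproportionate design would replace the proportional share with weights chosen by the researcher.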

Page 30: On Samples And Sampling

Probability Samples- Stratified Random

Page 31: On Samples And Sampling

Probability Samples- Systematic Samples

Select the first element randomly and then every nth element on the list afterwards

Starting point is a number between 1 and n drawn at random, e.g. between 1 and 10 from a table of random numbers when every 10th element is taken

Gives each element an equal (but not independent) chance

Useful if you do not have a list but elements are arranged in space, e.g. house selection
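A minimal Python sketch of systematic selection follows, assuming a hypothetical list of 100 numbered elements from which 10 are wanted. The sampling interval is taken as the list length divided by the sample size, which is a common convention rather than something stated on the slide.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start within the first interval, then every k-th element."""
    k = len(frame) // n                  # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)             # 0-based start, i.e. position 1..k on the list
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 101))              # hypothetical list of 100 numbered elements
sample = systematic_sample(frame, 10, seed=7)
print(sample)
```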

Page 32: On Samples And Sampling

Probability Samples- Systematic Samples

Page 33: On Samples And Sampling

Probability Samples- Cluster or Area Sample

A method of selecting sample units in which each unit contains a cluster of elements

The probability of selecting an element is the product of the probability of selecting its cluster and the probability of selecting the element within that cluster

Different from stratified sampling in that, ideally, elements within a cluster are heterogeneous (in stratified sampling, elements within a stratum are homogeneous)

NB: In practice though, clusters tend to be homogeneous
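The product rule for selection probabilities can be illustrated with a two-stage design: clusters are drawn first, then elements within each chosen cluster. The sketch below assumes a hypothetical population of 20 villages with 50 households each; the function names are illustrative only.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick elements within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}

def inclusion_probability(total_clusters, n_clusters, cluster_size, n_per_cluster):
    """P(element selected) = P(its cluster selected) * P(element selected within the cluster)."""
    return (n_clusters / total_clusters) * (n_per_cluster / cluster_size)

# Hypothetical population: 20 villages of 50 households each
clusters = {f"village_{v}": [f"v{v}_h{h}" for h in range(50)] for v in range(20)}
sample = two_stage_cluster_sample(clusters, n_clusters=4, n_per_cluster=10, seed=3)
prob = inclusion_probability(20, 4, 50, 10)   # (4/20) * (10/50)
print(len(sample), "clusters selected; inclusion probability is about", round(prob, 2))
```

If every element of a selected cluster is taken (a one-stage cluster sample), the second factor is 1 and the inclusion probability is simply that of the cluster.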

Page 34: On Samples And Sampling

Probability Samples- Cluster or Area Sample

Page 35: On Samples And Sampling

PUTTING IT TOGETHER- SELECTING A SAMPLING DESIGN

Multi-faceted process

Depends on the amount of information available about the population:
  If characteristics are known: stratified random
  If little is known: less complex simple random or systematic
  When a list is unavailable: cluster
  ALSO combined: stratified multi-stage cluster sampling

Page 37: On Samples And Sampling

Determine the type of sampling used

A soccer coach selects 6 players from a group of boys aged 8 to 10, 7 players from a group of boys aged 11 to 12, and 3 players from a group of boys aged 13 to 14 to form a recreational soccer team.

Stratified

Page 39: On Samples And Sampling

Determine the type of sampling used

A pollster interviews all human resource personnel in five different high tech companies.

Cluster

Page 41: On Samples And Sampling

Determine the type of sampling used

An engineering researcher interviews 50 women engineers and 50 men engineers.

Stratified

Page 43: On Samples And Sampling

Determine the type of sampling used

A medical researcher interviews every third cancer patient from a list of cancer patients at a local hospital.

Systematic

Page 45: On Samples And Sampling

Determine the type of sampling used

A high school counselor uses a computer to generate 50 random numbers and then picks students whose names correspond to the numbers.

Simple random

Page 47: On Samples And Sampling

Determine the type of sampling used

A student interviews classmates in his algebra class to determine how many pairs of jeans a student owns, on the average.

Convenience

Page 48: On Samples And Sampling

Suppose UWC has 10,000 part-time students (the population). We are interested in the average amount of money a part-time student spends on books in an academic year. Asking all 10,000 students is an almost impossible task. Suppose we take two different samples.

Page 49: On Samples And Sampling

First, we use convenience sampling and survey 10 students from a first-semester Masters in Public Health class. Many of these students have been attending the 2009 Summer School and taking elective courses in Epidemiology and Biostatistics in addition to their MPH core courses. The amounts of money they spend are as follows:

R128; R87; R173; R116; R130; R204; R147; R189; R93; R153

Page 50: On Samples And Sampling

The second sample is taken from a list, held by the Division of Life Long Learning, of adult learners who take part-time classes. Every 5th student on the list is taken, for a total of 10 students. They spend:

R50; R40; R36; R15; R50; R100; R40; R53; R22; R22

Page 51: On Samples And Sampling

Problem 1

Do you think that either of these samples is representative of (or is characteristic of) the entire 10,000 part-time student population?

Page 52: On Samples And Sampling

Problem 2

Since these samples are not representative of the entire population, is it wise to use the results to describe the entire population?

Page 53: On Samples And Sampling

Now, suppose we take a third sample. We choose ten different part-time students from all disciplines which offer part-time studies (Public Health, Physio, EMS, etc). Each student is chosen using simple random sampling. Using a calculator, random numbers are generated and a student from a particular discipline is selected if he/she has a corresponding number. The students spend:

R180; R50; R150; R85; R260; R75; R180; R200; R200; R150
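To compare the three samples numerically, their means can be computed directly from the amounts quoted in the example. The short Python sketch below does only that; the variable names are illustrative.

```python
from statistics import mean

# Amounts (in rand) reported for the three samples in the example
convenience = [128, 87, 173, 116, 130, 204, 147, 189, 93, 153]
every_fifth = [50, 40, 36, 15, 50, 100, 40, 53, 22, 22]
simple_random = [180, 50, 150, 85, 260, 75, 180, 200, 200, 150]

for label, amounts in [("convenience (MPH class)", convenience),
                       ("every 5th on the list", every_fifth),
                       ("simple random", simple_random)]:
    print(f"{label}: mean = R{mean(amounts):.2f}")
```

The means come out at R142.00, R42.80 and R153.00 respectively, which shows how strongly the estimate depends on how the sample was drawn.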

Page 54: On Samples And Sampling

Problem 3

Do you think this sample is representative of the population?

Page 55: On Samples And Sampling

Learning Objectives

Understand strategies for selecting a sample

Understand how to determine the required size of a sample

Page 56: On Samples And Sampling

Sample Size Determination

Determined by:
  Purpose of study
  Population size
  Risk of selecting a “bad” sample
  Allowable sampling error

Page 57: On Samples And Sampling

Sample Size Criteria

Level of precision

Level of confidence or risk

Degree of variability

Page 58: On Samples And Sampling

Level of Precision

Also called “Sampling error”

Range in which the true value of the population is estimated to be

E.g. 42% (±2%) means the true value is estimated to lie between 40% and 44%

Page 59: On Samples And Sampling

Confidence Level

Also called “Risk level”

Based on principle of Central Limit Theorem

95% CI – 95 out of 100 samples will have the true population value within the range of precision specified

Page 60: On Samples And Sampling

Confidence Level

Chance that sample you obtain does not represent the true population value is shown in shaded area

Risk reduces for 99% CI and increases for 90% CI

Page 61: On Samples And Sampling

Degree of Variability

Distribution of attributes in the population
  Heterogeneous: bigger sample
  Homogeneous: smaller sample

Note that a proportion of 50% indicates a greater level of variability than 20% or 80%

0.5 is mostly used for conservative sample size estimates because it indicates maximum variability
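The reason 0.5 is the most variable case is that the variance of a yes/no attribute held by a proportion p of the population is p(1 − p), which peaks at p = 0.5. A small Python check, with an illustrative helper name:

```python
def variability(p):
    """Variance of a yes/no attribute held by a proportion p of the population."""
    return p * (1 - p)

for p in (0.2, 0.5, 0.8):
    print(f"p = {p}: p(1 - p) = {variability(p):.2f}")
```

This prints 0.16 for both 20% and 80%, and 0.25 for 50%.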

Page 62: On Samples And Sampling

Strategies for determining Sample Size

Using a Census for small populations

Using a Sample Size of a Similar Study

Using Published Tables

Using Formula to Calculate a Sample Size

Page 63: On Samples And Sampling

Using a Census for small populations

Use the entire population as the sample

May be useful in small populations (<200), cost permitting

Why use this?
  Eliminates sampling error
  Provides individual-level data
  “Fixed costs”, e.g. of questionnaire design, are the same whatever the sample size
  Virtually the entire population would have to be in the sample in small populations anyway

Page 64: On Samples And Sampling

Using a Sample Size of a Similar Study

Could be a valuable approach

But without reviewing the procedures employed, may run risk of repeating errors made previously

Review literature to get guidance on “typical” sample size

Page 65: On Samples And Sampling

Using Published Tables

Use published tables, which provide the sample size for a given set of criteria

Sample sizes in tables reflect the number of OBTAINED responses (not necessarily the number of surveys mailed)

Tables assume that the attribute measured is normally distributed

Page 68: On Samples And Sampling

Using Formulas to Calculate A Sample Size

Equation 1: (Fleiss 1981)

$$n = \frac{C\,(p_c q_c + p_e q_e)}{d^{2}} + \frac{2}{d} + 2$$

Equation 2: (Snedecor & Cochran 1989)

$$n = 1 + 2C\left(\frac{s}{d}\right)^{2}$$

Equation 3: (Yamane 1967)

$$n = \frac{N}{1 + N(e)^{2}}$$
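The slide gives the formulas without defining their symbols, so the Python sketch below rests on assumptions. It reads the formula attributed to Fleiss as n = C(p_c q_c + p_e q_e)/d² + 2/d + 2, for comparing two proportions, and the one attributed to Snedecor & Cochran as n = 1 + 2C(s/d)², for comparing two means. It assumes that s is a standard deviation, d is the difference to be detected, q = 1 − p, n is the size of each group, and C is a constant set by the significance level and power, computed here as (z for 1 − α/2 plus z for the power) squared. All numeric inputs are hypothetical.

```python
from statistics import NormalDist

def c_constant(alpha=0.05, power=0.80):
    """Assumed definition of C: (z_{1-alpha/2} + z_{power})^2."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) ** 2

def n_two_means(s, d, C):
    """n per group = 1 + 2C(s/d)^2, for comparing two means."""
    return 1 + 2 * C * (s / d) ** 2

def n_two_proportions(p_c, p_e, C):
    """n per group = C(p_c q_c + p_e q_e)/d^2 + 2/d + 2, with q = 1 - p and d = |p_c - p_e|."""
    d = abs(p_c - p_e)
    return C * (p_c * (1 - p_c) + p_e * (1 - p_e)) / d ** 2 + 2 / d + 2

C = c_constant()                                  # roughly 7.85 for alpha = 0.05, 80% power
n_means = n_two_means(s=10, d=5, C=C)             # hypothetical standard deviation and difference
n_props = n_two_proportions(0.4, 0.6, C)          # hypothetical control and experimental proportions
print(round(C, 2), round(n_means, 1), round(n_props, 1))
```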

Page 69: On Samples And Sampling

Other Considerations

The formulas assume simple random sampling

Number needed for the data analysis (e.g. multiple regression and log-linear analysis require a bigger sample than simple descriptive analysis)

Sample size is often increased by 30% to compensate for non-response, and by 10% to compensate for persons who cannot be reached

Page 70: On Samples And Sampling

Calculation Using Computer Programmes

Epi Info

Online software: e.g. Raosoft

Page 71: On Samples And Sampling

EXAMPLE: Sample Size Calculation

Yamane’s formula*:

$$n = \frac{N}{1 + N(e)^{2}}$$

Where
  n = Sample size
  N = Population size
  e = Level of precision or sampling error, which is ±5%

*Reference: Yamane, Taro. 1967. Statistics, An Introductory Analysis, 2nd Ed. New York: Harper and Row.
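Yamane’s formula, n = N / (1 + N e²), is simple enough to check by hand or in code. The Python sketch below is an illustration only: the function name is hypothetical, and rounding up to a whole respondent is a common convention rather than part of the formula.

```python
import math

def yamane_sample_size(N, e=0.05):
    """Yamane (1967): n = N / (1 + N * e^2), before rounding."""
    return N / (1 + N * e ** 2)

# A population of 10,000 with a precision of ±5%
n = yamane_sample_size(10_000, e=0.05)
print(math.ceil(n))          # round up to whole respondents

# A population of 3855 with the same precision
n_facilities = yamane_sample_size(3855, e=0.05)
print(round(n_facilities, 1))
```

For N = 10,000 this gives about 385. For N = 3855 and e = 0.05 it gives roughly 362, so a sample of 350 from that population would correspond to a slightly looser precision of about ±5.1%.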

Page 72: On Samples And Sampling

# of Health Facilities per Province

Facility types tabulated: District; Provincially Aided; Public/Private clinic; Specialised-TB; Specialised-Psychiatric; Specialised-Orthopaedic; National Central; Provincial Tertiary; Regional; CHC; Private hospital. The last figure in each row is the total number of health facilities.

Eastern Cape       1    6    2   45   10    4    1   18  665   31   16  783
Free State         1    0    5   25    0    1    0    0  231   30   14  293
Gauteng            4    0   11    8    3    4    0    0  323   30   93  383
Northern Cape      0    1    1   20    3    0    0    0   83   16    5  124
KwaZulu Natal      1    2   13   43    7    4    0    0  524   16   24  610
North West         0    5    0   26    0    2    0    0  310   55    7  398
Mpumalanga         0    2    3   23    5    0    0    0  209   38    6  280
Limpopo            0    2    6   32    1    2    0    0  430   26    3  499
Western Cape       3    0    8   22    2    5    0   15  358   72   33  485
Total             10   18   49  244   31   22    1   33 3133  314  201 3855

Source: Digital Healthcare Solutions (PTY) LTD. Comprehensive Health Services Information for Southern Africa: Hospital & Nursing YearBook, 2007.

Page 73: On Samples And Sampling

Sample Size Calculation:

$$n = \frac{N}{1 + N(e)^{2}} = 350$$

Total number of health facilities in the study: 350

*Reference: Yamane, Taro. 1967. Statistics, An Introductory Analysis, 2nd Ed. New York: Harper and Row.

Page 74: On Samples And Sampling

Sampling Techniques

Multi-Stage Sampling

Primary sampling unit

Stratification by district (Selection Bias)

Levels of Care

Rural/Urban

Sample Proportional Size

Sampling Weight:

Page 75: On Samples And Sampling

Sampling Techniques

Province         Total # of health facilities   Weighted sample
Eastern Cape                              783                71
Free State                                293                27
Gauteng                                   383                35
Northern Cape                             124                11
KwaZulu-Natal                             610                55
North West                                398                36
Mpumalanga                                280                25
Limpopo                                   499                45
Western Cape                              485                44
Total                                    3855               350
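The weighted sample per province appears to follow proportional allocation: each province receives the share of the 350 sampled facilities that matches its share of all 3855 facilities. The Python sketch below reproduces that arithmetic from the figures in the table. The rounding rule (nearest whole number) is an assumption.

```python
facilities = {
    "Eastern Cape": 783, "Free State": 293, "Gauteng": 383,
    "Northern Cape": 124, "KwaZulu-Natal": 610, "North West": 398,
    "Mpumalanga": 280, "Limpopo": 499, "Western Cape": 485,
}
sample_size = 350
total = sum(facilities.values())              # 3855 facilities in all

# Each province's share of the sample mirrors its share of all facilities
allocation = {p: round(sample_size * n / total) for p, n in facilities.items()}
for province, k in allocation.items():
    print(f"{province}: {k}")
print("Sum of rounded allocations:", sum(allocation.values()))
```

The rounded figures match the table province by province. Because each province is rounded independently, the rounded allocations add up to 349 rather than 350, so one facility has to be assigned by hand to reach the target.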

Page 76: On Samples And Sampling

# of Facilities Selected for the study

Facility types tabulated: National Central; Provincial Tertiary; Regional; District; Specialised-TB; Specialised-Psychiatric; Specialised-Orthopaedic; Private hospital; Provincially Aided; Clinic; CHC; Hospices. The last figure in each row is the weighted sample for the province.

Eastern Cape       1    2    1    5    1    1    1    4   47    3    3    1   70
Free State         1    0    1    2    0    1    0    0   16    2    3    1   27
Gauteng            1    0    2    1    1    1    0    0   18    2    3    8   36
Northern Cape      0    1    1    3    1    0    0    0    2    1    1    1   11
KwaZulu Natal      1    1    2    4    1    1    0    0   40    2    1    2   55
North West         0    1    0    2    0    1    0    0   25    1    5    1   36
Mpumalanga         0    1    1    2    1    0    0    0   15    1    3    1   25
Limpopo            0    1    2    3    1    1    0    0   34    0    2    1   45
Western Cape       1    0    2    2    1    1    0    1   24    2    7    3   44
Total              5    7   12   24    7    7    1    5  221   14   28   19  350

Page 77: On Samples And Sampling

BIBLIOGRAPHY

Israel GD. (1992) Sampling the evidence of extension program impact. University of Florida IFAS Extension PEOD5. (http://edis.ifas.ufl.edu.)

Israel GD. (1992) Determining Sample Size. University of Florida IFAS Extension PEOD6 (http://edis.ifas.ufl.edu.)

Portney LG and Watkins MP. (2000). Foundations of clinical research – applications to practice. 2nd Ed. Chapter 8 - Sampling

“I have collected a poesy of another man’s roses, and nothing but the thread that binds them together is my own”