XNN001 Introductory epidemiological concepts - sampling, bias and error

XNN001 Population nutrition and physical activity assessment

Study design (flow diagram): research question; target population; sampling frame; sampling selection; data collection tools; data collection methods; generalisability of findings

Sampling – what is it?

Selection of a smaller number of units from a larger group

In research, the aim is to enable generalisation to a target population

Why do we sample?

1. Not possible to study ALL people in a population

2. Financially feasible and realistic to study a smaller subset of a population

3. Unethical if sample is larger than necessary (overpowered)

Aim to provide an accurate representation of the target population

Allows for generalisation from sample to broader population

Need to minimise sampling error and bias

EXTRAPOLATE or MAKE INFERENCES

How big should a sample be?

Sample size calculations determine the required size

Based on the variables to be measured: expected difference, expected response rates, cluster effect, attrition, etc.

Small sample size:

Less likely that the sample is representative of the target population

Limited POWER to detect an ‘effect’

Larger sample size:

More likely that the sample is representative of the target population

Increased POWER to detect an ‘effect’
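
The slides do not give a formula, so the following is only a minimal sketch of one common approach: the normal-approximation sample size for comparing two group means, followed by a rule-of-thumb inflation for response rate, attrition and cluster (design) effect. The function names, the default alpha and power, and the inflation rule are assumptions for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate n per group to detect a mean difference `delta`
    with a two-sided test (normal approximation, equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


def inflate(n, response_rate=1.0, attrition=0.0, design_effect=1.0):
    """Rule-of-thumb inflation (an assumption, not from the slides)."""
    return math.ceil(n * design_effect / (response_rate * (1 - attrition)))


base = n_per_group(delta=5, sd=10)  # hypothetical expected difference and SD
invited = inflate(base, response_rate=0.7, attrition=0.1, design_effect=1.5)
print(base, invited)
```

Halving the expected difference roughly quadruples the required sample, which is why the expected effect size dominates the calculation.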

Sampling methodology

Probability sampling: 1. simple random 2. systematic 3. stratified 4. cluster 5. multi-stage

Non-probability sampling: 1. convenience 2. quota 3. purposive 4. snowball

1. Simple Random Sampling

Subset of individuals chosen from a list of individuals from the broader population (sampling frame)

Each individual chosen at random

All subjects have an equal chance of being selected

Most likely to achieve a sample representative of the population (least selection bias)

May be difficult to achieve in practice

Not ideal for special interest groups/population minorities

Simple Random Sampling
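
As a minimal sketch of simple random sampling, the snippet below draws units without replacement from a list standing in for the sampling frame. The frame contents, function name and seed are hypothetical.

```python
import random


def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)
    return rng.sample(sampling_frame, n)


# Hypothetical sampling frame of 200 listed individuals
frame = [f"person_{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(frame, 20, seed=1)
print(sample[:5])
```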

2. Systematic sampling

Units sampled at regular intervals

Width of intervals randomly determined

Inadequate sampling of rare individuals who may be of interest

Chance that random dispersion is “unlucky” and inadequate

Researcher must ensure sampling does not hide a pattern (e.g. a periodic ordering of the list that coincides with the sampling interval)

Systematic sampling - example
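
A minimal sketch of systematic sampling follows. It assumes one common variant: the interval is fixed at frame size divided by sample size, and only the starting point is chosen at random. The frame and function name are hypothetical.

```python
import random


def systematic_sample(sampling_frame, n, seed=None):
    """Take every k-th unit after a random start, where k = len(frame) // n."""
    rng = random.Random(seed)
    k = len(sampling_frame) // n  # sampling interval
    start = rng.randrange(k)      # random starting position within the first interval
    return sampling_frame[start::k][:n]


frame = list(range(1, 201))       # hypothetical frame: units numbered 1..200
sample = systematic_sample(frame, 20, seed=1)
print(sample)
```

If the frame is ordered with a cycle equal to the interval (for example, every tenth household is a corner house), every sampled unit shares that feature, which is the hidden-pattern risk noted above.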

3. Stratified sampling

Population divided into subgroups prior to sampling

To ensure adequate numbers of subjects from subgroups are included, e.g. male and female subgroups

Then take a simple random sample of the individuals within the male group and then within the female group

Stratified sampling example (diagram): target population – Brisbane households; sampling frame – electoral roll, split into MALES and FEMALES; a random sample is drawn from each subgroup to form the SAMPLE
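
The stratified approach can be sketched as follows: split the frame into subgroups first, then take a simple random sample within each. The electoral-roll records, field names and stratum sizes are invented for illustration.

```python
import random


def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Group units by stratum, then draw a simple random sample in each."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}


# Hypothetical electoral-roll extract: 100 males, 200 females
roll = [{"id": i, "sex": "male" if i % 3 == 0 else "female"} for i in range(300)]
sample = stratified_sample(roll, lambda person: person["sex"], 25, seed=2)
print({name: len(units) for name, units in sample.items()})
```

Equal numbers per stratum are used here; sampling in proportion to stratum size is an equally valid design choice.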

4. Cluster sampling

Total population broken down into ‘groups’ or ‘clusters’

A number of clusters then randomly selected from all eligible clusters

All individuals in each selected cluster become potential subjects

4. Cluster sampling

One-stage cluster sampling:

Clusters are selected randomly

All individuals within clusters are invited to participate in the study

Two-stage cluster sampling:

Clusters are selected randomly

Lists of all elements within clusters are obtained - random samples drawn from lists

Cluster sampling - example (diagram): all schools in Brisbane; Stage 1 – simple random sampling of schools (School A, School B); Stage 2 – random sample drawn from all students in each selected school
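
A minimal sketch of one-stage versus two-stage cluster sampling, using made-up schools as clusters. The function name and data are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, n_per_cluster=None, seed=None):
    """One-stage if n_per_cluster is None (take everyone in chosen clusters),
    otherwise two-stage (random sample within each chosen cluster)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: pick clusters
    if n_per_cluster is None:
        return {c: list(clusters[c]) for c in chosen}
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}  # stage 2


# Hypothetical clusters: six schools with 40 students each
schools = {
    f"School {letter}": [f"{letter}-student-{i}" for i in range(40)]
    for letter in "ABCDEF"
}
one_stage = cluster_sample(schools, 2, seed=3)
two_stage = cluster_sample(schools, 2, n_per_cluster=10, seed=3)
print(sorted(one_stage), [len(v) for v in two_stage.values()])
```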

5. Multi-stage sampling

Complex form of cluster sampling

Population divided into clusters and sub-clusters

Used when selecting from very large population

Multi-stage sampling example (diagram): nationwide retail chain; random selection of regions (Region 1, Region 2); random selection of stores within each region (Store 1, Store 2); stratified sampling within each store (Male, Female), 20 per group
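
The retail-chain example can be sketched as three nested selections: regions, then stores within regions, then a stratified sample by sex within each store. The chain structure, customer records and counts below are invented for illustration.

```python
import random

rng = random.Random(4)

# Hypothetical chain: 4 regions x 5 stores x 100 customers (50 male, 50 female)
chain = {
    f"Region {r}": {
        f"Store {s}": [
            (f"R{r}S{s}C{c}", "Male" if c % 2 else "Female") for c in range(100)
        ]
        for s in range(1, 6)
    }
    for r in range(1, 5)
}


def multi_stage_sample(chain, n_regions, n_stores, n_per_stratum, rng):
    """Stage 1: regions; stage 2: stores; stage 3: stratified by sex."""
    sample = {}
    for region in rng.sample(sorted(chain), n_regions):
        for store in rng.sample(sorted(chain[region]), n_stores):
            customers = chain[region][store]
            for sex in ("Male", "Female"):
                stratum = [cid for cid, s in customers if s == sex]
                sample[(region, store, sex)] = rng.sample(stratum, n_per_stratum)
    return sample


sample = multi_stage_sample(chain, n_regions=2, n_stores=2, n_per_stratum=20, rng=rng)
print(len(sample), sum(len(v) for v in sample.values()))
```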

Non-probability sampling

Sampling techniques that do not rely on random selection

When a sampling frame cannot be identified, e.g. visitors to a particular internet site

When sampling populations are difficult to access (e.g. drug users, street-based sex workers)

When very strict inclusion and exclusion criteria are necessary (e.g. in pharmaceutical drug testing)

1. Convenience sampling

Units ‘selected’ based on ease of access:

Volunteers

Shoppers in a supermarket

Respondents to advertisements

Clinic attendees

The sample is usually different from the target population

Cannot generalise results to the general population

2. Quota sample

Population divided into defined subgroups e.g. males; females

Proportions of subgroups in population identified

Convenience sample of each subgroup to make up required numbers
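
A minimal sketch of quota sampling: subgroup quotas are set from population proportions, then filled with whoever is available first, with no random selection. The arrival stream and quota values are hypothetical.

```python
def quota_sample(arrivals, group_of, quotas):
    """Accept people in arrival order until each subgroup quota is full."""
    filled = {group: [] for group in quotas}
    for person in arrivals:
        group = group_of(person)
        if group in filled and len(filled[group]) < quotas[group]:
            filled[group].append(person)
        if all(len(filled[name]) == quotas[name] for name in quotas):
            break
    return filled


# Hypothetical stream of shoppers; quotas mirror an assumed 49% / 51% population split
arrivals = [{"id": i, "sex": "female" if i % 3 else "male"} for i in range(500)]
quotas = {"male": 49, "female": 51}
sample = quota_sample(arrivals, lambda person: person["sex"], quotas)
print({group: len(people) for group, people in sample.items()})
```

The subgroup proportions match the population, but within each subgroup the units are still a convenience sample, so selection bias remains possible.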

3. Purposive sample

Deliberate selection of individuals by researchers based on predefined criteria - INCLUSION & EXCLUSION CRITERIA

Often used in pharmaceutical drug testing

Also called judgmental sampling

4. Snowball sampling

Involves asking subjects to provide names of others who may meet study criteria

Useful for sampling populations that are difficult to access:

drug users

street-based sex workers

underground networks

Also called networking

Snowball sampling
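
Snowball recruitment can be sketched as a breadth-first walk over a referral network, starting from a few initial subjects. The network, names and size cap are hypothetical.

```python
from collections import deque


def snowball_sample(seeds, referrals, max_size):
    """Recruit seeds first, then the people they name, wave by wave."""
    recruited = []
    queue = deque(seeds)
    seen = set(seeds)
    while queue and len(recruited) < max_size:
        person = queue.popleft()
        recruited.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:  # do not recruit anyone twice
                seen.add(contact)
                queue.append(contact)
    return recruited


# Hypothetical referral network: who names whom
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"]}
recruited = snowball_sample(["A"], referrals, max_size=5)
print(recruited)  # ['A', 'B', 'C', 'D', 'E']
```

Everyone recruited is connected to the initial subject, so people outside that social network can never enter the sample.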

Measurement issues

Error - validity

When an estimate (e.g. incidence, prevalence, mortality) or association (RR, OR) deviates from the ‘true’ situation in nature

May be introduced at any point during the study:

Study design (quality)

Sampling

Measurement

Analysis

Random error

Systematic bias

Random error

Fluctuations around a true value

Related to poor precision

Sources:

individual biological variation (always present)

sampling variation

measurement variation (protocols and training)

Reduced by:

larger sample sizes

standard protocols and equipment
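
To illustrate how larger samples reduce random error, the following sketch repeatedly samples from a simulated population and compares how much the sample means fluctuate at two sample sizes. The population values (mean 70, SD 12) and the sample sizes are invented.

```python
import random
import statistics


def spread_of_sample_means(population, n, repeats, rng):
    """Standard deviation of the means of `repeats` random samples of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)


rng = random.Random(6)
population = [rng.gauss(70, 12) for _ in range(10_000)]  # simulated true values

small = spread_of_sample_means(population, n=25, repeats=500, rng=rng)
large = spread_of_sample_means(population, n=400, repeats=500, rng=rng)
print(round(small, 2), round(large, 2))
```

The sample means scatter around the same true value in both cases; only the width of the scatter changes, which is what distinguishes random error from systematic bias.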

Systematic bias

Any systematic error in the design, conduct or analysis of a study that results in a mistaken estimate of an exposure’s effect on the risk of a disease

Due to causes other than random error

Problem of validity: internal and/or external validity

I. Selection bias

Arises when different criteria are used so the study population does not represent the population of interest

For example:

1. Referral Bias (Berkson’s Bias)

2. Surveillance Bias

3. Prevalence-Incidence Bias (Neyman’s Bias)

4. Response Bias: Attrition Bias, Participation Bias

Types of bias

1. Referral bias

Occurs in case-control studies conducted in hospitals

Causes a spurious association between the exposure and the disease, because of the different probabilities of admission to a hospital for those with/without a disease (or with/without the exposure)

2. Surveillance bias

For example: when conducting a case-control study to examine the relationship between oral contraceptive (OC) use and diabetes, women taking OCs are likely to have more doctor visits, so diabetes is more likely to be diagnosed in OC users than in non-OC users

3. Prevalence-incidence bias

Also known as Neyman’s bias

Usually occurs when prevalent cases are used to investigate a disease-exposure association

Prevalent cases represent survivors, who may be atypical with respect to exposure status

Once a person is diagnosed with the disease, they may change their exposure

Types of bias

Participation bias

People who participate in research studies are often different to those who do not take part:

Demographic, socioeconomic, cultural, lifestyle, and medical characteristics

Self-selection bias (individual consent is essential in research, except for publicly available information)

Attrition bias

Occurs when study participants withdraw before the study is completed; withdrawal is often differential (rates differ between the groups being compared)

II. Information bias

Arises when inaccurate measurement or misclassification of study variables occurs

Can affect exposure or outcome (or even confounders)

Extent of bias depends on the particular variable and on whether the misclassification is non-differential or differential

Non-differential info-bias

Error in measurement does not vary according to other variables (cases vs controls; exposed vs unexposed)

Usually leads to an underestimate of the true association (bias towards the null)

Any association that is observed is likely to be true

Differential info-bias

Systematic error (i.e. non-random): error in measurement differs between the groups being compared

May over-estimate or under-estimate the actual association, depending upon the situation.
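
A worked example may help separate the two kinds of misclassification. The sketch below applies imperfect exposure measurement to an invented case-control table and recomputes the odds ratio (OR) from expected counts. All counts, sensitivities and specificities are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """a, b = exposed / unexposed cases; c, d = exposed / unexposed controls."""
    return (a * d) / (b * c)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts when exposure is measured imperfectly."""
    observed_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    observed_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return observed_exposed, observed_unexposed


cases, controls = (60, 40), (30, 70)  # hypothetical true exposure status
true_or = odds_ratio(*cases, *controls)

# Non-differential: the same measurement error in cases and controls
nondiff_or = odds_ratio(*misclassify(*cases, 0.8, 0.9), *misclassify(*controls, 0.8, 0.9))

# Differential: cases recall exposure better than controls (as in recall bias)
diff_or = odds_ratio(*misclassify(*cases, 0.95, 0.9), *misclassify(*controls, 0.7, 0.9))

print(round(true_or, 2), round(nondiff_or, 2), round(diff_or, 2))
```

With these numbers the non-differential error pulls the OR towards 1, while the differential error pushes it above the true value; other differential patterns could push it below.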

Types of information bias

1. Recall Bias

Cases and controls recall their exposures differently

It is human nature to look for reasons when something has gone wrong: “If you seek, you will find.”

2. Detection Bias

The exposed group is monitored more closely

3. Interviewer/observer Bias

Not blinded

Not properly trained

Types of information bias

4. Reporting Bias

“Objectively”:

Cases tend to have better information

Individuals who are part of a study may behave differently (Hawthorne effect)

“Subjectively”:

Reluctant to report: attitudes, beliefs, perception

Wish bias: subjects attempting to answer the question of “why me?” tend to conclude that the disease is not their own fault (lifestyle) but the fault of others (work-related exposure)

III. Confounding - definition

An association between a given exposure and outcome is influenced by a third variable – confounding factor.

To be a confounder, a variable must:

1. Be a risk factor for the disease

2. Be associated with the exposure

3. Not be a result of the exposure: not be an intermediate between the exposure and the outcome (i.e. must not lie on the causal pathway)
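
The effect of a confounder can be sketched with invented case-control counts in which the exposure has no effect within either level of the third variable, yet the crude (pooled) odds ratio suggests an association. The Mantel-Haenszel summary is one standard way to adjust; the counts and stratum labels are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """a, b = exposed / unexposed cases; c, d = exposed / unexposed controls."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across strata of the confounder."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Hypothetical counts (a, b, c, d) within each level of the third variable
strata = {
    "confounder present": (80, 20, 40, 10),
    "confounder absent": (10, 40, 20, 80),
}

stratum_ors = {name: odds_ratio(*table) for name, table in strata.items()}
crude_or = odds_ratio(*[sum(column) for column in zip(*strata.values())])
adjusted_or = mantel_haenszel_or(strata.values())

print(stratum_ors, crude_or, adjusted_or)
```

In this table the third variable is more common among the exposed and among cases, which is exactly the pair of conditions listed above.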

Validity

Do the study conclusions reflect the true value/relationship?

External validity (generalisability): can the findings be generalised to other similar samples or the population-at-large?

Internal validity: are the results correct for the particular group you have studied?

Reliability

Accuracy -- how close to the true population value is your measurement value?

Assess accuracy by comparing to the “gold standard”

Precision -- if you repeat your measurement/sample selection/analysis on numerous occasions, will you get consistent results?

Assess precision by inter-observer and intra-observer comparisons
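
As a rough illustration of the difference, the sketch below simulates repeated measurements from two instruments: one accurate but imprecise, one precise but inaccurate. Accuracy is summarised as the mean deviation from a gold-standard value and precision as the spread of repeated readings. The instruments and all numbers are invented.

```python
import random
import statistics


def accuracy_and_precision(measurements, gold_standard):
    """Return (bias, spread): mean deviation from the gold standard,
    and standard deviation of the repeated measurements."""
    bias = statistics.fmean(measurements) - gold_standard
    spread = statistics.stdev(measurements)
    return bias, spread


rng = random.Random(7)
true_weight = 70.0  # kg, hypothetical gold-standard value

scale_a = [rng.gauss(true_weight, 3.0) for _ in range(200)]        # accurate, imprecise
scale_b = [rng.gauss(true_weight + 2.0, 0.5) for _ in range(200)]  # precise, inaccurate

bias_a, spread_a = accuracy_and_precision(scale_a, true_weight)
bias_b, spread_b = accuracy_and_precision(scale_b, true_weight)
print(round(bias_a, 2), round(spread_a, 2), round(bias_b, 2), round(spread_b, 2))
```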
