Introduction to epidemiological research


Introduction to epidemiological research methods

Table of Contents

• 1 General Introduction
• 2 Introduction to Epidemiology
  ◦ 2.1 Purpose
  ◦ 2.2 What will we cover
  ◦ 2.3 Definition
  ◦ 2.4 Distinguishing characteristics
    ▪ 2.4.1 Human
    ▪ 2.4.2 Population based
    ▪ 2.4.3 Diseases
    ▪ 2.4.4 Distribution (object of descriptive epidemiology)
    ▪ 2.4.5 Determinants (object of analytical epidemiology)
• 3 Types of epidemiology
  ◦ 3.1 Descriptive epidemiology
  ◦ 3.2 Example of description of an outbreak from MMWR or other source
  ◦ 3.3 Analytical epidemiology
    ▪ 3.3.1 Example of a study
  ◦ 3.4 Typology
    ▪ 3.4.1 Clinical Epidemiology
    ▪ 3.4.2 Environmental Epidemiology
    ▪ 3.4.3 Genetic Epidemiology
    ▪ 3.4.4 Hospital Epidemiology
    ▪ 3.4.5 Infectious Epidemiology
    ▪ 3.4.6 Life course epidemiology
    ▪ 3.4.7 Neuro-epidemiology
    ▪ 3.4.8 Occupational Epidemiology
    ▪ 3.4.9 Paediatric epidemiology
• 4 Measures
  ◦ 4.1 Rates
  ◦ 4.2 Ratio
  ◦ 4.3 Proportion
  ◦ 4.4 Prevalence
  ◦ 4.5 Incidence
    ▪ 4.5.1 Cumulative incidence
    ▪ 4.5.2 Incidence density
  ◦ 4.6 Measures of association
    ▪ 4.6.1 Rate/Risk difference (ARR)
    ▪ 4.6.2 Numbers needed to treat
    ▪ 4.6.3 Rate ratio
    ▪ 4.6.4 Relative Risk
    ▪ 4.6.5 Odds Ratio
    ▪ 4.6.6 How to establish whether an exposure is associated with an outcome
• 5 Chance
  ◦ 5.1 What is meant by play of chance
  ◦ 5.2 How to control for the play of chance
  ◦ 5.3 Bias
    ▪ 5.3.1 Random bias
• 6 Confounding variable
    ▪ 6.1 Independently associated with exposure and outcome
    ▪ 6.2 Will not come in the causal pathway
  ◦ 6.1 How to test for confounding
  ◦ 6.2 How to control for confounding
• 7 Causality
  ◦ 7.1 Strength of association
  ◦ 7.2 Dose response relationship
  ◦ 7.3 Time sequence
  ◦ 7.4 Biological plausibility
  ◦ 7.5 Consistency and replicability
    ▪ 7.5.1 Hill's Criteria
• 8 Types of epidemiological studies
  ◦ 8.1 Intervention studies
    ▪ 8.1.1 Randomized Controlled Trial
  ◦ 8.2 Observation based studies
  ◦ 8.3 Cohort studies
    ▪ 8.3.1 Prospective Cohort study
    ▪ 8.3.2 Retrospective Cohort Study
    ▪ 8.3.3 Case control study
    ▪ 8.3.4 Cross sectional surveys
  ◦ 8.4 Summary & conclusions
    ▪ 8.4.1 Test questions
    ▪ 8.4.2
    ▪ 8.4.3
    ▪ 8.4.4 Keywords
    ▪ 8.4.5 References

1 General introduction

The purpose of this tutorial is to introduce some basic concepts of epidemiology.

2 Introduction to Epidemiology

2.1 Purpose

The purpose of this document is to present a set of notes for the HLTH 460 class to introduce some key concepts of epidemiology. In particular, we shall discuss the meaning of epidemiology, a few key concepts, epidemiological study designs, and causation in epidemiology. This is not meant to be a comprehensive treatise on the subject, and all examples are provided in R for statistical computing. If you are not familiar with R for statistical computing, this is a good time to familiarize yourself with how to read and work through some examples.

2.2 What will we cover

In this paper, we are going to cover some basic definitions and concepts associated with epidemiology. There are excellent texts on epidemiology that you should consult for more detailed information. This set of outlined notes is for a general introduction only and will be used in the class.

2.3 Definition

Epidemiology is defined as the study of the distribution and determinants of diseases in populations and the means to prevent them.

2.4 Distinguishing characteristics

2.4.1 Human

The most important feature that distinguishes epidemiology from other similar fields of study is its focus on human diseases. While epidemiological principles can be applied to the investigation of disease conditions in other species, human diseases are the primary concern.

2.4.2 Population based

The second characteristic that distinguishes epidemiology from other fields is its primary focus on population based studies. It is common to use principles learned in epidemiology to solve problems in areas such as clinical medicine, but the main focus of epidemiology remains public health. Thus, aggregate measures such as population attributable risks and population attributable risk percentages are seen as key to the design and evaluation of public health interventions. Epidemiology is also referred to as the key base topic for public health: it provides a quantitative and theoretical basis for much of public health work.

2.4.3 Diseases

The third point about epidemiology is that the focus of this subject is on diseases. In the context of epidemiology, however, "diseases" may refer to distinct health states as well as disease conditions. Therefore, states of health such as quality of life, other health states, and physical indicators of health are all included within epidemiology.

In brief, the scope of epidemiology is well defined. It deals with human diseases in the context of population based health; it provides a way to measure associations between risk factors or interventions and distinct health outcomes; and it provides a way to evaluate the effectiveness of health interventions. In addition, it provides a way to study illnesses and health states of individuals and populations and ways to alter or improve them. This is done in two major ways as follows:

2.4.4 Distribution (object of descriptive epidemiology)

Distribution of diseases in a population refers to the spread of disease in terms of person (who is being affected), place (where diseases are occurring), and time (when, or over what time period, diseases have occurred). The principles and tools of epidemiology that are used to describe these phenomena are also referred to as descriptive epidemiology. We shall discuss measures of descriptive epidemiology in a subsequent section. Thus descriptive epidemiology, or the study of the distribution of diseases, aims to answer the "what" (definition) of a disease, the "who" (persons who are affected), the "where" (the place), and the "when" of diseases in populations and individuals. These are expressed in the form of prevalence, incidence, and rates of occurrence of diseases.

2.4.5 Determinants (object of analytical epidemiology)

In understanding determinants of diseases, epidemiologists discuss what may have caused a particular disease or health outcome. A disease (or health outcome) can have more than one factor as its cause. This is the basis of the component cause model and Rothman's pie, which we shall discuss. In studies related to analytical epidemiology, these causal factors are searched for and discussed. Thus analytical epidemiology, or the study of determinants of diseases, discusses the "why" and "how" of a disease outcome or disease causation.

3 Types of epidemiology

3.1 Descriptive epidemiology

Descriptive epidemiology refers to the practice of epidemiology where disease outcomes are described in detail in terms of age, gender, socioeconomic status, and other variables that help in understanding disease patterns. These descriptors provide general information about the population or individual profile in whom the disease occurred, the place where the outcomes were concentrated, the time period over which they happened, and additional details of their occurrence.

3.2 Example of description of an outbreak from MMWR or other source

3.3 Analytical epidemiology

Analytical epidemiology refers to the practice of epidemiology that aims to identify the source of an epidemic, the possible causes of a disease occurrence, or the causal factors associated with a specific health outcome. The majority of articles published in journals of epidemiology and disease surveillance, and analyses of epidemics, involve analytical epidemiology. The following example provides an instance of analytical epidemiology in action.

3.3.1 Example of a study

3.4 Typology

In addition to the broad classification into descriptive and analytical epidemiology, epidemiology can also be divided according to the specific discipline related to the objectives of the study. The following descriptors provide some details about specific disciplines to which principles of epidemiology are commonly applied.

3.4.1 Clinical Epidemiology

Clinical epidemiology refers to the application of the principles and practice of epidemiology to specific clinical problems, such as specific health outcomes or specific health services research topics.

• Example of screening study

3.4.2 Environmental Epidemiology

Environmental epidemiology refers to the application of epidemiology to study or solve environmental health related topics.

• Example of air pollution and health study

As can be seen in the air pollution and health study, the principles of epidemiology were used to study the health effects of criteria pollutants in the air (criteria pollutants include particulate matter less than 10 microns in size, ozone, and oxides of sulphur and nitrogen). These compounds are known to affect respiratory function in humans. Epidemiological study designs that investigate the effects of these pollutants on respiratory function belong to environmental epidemiology.

3.4.3 Genetic Epidemiology

Genetic epidemiology refers to the application of epidemiological study designs and principles of epidemiology to study and address topics related to human genetics and genetic traits.

• Example of genetic epidemiology

3.4.4 Hospital Epidemiology

Hospital epidemiologists study patterns of disease occurrence and health outcomes related to patients in a hospital. For example, epidemiologists who study patterns of infection in a hospital ward and identify the causes or mechanisms underlying these patterns are practising hospital epidemiology.

3.4.5 Infectious Epidemiology

Infectious epidemiology includes the application of epidemiological principles to address infectious diseases. Infectious diseases are those diseases where an infectious agent (e.g., a virus, bacterium, or other non-human agent) is responsible for causing the disease.

3.4.6 Life course epidemiology

Life course refers to the cumulative events across an individual's span of life, beginning from fetal stages till adulthood. Epidemiology applied to studying events across the life span and their relationships with diseases is referred to as life course epidemiology.

3.4.7 Neuro-epidemiology

Neuro-epidemiology is the application of epidemiology to neurology: the sciences related to the brain, the nervous system, and related disorders.

3.4.8 Occupational Epidemiology

Occupational epidemiology refers to the application of epidemiology to address diseases that are largely due to occupational hazards.

3.4.9 Paediatric epidemiology

Paediatric epidemiology refers to the epidemiology of diseases of children. Thus, epidemiological principles applied to growth, development, developmental milestones, and other illnesses associated with children fall under the rubric of paediatric epidemiology.

As can be seen from the above descriptions, there may be considerable overlap between different subdisciplines of epidemiology. For example, infectious epidemiology can be nested within paediatric epidemiology when one is studying paediatric infections, and so on. It is nevertheless important to keep in mind the key principles on which epidemiology is based, and we briefly describe each of these principles in the subsequent sections. We shall drill down into more detail in the subsequent chapters of this course.

4 Measures

4.1 Rates

The rate of a disease D is given as:

Rate = number of cases of a disease / time period over which they occur

Example: refer to the following table, in which cases are first expressed per 10,000 population and then divided by the time period in years.

Table 1. Cases, population, cases per 10,000 population, time, and rate per year

Cases   Population   Cases per 10,000   Time (years)   Rate per 10,000 per year
10      1000         100.00             2              50.00
20      500          400.00             3              133.33
100     1200         833.33             5              166.67

Rates, as you can see in the above table, refer to the variation of counts over time.
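The arithmetic behind Table 1 can be reproduced with a short sketch. It is written in Python for illustration; the function names are hypothetical, and the base of 10,000 is inferred from the figures in the table rather than stated in the notes.

```python
# Hypothetical helper names; the inputs are the three rows of Table 1.
BASE = 10_000  # cases are expressed per 10,000 population (inferred from the table)


def cases_per_base(cases, population, base=BASE):
    """Scale a case count to a common population base."""
    return cases / population * base


def rate_per_year(cases, population, years, base=BASE):
    """Cases per `base` population per year of observation."""
    return cases_per_base(cases, population, base) / years


# (cases, population, years of observation)
table_1 = [(10, 1000, 2), (20, 500, 3), (100, 1200, 5)]
rates = [rate_per_year(c, p, t) for c, p, t in table_1]

for (c, p, t), r in zip(table_1, rates):
    print(f"{c:>4} cases / {p:>5} people over {t} yr -> {r:.2f} per 10,000 per year")
```

Scaling to a common base before dividing by time is what makes the three rows comparable even though the populations differ in size.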

4.2 Ratio

A ratio is the result when one variable is divided by another variable, where these variables are not subsets of each other. For example, the relative numbers of men and women in a sample of a population are expressed by the male:female ratio.

4.3 Proportion

If the numerator of a division is contained within the denominator, the result is a proportion. If, in a specific sample of 1000 women, 100 are of child bearing age (15-45 years), the proportion is given by 100/1000, or 10%.

4.4 Prevalence

Prevalence refers to the extent of a disease or health outcome in a population, expressed relative to a base population. For example, the prevalence of diabetes in a given area denotes the number of individuals with diabetes out of the total population in the area. Usually, the base is set at 10,000. Prevalence figures are used for indicating the frequency of a condition or the rarity of a disease. By convention, a condition is deemed rare if the prevalence of the outcome is less than 1 per 10,000 population. Prevalence is further described according to whether it is measured at one point in time (point prevalence) or over a period of time (period prevalence).

4.5 Incidence

Incidence of a disease is defined as the number of new cases of a disease over a specified time period among people who were at risk of developing the disease but who were disease free at the beginning of the observation period. Incidence therefore differs from prevalence in that it provides not only a measure of the magnitude of the disease but also an impression of its spread. Prevalence, in turn, depends on both the incidence and the duration of the disease. A disease that has a very short duration (patients either get cured quickly or die soon) may have high incidence but typically low prevalence. On the other hand, diseases of prolonged duration (chronic diseases such as hypertension and diabetes) may have low incidence (i.e., few new cases are diagnosed in healthy individuals over a period of time), yet at any given point in time, and over years, a large number of individuals can be found to have the disease when snapshots or surveys are conducted.
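The dependence of prevalence on incidence and duration can be sketched numerically. The Python snippet below assumes a steady-state population, in which prevalence odds equal incidence times mean duration; that relationship, the function name, and all figures are illustrative assumptions and are not taken from these notes.

```python
def steady_state_prevalence(incidence_per_year, mean_duration_years):
    """Prevalence implied by P / (1 - P) = incidence x duration (steady state assumed)."""
    prevalence_odds = incidence_per_year * mean_duration_years
    return prevalence_odds / (1 + prevalence_odds)


# Hypothetical figures, for illustration only.
acute = steady_state_prevalence(0.10, 0.02)     # frequent but short-lived illness
chronic = steady_state_prevalence(0.005, 20.0)  # infrequent but long-lasting illness

print(f"acute illness:   prevalence ~ {acute:.4f}")
print(f"chronic illness: prevalence ~ {chronic:.4f}")
```

Even though the acute illness has twenty times the incidence in this sketch, the chronic illness ends up with the far larger prevalence because each case persists for much longer.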

4.5.1 Cumulative incidence

Cumulative incidence refers to the proportion of individuals who develop a particular disease or health outcome over a period of time, out of all individuals who were susceptible to the disease but were disease free at the beginning of the time period.

4.5.2 Incidence density

Incidence density is also known as the force of the disease and refers to the speed at which new cases of the disease emerge among individuals who up to that point did not have the disease. It is calculated as the number of new cases divided by the total person-time at risk.
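The difference between the two incidence measures is easiest to see side by side. The following Python sketch uses a hypothetical cohort; the function names and counts are assumptions made for illustration.

```python
def cumulative_incidence(new_cases, at_risk_at_start):
    """Proportion of the initially disease-free group that develops the disease."""
    return new_cases / at_risk_at_start


def incidence_density(new_cases, person_time_at_risk):
    """New cases per unit of person-time actually spent at risk."""
    return new_cases / person_time_at_risk


# Hypothetical cohort: 1000 disease-free people followed for up to 2 years.
# People stop contributing person-time once they fall ill or leave the study,
# so the total is below the 2000 person-years of complete follow-up.
new_cases = 50
at_risk = 1000
person_years = 1900.0

cum_inc = cumulative_incidence(new_cases, at_risk)
inc_density = incidence_density(new_cases, person_years)

print(f"cumulative incidence over 2 years: {cum_inc:.3f}")
print(f"incidence density: {inc_density:.4f} per person-year")
```

Cumulative incidence is a proportion with no time unit of its own, whereas incidence density carries the unit "per person-year", which is why it is read as a speed.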

4.6 Measures of association

As we just discussed, epidemiology is about studying the distribution and determinants of diseases in populations. In the previous sections, we briefly indicated the measures of disease distribution in terms of prevalence and incidence. Prevalence indicates the magnitude of the disease in communities, or the likelihood of disease; incidence indicates the force of spread of disease and related outcomes. Beyond disease distribution, epidemiologists also study what may have caused diseases. Investigating the causes of diseases involves two related steps: first, finding a valid association between an exposure and a disease outcome; and second, establishing whether this association, beyond being valid in terms of statistical and substantive significance, also indicates cause and effect. An example is the investigation of smoking and lung cancer. It is understood that smoking cigarettes is a cause of lung cancer. To establish a cause and effect linkage between smoking and lung cancer, it needs to be demonstrated at the least that there is a valid linkage between smoking and cancer that goes beyond statistical tests. Once a linkage is established, the nature of the association, i.e., whether it is one of cause and effect, needs to be established. This is done using a number of epidemiological study designs. However, as an introduction, let's discuss some common measures used in epidemiology to infer associations between exposures and outcomes, or between interventions and desired outcomes.

4.6.1 Rate/Risk difference(ARR)

The first measure that might help to show that an exposure is associated with an outcome, or that an intervention is associated with the desired outcome, is the difference in the rate of occurrence of the outcome between the groups being compared. This difference is known as the rate or risk difference. It is a linear measure that denotes the difference in the risk (or likelihood, or rate of occurrence) of disease between individuals who are exposed (or received the intervention) and individuals who are not exposed (or did not receive the intervention). In this context, for intervention trials and intervention related studies, absolute risk reduction (ARR, indicating the effectiveness of an intervention in minimizing harms) is often used to denote the following:

ARR = rate of disease among individuals who did not receive the intervention - rate of disease among individuals who did receive the intervention

4.6.2 Numbers needed to treat

The term numbers needed to treat (NNT) indicates the number of individuals who must be treated with the intervention to prevent an adverse outcome in one individual (alternatively, the number of individuals who must be treated to observe a successful effect in one).

NNT = 1/ARR
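As a worked illustration of the two formulas, the Python sketch below computes ARR and NNT for a hypothetical trial. The function names and trial counts are assumptions, not data from any study.

```python
def absolute_risk_reduction(events_control, n_control, events_treated, n_treated):
    """ARR = rate in those not given the intervention - rate in those given it."""
    return events_control / n_control - events_treated / n_treated


def number_needed_to_treat(arr):
    """NNT = 1 / ARR; only meaningful when the intervention lowers the rate."""
    if arr <= 0:
        raise ValueError("NNT is undefined unless ARR > 0")
    return 1 / arr


# Hypothetical trial: 20 of 100 controls and 10 of 100 treated develop the disease.
arr = absolute_risk_reduction(20, 100, 10, 100)
nnt = number_needed_to_treat(arr)

print(f"ARR = {arr:.2f}, NNT = {nnt:.1f}")
```

In practice NNT is usually reported rounded up to the next whole person, since a fraction of a patient cannot be treated.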

4.6.3 Rate ratio

Rate ratio indicates the ratio of the rate of disease in exposed individuals to the rate of disease in non-exposed individuals. Correspondingly, for interventions, rate ratio indicates the rate of disease (or of the desired outcome) in the treated relative to the rate in those who did not receive the treatment.

4.6.4 Relative Risk

In the context of observational epidemiology, relative risk denotes the risk of adverse outcomes for the exposure of interest relative to control conditions and is expressed as:

Relative Risk = Risk of disease due to exposure / Risk of disease due to alternative conditions under observation

Usually, in the context of epidemiological studies such as cohort studies (prospective or retrospective), relative risk is calculated as the incidence of a disease among exposed individuals divided by the incidence of the disease among individuals who were not exposed to the suspected exposure. A relative risk of one (RR = 1) indicates that the risk of disease is the same among the exposed and the non-exposed, i.e., that there is no association. A relative risk greater than one (RR > 1) indicates that the risk of disease among the exposed is higher than among the non-exposed and thus helps to establish the variable as a risk factor. Similarly, a relative risk that is less than one (RR < 1) indicates that the risk of disease is lower among the exposed than among the non-exposed and therefore indicates a protective effect.
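The calculation and its reading can be sketched in Python as follows. The cohort counts and function names are hypothetical, and the reading follows the usual convention that RR = 1 means no association, RR > 1 a risk factor, and RR < 1 a protective effect.

```python
def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Incidence among the exposed divided by incidence among the non-exposed."""
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    return risk_exposed / risk_unexposed


def interpret(rr, tol=1e-9):
    """Direction of association implied by a relative risk (ignores chance and bias)."""
    if abs(rr - 1) <= tol:
        return "no association"
    return "risk factor" if rr > 1 else "protective"


# Hypothetical cohorts of 1000 exposed and 1000 non-exposed individuals.
rr_harm = relative_risk(30, 1000, 10, 1000)     # about 3
rr_null = relative_risk(10, 1000, 10, 1000)     # exactly 1
rr_protect = relative_risk(5, 1000, 10, 1000)   # about 0.5

for rr in (rr_harm, rr_null, rr_protect):
    print(f"RR = {rr:.2f}: {interpret(rr)}")
```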

4.6.5 Odds Ratio

Odds ratio is another measure of association. The odds of an event are the probability of the event divided by the probability of the non-event: for example, the probability (or prevalence) of a disease divided by the probability (or prevalence) of non-disease, or the likelihood of an exposure divided by the likelihood of non-exposure. Odds ratio is usually calculated as the measure of association for case control studies, where the odds of exposure are calculated for individuals with and without the disease outcome of interest. As in the case of relative risk estimates in prospective studies (or studies that mimic prospective studies, such as retrospective cohort studies), an odds ratio of more than one indicates that the evaluated risk factor is positively associated with the disease. Likewise, an odds ratio of less than one indicates a protective effect of the exposure variable, meaning that the odds of exposure to the suspected variable are lower for those with the disease than for those without it.
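A minimal Python sketch of the case control calculation is given below. The counts are hypothetical and the function names are assumptions made for illustration.

```python
def odds(probability):
    """Odds corresponding to a probability p: p / (1 - p)."""
    return probability / (1 - probability)


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_in_cases = exposed_cases / unexposed_cases
    odds_in_controls = exposed_controls / unexposed_controls
    return odds_in_cases / odds_in_controls


# Hypothetical case control study: 250 cases and 250 controls.
or_estimate = odds_ratio(
    exposed_cases=100, unexposed_cases=150,
    exposed_controls=122, unexposed_controls=128,
)
print(f"odds ratio = {or_estimate:.2f}")  # below 1, i.e. exposure is less common among cases
```

Because a case control study fixes the numbers of cases and controls in advance, disease incidence cannot be computed from it directly, which is why the comparison is made on the odds of exposure.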

4.6.6 How to establish whether an exposure is associated with an outcome

In analytical epidemiology and in the analysis of epidemiological studies, as has already been discussed, two issues are important: whether an association is valid (statistically and substantively) and whether a valid association can be deemed one of cause and effect. Whether an association is valid depends on the extent to which alternative explanations that might account for the observed association can be ruled out. Unlike claims of valid association, however, claims of causality are less clear cut and depend on numerous issues, some of which will be discussed in the following sections.

5 Chance

5.1 What is meant by play of chance

In finding alternative explanations, the first of the three possible alternative explanations is that the observed association may have arisen entirely because of chance. In other words, the association may well be a random one. Let's take an example. Let's say a group of researchers decided to study whether chocolate consumption was associated with fatal strokes and found that those who regularly consumed chocolates were less likely to suffer from haemorrhagic stroke. The obvious question that is addressed by ruling out the play of chance is: what is the likelihood that this observation was a random association? In other words, is it possible that there is actually an equal likelihood of stroke among individuals who regularly consume chocolates and those who do not?

5.2 How to control for the play of chance

Let's take the above example and expand on it a little. Let's imagine (we shall see such a study and its actual findings in a later example) that the team of researchers studied 500 individuals with and without stroke, surveyed their lifetime chocolate consumption, and found that, compared to those who did not have stroke, those who had stroke were about 30% less likely to have consumed chocolates regularly (i.e., 3 times a week or more) for a long time. Let's say the OR was estimated to be about 0.70. If the researchers were to control for the play of chance, they would set up a pair of hypotheses. Hypotheses are statements about truths or about certain possibilities, similar to theorizing. In this situation, there are two possibilities, either that:

1. There is no association between chocolate consumption and stroke and therefore the OR should actually be equal to 1, or

2. There is a protective association between chocolate consumption and stroke occurrence and therefore OR < 1 (we do not worry at this point as to what the OR may actually be)

The above statements express what is known as a "one way hypothesis" (also called a one-sided hypothesis). Alternatively, one could feign complete ignorance and claim that, although we do not know what the relationship between chocolate consumption and stroke may actually be, we believe that the OR is not going to be unity. Chocolate consumption could be protective, in which case OR < 1; it could also be a risk factor, in which case OR > 1. Either way, OR != 1 (not equal to one). Such a position is expressed by what is known as a "two way hypothesis" (two-sided hypothesis). As the case may be, let us now try to resolve the situation. Note the following table:

Table explaining hypothesis testing

Position              Null True      Null False
Reject null           Type I error   No error
Fail to reject null   No error       Type II error

The above table needs a few words of explanation. Let's start with the first row, second column. You see the statement "Null True". What does that mean? Continuing with the present example of chocolate consumption and risk of stroke, let's say we make two statements:

1. The risks of stroke are equal for those with or without chocolate consumption, so OR = 1

2. The risk of stroke is lower for those with chocolate consumption; in other words, OR < 1

It turns out that the first statement is termed the null hypothesis, and the second is termed the alternative hypothesis. Think of the null hypothesis as keeping the status quo, or a position that is currently challenged by the research. As can be seen from the table, in reality there can be one of two positions: the null statement is actually either true or false. However, the study findings may end up either disagreeing with the null hypothesis (in which case the null is rejected), or failing to reject it because the chocolate eating behaviours of those who had and those who did not have stroke were similar. As you can see, there are two situations where the researchers are correct: correctly rejecting the null, or correctly failing to reject the null. There are two other situations where the researchers can make errors. These are as follows.

1. The researchers can falsely reject the null. This is the scenario where a new finding is actually wrong (because in reality the null hypothesis is the correct one). This is a type I error.

2. Alternatively, the researchers can fail to reject the null when they should have rejected it. Here the study finds in favour of the null hypothesis (the status quo), but in actuality the null hypothesis is false. This is a type II error.

By convention, it is stated that the role of chance in finding an association can be rejected if the probability of committing a type I error is five percent or less. This conventional threshold of 5% (0.05) is known as the significance level; the p-value calculated from the study data is compared against it.

While p-values are one way of expressing how the role of chance can be controlled for, the other approach is to cite what is known as a confidence interval around the effect size estimate. Let us illustrate the situation once again with the current example. Let's say our researchers identified that the odds ratio of association between chocolate consumption and occurrence of stroke was 0.70 (95% confidence interval: 0.64-0.87). What this indicates is that, while the odds ratio for stroke among regular chocolate consumers is estimated at about 0.70 (protective), if this study were to be repeated many times over (possibly an infinite number of times), 95% of the intervals constructed in this way would contain the true measure of association. In other words, while the exact extent of protection from stroke cannot be ascertained, one can be "confident" that the true measure of association lies in an interval spanning 0.64 and 0.87.

How is the measure of "confidence" derived? Well, it turns out (a consequence of the central limit theorem) that if one can construct a large series of numbers corresponding to a phenomenon, the actual values may range from very low to very high, and most values will fall somewhere in the middle of this range (the central zone that contains the average, or the mean, and a range where most of the values lie). In studies such as those that aim to measure relative risks and odds ratios, the estimates (on the logarithmic scale) approximately follow a Gaussian distribution (or normal distribution). In such a situation, the measure of spread is determined by the standard deviation (the square root of the variance). In a normally distributed set of values, 95% of the values lie within 1.96 standard deviations of the mean.

As can be inferred from the above description,

1. If the lower and the upper margins of the confidence interval are both lower than 1.0, then not only is the association protective, but one can also state with some degree of precision the range within which the true measure of association lies.

2. If the figure 1.0 falls between the lower and upper margins of the confidence interval, then the true measure of association is uncertain and one cannot state with confidence whether the association is one of protection or of risk.

3. If both the upper and lower margins of the interval are above 1.0, then the nature of the association is one of increased risk of the outcome.

4. The lower the confidence (say 90% confidence as opposed to 95%), the narrower will be the interval. In a normal distribution, about 68% of the values lie within 1 SD of the mean; 90% of values lie within 1.65 SD of the mean, while 1.96 SD on either side of the mean span 95% of the values.

5. The fewer the number of individuals on which the study was done (the lower the sample size, an issue that we shall discuss later in the course), the wider will be the confidence interval. Correspondingly, if a large number of individuals are included in the study, the confidence interval will be very narrow.
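Points 4 and 5 above can be checked numerically. The Python sketch below builds a confidence interval for an odds ratio on the log scale using the standard error sqrt(1/a + 1/b + 1/c + 1/d), often called Woolf's method. That choice of method, the function name, and the counts are all assumptions made for illustration; the notes do not say how the interval in the chocolate example was obtained.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio with a normal-approximation interval on the log scale.

    a, b = exposed cases, exposed controls
    c, d = unexposed cases, unexposed controls
    """
    or_hat = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%, 1.645 for 90%
    lower = exp(log(or_hat) - z * se_log_or)
    upper = exp(log(or_hat) + z * se_log_or)
    return or_hat, lower, upper


# Hypothetical counts: 500 participants, then the same proportions with 5000.
small_95 = odds_ratio_ci(100, 122, 150, 128)
small_90 = odds_ratio_ci(100, 122, 150, 128, level=0.90)
large_95 = odds_ratio_ci(1000, 1220, 1500, 1280)

for label, (est, lo, hi) in [("n=500, 95%", small_95),
                             ("n=500, 90%", small_90),
                             ("n=5000, 95%", large_95)]:
    print(f"{label:>12}: OR = {est:.2f} ({lo:.2f} to {hi:.2f})")
```

The point estimate is the same in all three cases; only the width of the interval changes, narrowing when the confidence level is lowered and when the sample size is increased.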

A fuller discussion of p-values and confidence intervals is beyond the scope of this introductory paper (we shall discuss these issues later in the course). However, for the sake of brevity, it needs to be pointed out that, compared to p-values, 95% confidence intervals are more informative: they indicate both the magnitude of the estimated association and the spread of values within which the true measure of association is likely to lie.

5.3 Bias

The second of the three major alternative explanations for an observed association is that there may have been systematic errors in the way the researchers recorded or measured the exposure, allocated the alternative interventions (the true intervention and the placebo), or recorded the outcomes, or in both exposure/interventions and outcomes. These systematic errors in allocation or observation are known as bias. Essentially, one can distinguish two different types of bias: random bias and non-random bias.

5.3.1 Random bias

When the errors in observation of the exposure, the allocation of the intervention, or the outcomes are similar for the groups being compared, nothing can be said about whether errors in one group are over-represented or different from those in another. The errors in this situation are random errors.

Let's continue with our example of chocolate eating and strokes. In this study, the researchers identified individuals who had stroke and those with conditions other than stroke. Further, they used a food frequency questionnaire to measure the extent and frequency of chocolate eating for each individual enrolled in the study. What may be the sources of bias?

1. With prior assumptions, the researchers could have asked questions about chocolate consumption differently to those with and without a history of strokes

2. After individuals were briefed about the need and purpose of the study, it is possible that those who had a stroke responded differently to the questions than those who did not, so that the responses would differ based on who they were and may not be determined entirely by the underlying biological reality

3. Both 1 and 2 could be possible

4. The researchers could have selected different groups of individuals depending on whether they had strokes or whether they did not have strokes.

Now, if these errors of observation, reporting, or diagnosis were similar for both groups (i.e., those who had or did not have strokes, or those who were high chocolate consumers versus those who were not), then at worst the measure of association (in this case the odds ratio) would shift towards the null value (i.e., closer to 1.0). This type of bias is known as random bias (also known as non-differential misclassification bias), and, as you may guess, it shifts the estimated association towards the null. However, in cases where the errors differ between those with and without the outcomes, the errors are non-random or systematic. Consequently, it is uncertain in which direction the measure of association would move (it could move towards the null or away from the null, and the shift is unpredictable).
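The effect of the two kinds of error on the odds ratio can be illustrated with expected counts. In the Python sketch below, exposure is measured with a given sensitivity and specificity; the counts, the error rates, and the function names are all hypothetical, and the example uses a harmful exposure (true OR above 1) rather than the chocolate data.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed (exposed, unexposed) counts under imperfect exposure measurement."""
    observed_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    observed_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return observed_exposed, observed_unexposed


# Hypothetical true exposure status: (exposed, unexposed)
true_cases = (200, 300)
true_controls = (100, 400)
true_or = odds_ratio(*true_cases, *true_controls)

# Random (non-differential) error: same error rates in cases and controls.
nondiff_or = odds_ratio(*misclassify(*true_cases, 0.8, 0.9),
                        *misclassify(*true_controls, 0.8, 0.9))

# Non-random (differential) error: cases recall exposure better than controls.
diff_or = odds_ratio(*misclassify(*true_cases, 0.95, 0.9),
                     *misclassify(*true_controls, 0.6, 0.9))

print(f"true OR             = {true_or:.2f}")
print(f"non-differential OR = {nondiff_or:.2f}  (pulled towards 1)")
print(f"differential OR     = {diff_or:.2f}  (here pushed away from 1)")
```

With other differential error rates the estimate could just as easily move towards the null, which is what makes non-random error harder to reason about.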

6 Confounding variable

As can be seen in the accompanying figure, a confounding variable is one which is independently associated with both the exposure variable and the outcome variable but does not lie on the causal pathway between them. Example. Let's imagine a group of researchers are planning to investigate whether smoking might be a risk factor for early onset of heart attacks (acute myocardial infarction). It is known that in this population, men are more likely to be smokers, and men also have a higher risk of heart attacks. As a result, gender in this instance is a confounding factor for the association between smoking and heart attacks in this population. Thus, there are two criteria that need to be mentioned in this connection:

6.1 Independently associated with exposure and outcome

6.2 Will not come in the causal pathway

6.1 How to test for confounding

The presence of confounding variables can be tested by first calculating the crude effect, then calculating the effect separately within each level of the potential confounding variable, and finally weighting the level-specific results to arrive at a summary estimate. If the summary estimate differs appreciably from the crude effect, the variable is acting as a confounder. In addition to statistical and numerical methods, confounding can also be assessed from a substantive perspective, ie, based on prior knowledge.
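A minimal sketch of this crude-versus-stratified comparison, using the smoking, gender and heart attack example. The counts are invented for illustration, and the Mantel-Haenszel weighting used here is one common choice of weights, not necessarily the one intended in these notes.

```python
def odds_ratio(a, b, c, d):
    """a = exposed with outcome, b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Weighted summary odds ratio across strata of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts: men smoke more often and have more heart attacks.
men = (80, 320, 10, 80)
women = (10, 100, 20, 400)

# Crude table: ignore gender and add the two strata cell by cell.
crude = tuple(m + w for m, w in zip(men, women))

crude_or = odds_ratio(*crude)
stratum_ors = [odds_ratio(*men), odds_ratio(*women)]
summary_or = mantel_haenszel_or([men, women])

print(round(crude_or, 2), stratum_ors, round(summary_or, 2))
```

In this made-up data the Odds Ratio is 2 within each gender, but the crude Odds Ratio is noticeably larger. The gap between the crude and the summary estimate is what signals confounding by gender.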

6.2 How to control for confounding

There are three ways to control for the effects of confounding variables: randomization, matching, and multivariate analysis. Randomization relies on the expectation that random allocation of individuals to intervention and control groups will distribute evenly all other variables that might be independently associated with exposure and outcome, and that might therefore act as confounders. Matching, as the name implies, pairs up individuals based on levels of the suspected confounding variables. Thus, if gender is suspected to be a confounding variable, each comparison group may be made to include the same numbers of men and women. Multivariate analysis is done during the data analysis phase: in a multivariable model, one or more suspected confounding variables are added in a stepwise manner to control for their effect. This approach is also used to test whether a particular variable is acting as a confounding variable or not. In summary, randomization, matching, and multivariate analysis are three strategies commonly used for controlling the effect of confounding variables.
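How random allocation tends to balance a potential confounder can be seen in a small simulation. This is only a sketch: the cohort is simulated, the 60% share of men is an arbitrary assumption, and the helper names are illustrative.

```python
import random


def randomize(participants, seed=0):
    """Randomly split participants into two arms of equal size."""
    rng = random.Random(seed)      # fixed seed so the sketch is repeatable
    shuffled = participants[:]     # copy, so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def proportion_men(group):
    """Share of a group recorded as "M"."""
    return group.count("M") / len(group)


# Simulated cohort: 6000 men and 4000 women.
participants = ["M"] * 6000 + ["F"] * 4000
intervention, control = randomize(participants)

print(proportion_men(intervention), proportion_men(control))
```

Both arms end up with a share of men close to the overall 60%, even though gender was never used in the allocation. The same is expected, on average, for confounders that were not measured at all.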

7 Causality

The second issue around epidemiological studies relates to how study findings support cause and effect relationships. Unlike statistical associations, cause and effect relationships are more complex. They are judged against a number of criteria that together form a logical framework. In 1965, the British epidemiologist Sir Austin Bradford Hill discussed a few "considerations" that are now widely used for cause-effect deduction. Briefly, the main considerations in that framework are:

7.1 Strength of association

Denotes how strongly a suspected exposure variable and its associated outcome are related.

7.2 Dose response relationship

If the extent of outcome demonstrates a corresponding increase with increased levels of the suspected exposure, then that might show evidence in favour of cause-effect association.

7.3 Time sequence

This criterion is indispensable: it states that the exposure must precede the outcome in time.

7.4 Biological plausibility

This criterion can be challenged, but in essence it states that if X is deemed a cause of Y, then that association should be explainable on the basis of some known biological mechanism.

7.5 Consistency and replicability

This indicates that studies exploring the association between the exposure and the outcome, conducted under different circumstances, should find similar associations.

These criteria are listed below:

7.5.1 Hill's Criteria

• Strength of association
• Dose-response relationship
• Temporality
• Biological plausibility
• Consistency
• Replicability

In summary therefore, description of disease/health outcomes, specifying associations with exposure variables, and establishment of cause and effect associations are the three purposes of epidemiological studies. We shall go through a brief description of each type of epidemiological study in the next section.

8 Types of epidemiological studies

Essentially, there are two types of epidemiological studies: studies that evaluate some type of intervention to address specific disease conditions, and studies that are primarily observational in nature. In observational epidemiological studies, associations between specific exposure variables and outcome variables are calculated and presented.

8.1 Intervention studies

8.1.1 Randomized Controlled Trial

Randomized Controlled Trials (RCT) are a class of intervention studies where individuals are randomly allocated to two or more groups, each of which receives a different intervention. The competing interventions are the intervention under investigation and either an alternative form of intervention or no intervention at all. An inert or dummy drug/device given in place of the intervention is known as a placebo. After individuals are allocated to the intervention and control groups, they are followed up prospectively to study whether they develop the outcomes of interest. The rates at which the outcomes develop are then compared between the groups. The results are expressed in the form of relative risks. Implications of the study are assessed by calculating the absolute risk reduction (ARR, the difference in risks) and the number needed to treat (NNT = 1/ARR).
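The trial measures described above can be sketched as follows. The event counts are hypothetical and the function name is illustrative. The sketch assumes the outcome is an adverse event that is less frequent in the treated arm, so that the ARR is positive.

```python
def trial_summary(events_treated, n_treated, events_control, n_control):
    """Relative risk, absolute risk reduction and NNT for a two-arm trial."""
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    arr = risk_control - risk_treated          # absolute risk reduction
    return {
        "risk_treated": risk_treated,
        "risk_control": risk_control,
        "relative_risk": risk_treated / risk_control,
        "arr": arr,
        "nnt": 1 / arr,                        # NNT = 1/ARR
    }


# Hypothetical trial: 30 of 200 treated and 50 of 200 control participants
# develop the outcome.
summary = trial_summary(30, 200, 50, 200)
for name, value in summary.items():
    print(name, round(value, 3))
```

Here the risks are 0.15 and 0.25, so the relative risk is 0.6, the ARR is 0.10 and the NNT is about 10: roughly ten people would need to be treated to prevent one outcome.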

• Advantages of conducting a randomized controlled trial are:
  ◦ Intervention and control groups enable direct comparison
  ◦ Adequate numbers can be selected to control for chance
  ◦ Randomization can account for confounding variables
  ◦ Prospective study design can take care of time as a variable for cause and effect
  ◦ Concept of efficacy

• Disadvantages
  ◦ Expensive
  ◦ Limited to specific population studied

8.2 Observation based studies

Some epidemiological studies are essentially observational in nature, meaning that the objective of this class of studies is to report the observed association between an exposure variable and an outcome. We describe four types of studies here: cohort studies, case control studies, cross sectional surveys (also called prevalence studies), and case series/case studies.

8.3 Cohort studies

Cohort studies are epidemiological studies where groups of individuals (referred to as cohorts) are selected on the basis of some pre-specified "exposures" and are then followed in time to observe the emergence of specific diseases. For example, about 50000 women who were in their menopause and were using hormone replacement therapy were followed up in France from 1990 to 2002 to observe different health outcomes. The study results were published recently. Essentially, there are two types of cohort studies: prospective and retrospective cohort studies.

8.3.1 Prospective Cohort study

A prospective cohort study design is one where the subjects are followed up in time from the present to some point in the future. At the commencement of the study, participants are selected so that they are either exposed or not exposed to the exposure variable of interest, but none of them has the disease outcome of interest.

8.3.2 Retrospective Cohort Study

A retrospective cohort study, as the name suggests, is one where the exposure ascertainment (or allocation of exposure) and the outcome have all taken place in the past, but the individuals can still be studied as if they were followed in time and the health effects can be analyzed. The above example of post menopausal women followed up in France shows how a retrospective cohort study is organized.

For both prospective and retrospective cohort studies, the measure of association is termed the relative risk. Relative risk is calculated as the ratio of the incidence of the disease or health outcome among the exposed to the incidence among the non-exposed.

8.3.3 Case control study

A case control study design is one whose objective is to identify whether a particular risk factor may be associated with a specific disease outcome. To conduct the study, the researchers begin with individuals with and without the disease under study and then, based on the presence or absence of the disease, work out the risk factor associations.

A special class of case control studies is termed the nested case control study. In a nested case control study, a case control design is set up within the context of a prospective cohort study. Usually, at the beginning of a cohort study, body fluids or other information related to physiological parameters are collected and stored. Then, when a sufficient number of individuals have developed the outcome of interest, a case control study is set up to study the association between the disease outcome of interest and the exposure variables collected at the commencement of the study.

The measure of association in a case control study is termed an Odds Ratio. Essentially, an Odds Ratio compares the odds of exposure given that a particular outcome is present with the odds of exposure given that it is absent. Odds, as we already described, represent the ratio of two probabilities: the probability of an event happening and the probability of the event not happening (also known as the complementary event). In case control study designs, such Odds are defined as follows:

1. Odds of exposure in cases
2. Odds of exposure in controls
3. Comparison of Odds

Cases are individuals with the disease; controls are individuals who are similar to cases but who do not have the disease or health outcome under study. For example, if the hormone-asthma study were designed as a case control study, one would start with women with asthma and women without asthma and then work backwards to identify which of them were using specific types of HRT, in order to make comparisons. The following table outlines the situation:

Table outlining the situation of case control study design

Exposure            Cases   Controls   Total
Exposure positive   A       B          A+B
Exposure negative   C       D          C+D
Total               A+C     B+D        A+B+C+D

Using the figures in the above table, let's calculate the odds of exposure for cases and controls:

1. Probability of exposure among cases = A/(A+C)
2. Probability of non-exposure among cases = C/(A+C)
3. Therefore, Odds of exposure among cases = [A/(A+C)] / [C/(A+C)] = A/C
4. Likewise, Odds of exposure among controls = B/D
5. Hence, Odds Ratio = (A/C) / (B/D) = (A x D) / (B x C)
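The steps above can be followed in a short sketch, using the same A, B, C, D cell labels as the table. The counts are hypothetical and the function name is illustrative.

```python
def case_control_odds_ratio(a, b, c, d):
    """Odds Ratio built up step by step from the 2x2 table.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    p_exposed_cases = a / (a + c)                              # step 1
    p_unexposed_cases = c / (a + c)                            # step 2
    odds_cases = p_exposed_cases / p_unexposed_cases           # step 3, equals a/c
    odds_controls = (b / (b + d)) / (d / (b + d))              # step 4, equals b/d
    return odds_cases / odds_controls                          # step 5


# Hypothetical study: 30 of 100 cases and 20 of 100 controls were exposed.
A, B, C, D = 30, 20, 70, 80
step_by_step = case_control_odds_ratio(A, B, C, D)
cross_product = (A * D) / (B * C)

print(round(step_by_step, 3), round(cross_product, 3))
```

Both routes give the same value, about 1.71, which is why the Odds Ratio is often computed directly as the cross-product (A x D) / (B x C).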

8.3.4 Cross sectional surveys

Cross sectional surveys are epidemiological study designs where individuals are not "followed up" in any direction; all information about possible exposures and health outcomes is obtained at the same point in time. If you think of time as an arrow that runs from left to right (left being past and right being future), then a cross-sectional survey is a line that cuts across that arrow at one specific point. All information collected at that point constitutes the basis of a cross sectional study. In most circumstances the study is conducted in the form of questionnaires and other measurement based surveys. The measure of choice in cross sectional surveys is prevalence. Prevalence, as we discussed before, indicates the magnitude of a disease or health outcome present in a community at a given point in time. When prevalence is treated as a probability and Odds Ratios are calculated from prevalence measures, the resulting metric is known as the Prevalence Odds Ratio, a special type of metric used with prevalence or cross sectional epidemiological studies.
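As a rough illustration of prevalence and the Prevalence Odds Ratio, the sketch below uses invented survey counts. The function and variable names are assumptions made for the example.

```python
def prevalence(with_outcome, total):
    """Proportion of a surveyed group that has the outcome at the time of the survey."""
    return with_outcome / total


def odds(probability):
    """Odds corresponding to a probability: p / (1 - p)."""
    return probability / (1 - probability)


# Hypothetical survey of 1000 people at a single point in time.
exposed_total, exposed_with_outcome = 200, 40
unexposed_total, unexposed_with_outcome = 800, 80

overall_prevalence = prevalence(
    exposed_with_outcome + unexposed_with_outcome,
    exposed_total + unexposed_total,
)
prevalence_exposed = prevalence(exposed_with_outcome, exposed_total)
prevalence_unexposed = prevalence(unexposed_with_outcome, unexposed_total)

# Prevalence Odds Ratio: odds of the outcome among exposed vs unexposed.
prevalence_odds_ratio = odds(prevalence_exposed) / odds(prevalence_unexposed)

print(overall_prevalence, prevalence_exposed, prevalence_unexposed,
      round(prevalence_odds_ratio, 2))
```

With these made-up counts the overall prevalence is 12%, the prevalence is 20% among the exposed and 10% among the unexposed, and the Prevalence Odds Ratio is 2.25.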

8.4 Summary & conclusions

This module provided a rapid review of some of the key concepts of epidemiology in general. More detailed discussions will follow as we study specific topics in future.

8.4.1 Test questions

8.4.4 Keywords

8.4.5 References

Author: Arin Basu
Date: February, 2010