Alaa Elmaoued Nancy Nguyen - Department of Pathology · Years of potential life lost (YPLL) Measure...

Alaa Elmaoued

Nancy Nguyen

Epidemiology/population health

Incidence vs. prevalence

Measures of health status

Survival analysis interpretation

Composite health status indicators

Population pyramids and impact of demographic changes

Disease surveillance and outbreak investigation

Communicable disease transmission

Points of intervention

Study design, types and selection of studies

Descriptive studies

Analytical studies: observational vs. interventional

Systematic reviews and meta-analysis

Obtaining and describing samples

Methods to handle noncompliance

Qualitative analysis

Study interpretation

Bias, confounding, and threats to validity

Internal vs. external validity

Statistical vs. clinical significance

www.usmle.org

Rates: crude and adjusted

Crude = overall (e.g. crude mortality rate)

Adjusted = stratified by different categories (e.g. Age-adjusted mortality rates)

Mortality

Standardized mortality ratio (SMR) = (observed # of deaths per yr / expected # of deaths per yr) x 100

If the SMR = 100, this indicates that the # observed deaths is equal to # expected

Population attributable risk (PAR) = Incidence in the total population – incidence in the nonexposed group

Population attributable risk percent (PAR%) = [(incidence in the total population-incidence in the nonexposed group)/incidence in the total population] x 100
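The three formulas above can be sketched in a few lines of Python. The function names and the input numbers are hypothetical, chosen only to illustrate the arithmetic; they do not come from the slides.

```python
def standardized_mortality_ratio(observed_deaths, expected_deaths):
    """SMR = (observed deaths / expected deaths) x 100."""
    return observed_deaths / expected_deaths * 100


def population_attributable_risk(incidence_total, incidence_nonexposed):
    """PAR = incidence in total population - incidence in nonexposed group."""
    return incidence_total - incidence_nonexposed


def par_percent(incidence_total, incidence_nonexposed):
    """PAR% = PAR / incidence in total population x 100."""
    par = population_attributable_risk(incidence_total, incidence_nonexposed)
    return par / incidence_total * 100


# Hypothetical inputs, for illustration only.
smr = standardized_mortality_ratio(observed_deaths=120, expected_deaths=100)
par = population_attributable_risk(incidence_total=0.05, incidence_nonexposed=0.02)
par_pct = par_percent(incidence_total=0.05, incidence_nonexposed=0.02)

print(f"SMR  = {smr:.0f}")   # above 100: more deaths observed than expected
print(f"PAR  = {par:.3f}")
print(f"PAR% = {par_pct:.0f}%")
```

An SMR above 100 means more deaths were observed than expected; below 100 means fewer.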

Reproductive rates

Maternal mortality

death of a woman while pregnant or within 42 days of termination of pregnancy, irrespective of the duration and site of the pregnancy, from any cause related to or aggravated by the pregnancy or its management but not from accidental or incidental causes

Denominator is usually reported per 100,000 registered live births

Neonatal mortality

Death of a live-born baby within the first 28 days of life (early neonatal mortality = within the first 7 days)

Per 1,000 live births

Infant mortality

Death of a child less than 1 year of age

Per 1,000 live births

Under-5 mortality

Survival analysis

Y-axis represents the proportion of survivors and X-axis represents time moving forward

Generally used to assess survival with death as the defining “event” but can also be used for other health outcomes such as fertility

Data is used to define the intervals rather than having a predetermined interval

Makes full use of the data and is more accurate

Accounts for some loss to follow-up

Years of potential life lost (YPLL)

Measure of premature mortality or early death (i.e. people who die younger have a greater loss of future productive years than people who die at an older age)

Based on life expectancy of the population

Quality-adjusted life years (QALY)

Measure of the quality of remaining life years

Used to evaluate different healthcare interventions

Quality of life is based on a scale from 0 to 1 where 0 is death and 1 is the best possible health state

Disability-adjusted life years (DALY)

Years of life lost to premature death AND years lived with a disability of specified severity and duration

Measure of overall disease burden that combines mortality and morbidity
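A minimal sketch of how these three composite indicators are usually computed. The reference age of 75 for YPLL is an assumption (the slides only say YPLL is based on life expectancy), and all ages, years, and weights below are hypothetical.

```python
def ypll(ages_at_death, reference_age=75):
    """Years of potential life lost: years short of the reference age,
    summed over all deaths. Deaths at or past the reference age add 0.
    reference_age=75 is an assumed cutoff, not taken from the slides."""
    return sum(max(0, reference_age - age) for age in ages_at_death)


def qaly(years_lived, quality_weight):
    """Quality-adjusted life years: years x weight (0 = death, 1 = best health)."""
    return years_lived * quality_weight


def daly(years_of_life_lost, years_lived_with_disability):
    """Disability-adjusted life years: mortality plus morbidity components."""
    return years_of_life_lost + years_lived_with_disability


# Hypothetical examples.
total_ypll = ypll([30, 60, 80])   # 45 + 15 + 0
total_qaly = qaly(10, 0.5)        # 10 years at half of full health
total_daly = daly(12, 3.5)

print(total_ypll, total_qaly, total_daly)
```

A death at age 30 contributes far more to YPLL than a death at age 60, which is the point of the measure.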

1. Define the outbreak and validate the existence of an outbreak

2. Examine the distribution of cases by time and place

3. Look for combinations (interactions) of relevant variables

4. Develop hypotheses based on: existing knowledge (if any), analogy to diseases of known etiology, findings from investigation of the outbreak

5. Test hypotheses

6. Recommend control measures

7. Prepare a written report of the investigation and the findings

8. Communicate findings to those involved in policy development and implementation and to the public

Attack rate = # of people at risk in whom a certain illness develops/ total # of people at risk
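As a quick illustration of the attack rate formula, here is a short sketch; the outbreak numbers are hypothetical.

```python
def attack_rate(ill, at_risk):
    """Attack rate = people at risk who develop the illness / total people at risk."""
    return ill / at_risk


# Hypothetical outbreak: 30 of 120 people at risk became ill.
rate = attack_rate(ill=30, at_risk=120)
print(f"Attack rate = {rate:.0%}")
```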

Herd immunity

Reportable diseases: What types of diseases are reportable/notifiable?

Level of prevention: Definition (Example)

Primary: Preventing the initial development of a disease (Immunization)

Secondary: Early detection of existing disease to reduce severity and complications (Screening for cancer)

Tertiary: Reducing the impact of the disease (Rehabilitation for stroke)

The physical examination records of the entire incoming freshman class of 1935 at the University of Minnesota were examined in 1977 to see if their recorded height and weight at the time of admission to the university was related to the development of coronary heart disease by 1986. This is an example of:

A. A cross-sectional study

B. A case-control study

C. A concurrent cohort study

D. A retrospective cohort study

E. An experimental study

Residents of three villages with three different types of water supply were asked to participate in a survey to identify cholera carriers. Because several cholera deaths had occurred recently, virtually everyone present at the time underwent examination. The proportion of residents in each village who were carriers was computed and compared. What is the proper classification for this study?

A. Cross-sectional study

B. Case-control study

C. Concurrent cohort study

D. Nonconcurrent cohort study

E. Experimental study

A case control study is characterized by all of the following except:

A. It is relatively inexpensive compared with most other epidemiologic study designs

B. Patients with the disease (cases) are compared with persons without the disease (controls)

C. Incidence rates may be computed directly

D. Assessment of past exposure may be biased

E. Definition of cases may be difficult

Cross-sectional study

AKA prevalence study

Both exposure and disease outcome are determined simultaneously

Cannot establish temporal relationship between the exposure and onset of disease

Case-series/Case-report

Case report = one person

Case series = more than one

Tracks subjects with a known exposure (e.g. patients given a similar treatment) OR examines their records for exposure and outcome

No hypothesis testing

Vulnerable to selection bias (select certain patients)

No control/comparison group = low internal validity

Vulnerable to Hawthorne effect

Ecological study

• Based on aggregate or group data, not on individual data (e.g. cause of death in different countries)

Cohort study

Selection of subjects is based on exposure

Groups are followed to compare incidence of disease or other health outcomes

Prospective aka concurrent aka longitudinal cohort study

Retrospective aka nonconcurrent aka historical cohort study

Good for evaluating temporal/causal association

Bad for rare diseases

Expensive and time-consuming

Problems with loss-to-follow-up

Case-control study

Selection of subjects is based on disease or other health outcome

Groups are evaluated to compare past exposure

Incident cases are preferred over prevalent cases (prevalent cases reflect survival with the disease, not only its development)

Matching

Group = frequency match

Individual = each case matched to a control

Relatively inexpensive and does not require as much time

Susceptible to recall bias

Good for rare diseases

Bad for rare exposures

Randomized Controlled Trial

Essentially the Gold Standard

Unethical in a lot of cases!

Double-blind

Placebo-controlled

Community intervention

Systematic Review

A research study which aims to provide an exhaustive summary of current literature relevant to a research question.

Crucial to EBM

Meta-analysis

A statistical technique used to combine the results of all eligible studies in a systematic review into a single quantitative estimate or summary effect size

Effect sizes measure the strength of the relationship between two variables, thereby providing information about the magnitude of the intervention effect

Heterogeneity is a value calculated to determine if individual studies are similar enough to compare (prefer non-significant findings for heterogeneity)

Publication bias is particularly problematic for systematic reviews because not all studies are published, depending on the significance and direction of effects detected.

[Figure: forest plot. Labels: each square represents the result from an individual study; horizontal line = confidence interval; center line = 1.0 (no association); overall result from the meta-analysis.]

Bias

Any systematic error in the design, conduct, or analysis of a study that results in a mistaken estimate of an exposure’s effect on the risk of disease

Selection bias

Error introduced when the study population does not represent the target population

Can be introduced at any stage of a research study

Information bias

Occurs during data collection and can lead to misclassification

Sampling bias or non-random sampling bias: a selection procedure that yields a non-representative sample in which a parameter estimate differs from the true value in the target population

Example is telephone random sampling which would systematically exclude households without telephones

Ascertainment bias

Healthcare access bias

Survivor treatment selection bias

Recall bias

If the presence of disease influences the perception of its causes or the search for exposure to the putative cause

Common in case-control studies where participants are aware of their disease status, but can also occur in cohort studies

Ecologic fallacy

When results from a group-level (ecological) analysis are used to make inferences at the individual level

Hawthorne effect

When individuals modify how they react or behave in response to their awareness of being observed

Confounding

An extraneous variable that correlates (directly or inversely) with both the dependent variable and the independent variable

Example: Drinking coffee and pancreatic cancer

Confounding is not an error in the study but can be considered a true phenomenon that is identified in a study and must be understood

One approach is to stratify…

If the apparent association is due entirely to confounding, then stratifying the data by the confounding variable gives a measure of association equal to 1.0 within each stratum

If you know of a possible confounder during the design phase of your study, you can match cases to controls based on the confounding variable

Internal validity

The extent to which a study is able to make causal conclusions based on the design and ability to reduce systematic error

Essentially how well you designed your study (confounding = red flag!)

External validity

Whether the findings of a study can be generalized to the rest of the population

Example: hospital cohorts

Alaa Elmaoued

Nancy Nguyen

Sensitivity and Specificity

Positive and Negative Predictive Values

Incidence and Prevalence

Odds Ratio

Relative Risk

Attributable Risk

Relative Risk Reduction

Absolute Risk Reduction

Number Needed to Treat

Number Needed to Harm

t-Test

ANOVA

Chi-square

Pearson Correlation Coefficient

Error types

Incidence RATE = Number of new cases / Population at risk

Incidence looks at new cases in a given time period

Prevalence = Number of total existing cases / Population at risk

Prevalence = incidence x duration of disease

Chronic disease with long duration has a high prevalence

A disease with short duration has a low prevalence that approximately equals its incidence

Smithville has a stable population of 100,000 and 2000 individuals in this community have been diagnosed with disease X. Although 300 individuals in Smithville die each year from all causes, 100 of those die from disease X. There are 50 new cases of the disease each year.

The annual incidence of this disease is represented by which of the following?

The incidence is represented by the number of new cases of the disease in a given period divided by the susceptible population. Because the 2000 people with the disease are no longer susceptible, they must be subtracted from the total population; thus the incidence is 50/98,000.
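The Smithville calculation above can be checked with a short sketch. The function name is hypothetical; the numbers are the ones in the question.

```python
def incidence_rate(new_cases, total_population, existing_cases):
    """New cases divided by the population still at risk.
    People who already have the disease are removed from the denominator."""
    population_at_risk = total_population - existing_cases
    return new_cases / population_at_risk


# Smithville: 100,000 residents, 2,000 existing cases, 50 new cases per year.
smithville = incidence_rate(new_cases=50, total_population=100_000, existing_cases=2_000)
print(f"Annual incidence = 50/98,000 = {smithville:.5f}")
```

The deaths in the stem (300 from all causes, 100 from disease X) are distractors and do not enter the incidence formula.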

A research group is studying sickle cell disease in a geographically isolated community of 6000 people. A genetic analysis is performed on every community member. At the beginning of the year, it is determined that 10% are homozygous for hemoglobin S and therefore have sickle cell disease, and 30% of the community is heterozygous for the mutant allele. Over the course of the year, 100 infants are born, six of whom are diagnosed with sickle cell disease. Of 80 people who die during the year, three had sickle cell disease.

Which of the following is the current prevalence of sickle cell disease in this population?

Prevalence is the total number of cases in a population divided by the total population at risk of the disease. Multiply the initial population (6000) by the initial prevalence (10%), yielding 600 cases. Over the course of the year, there was a net gain of 3 patients with sickle cell disease, bringing the new total to 603. Likewise, the new population at risk is 6020, a net gain of 20 people. Therefore, the current prevalence is 603/6020.
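The bookkeeping in this explanation can be sketched as follows. The variable names are hypothetical, the numbers are from the question, and `Fraction` is used only to keep the answer in the same form as the answer choice.

```python
from fractions import Fraction

initial_population = 6000
initial_cases = initial_population * 10 // 100   # 10% homozygous -> 600 cases

# Net changes over the year.
cases = initial_cases + 6 - 3                    # 6 affected births, 3 affected deaths
population = initial_population + 100 - 80       # 100 births, 80 deaths

prevalence = Fraction(cases, population)
print(f"Prevalence = {cases}/{population} = {float(prevalence):.3f}")
```

The 30% heterozygous carriers do not have sickle cell disease, so they do not count as cases.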

Be Sensitive to Positive people

Sensitivity is how well a test identifies those who have the disease

Sensitivity = True Positives/(True Positives + False Negatives) OR = 1 – false-negative rate

SN-N-OUT

A highly sensitive test Rules Out the disease if it is negative

β-Thalassemia major results from a homozygous genotype that leads to complete absence of both the β-globin chains. A study subjected 100,000 participants to an intrauterine screening test; 87 tested positive for β-thalassemia major, and the remaining 99,913 tested negative. In 7 of those 87 cases the results were shown to be false positive. Ultimately, 100 of those originally screened were found to actually have the disease.

Which of the following is the correct sensitivity of the intrauterine screening test?
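The slides do not show the worked answer, so the following is a sketch of one way to read the numbers in the stem: true positives are the positive results minus the false positives, and the false negatives are the diseased participants the test missed. Variable names are hypothetical.

```python
tested_positive = 87
false_positives = 7
total_with_disease = 100

true_positives = tested_positive - false_positives       # 80
false_negatives = total_with_disease - true_positives    # 20 missed cases

# Sensitivity = TP / (TP + FN)
sensitivity = true_positives / (true_positives + false_negatives)
print(f"Sensitivity = {true_positives}/{total_with_disease} = {sensitivity:.0%}")
```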

Proportion of positive test results that are true positives

If the test result is positive in this patient, what is the probability that this patient truly has the disease?

PPV = TP / (TP + FP)

PPV is directly related to prevalence

High prevalence means high PPV

Investigators studying cardiovascular disease discover a new serum protein marker that is correlated with the presence of ruptured atherosclerotic plaques. It is hoped that this serum marker could be used as a screening test to identify whether a person has had a recent MI. In a phase III clinical trial of 1400 subjects, the investigators find that of the 500 subjects who had an MI, 400 tested positive for the serum marker, whereas 850 subjects who did not have an MI tested negative for the marker.

If this marker were used to screen patients for recent MI, what is the probability that a person will have had an MI given a positive serum protein analysis?

The question is asking to calculate the positive predictive value of the test, i.e., the probability that a person with a positive serum marker on the screening test will indeed have had a recent MI.
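The slides stop at naming the quantity, so here is a sketch of the arithmetic using the counts in the stem. The false positives are not stated directly; they are derived as the subjects without an MI who did not test negative. Variable names are hypothetical.

```python
total_subjects = 1400
had_mi = 500
true_positives = 400
true_negatives = 850

no_mi = total_subjects - had_mi               # 900 subjects without an MI
false_positives = no_mi - true_negatives      # 50 tested positive without an MI

# PPV = TP / (TP + FP)
ppv = true_positives / (true_positives + false_positives)
print(f"PPV = {true_positives}/{true_positives + false_positives} = {ppv:.1%}")
```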

Specificity is the proportion of people without the disease who test negative

SP-P-IN

Highly specific test when positive rules in the disease.

Specificity = True Negatives / (True Negatives + False Positives) OR = 1 – false-positive rate

Proportion of negative test results that are true negatives

If the test is negative, what is the probability that this patient does not have the disease?

NPV = True Negatives / All people who tested negative (TN + FN)

NPV is inversely correlated with prevalence

High prevalence = Low NPV

How to determine whether a certain disease is associated with a certain exposure

To determine whether an association exists, we can use data from case-control and cohort studies

Odds Ratio

Used in case-control studies

Odds that group with disease (cases) was exposed to a risk factor (a/c) divided by the odds that group without the disease (controls) was exposed (b/d)

Researchers are investigating the relationship between cell phone use and brain cancer. Of 50 brain cancer patients, 30 admitted to using a cell phone for 10 years or more. Of 400 healthy participants in the study, 250 were found to have used a cell phone for 10 years or more.

Which of the following is an appropriate conclusion to draw from this study regarding cell phone use and brain cancer?

The clinical study described is a case-control study. Case-control studies look at those with the disease (the cases) compared to those without the disease (the controls). The odds ratio is then calculated as OR = (odds in disease group) / (odds in control group) = [30/(50-30)] / [250/(400-250)] = 9/10.
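The odds ratio above can be reproduced with a small sketch. The function name is hypothetical, the letters a, b, c, d follow the (a/c) / (b/d) layout from the odds ratio definition, and `Fraction` keeps the result exact.

```python
from fractions import Fraction


def odds_ratio(a, b, c, d):
    """a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    OR = (a/c) / (b/d)."""
    return Fraction(a, c) / Fraction(b, d)


# 50 cases (30 exposed), 400 controls (250 exposed).
cell_phone_or = odds_ratio(a=30, b=250, c=50 - 30, d=400 - 250)
print(f"OR = {cell_phone_or} = {float(cell_phone_or):.2f}")
```

An odds ratio close to 1 indicates that the odds of exposure are about the same in cases and controls.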

Relative Risk

Used in cohort studies

Risk of developing disease in the exposed group divided by risk in the unexposed group.

Attributable Risk

Defined as the difference in risk between exposed and unexposed groups, or the proportion of disease occurrences that are attributable to the exposure

Number needed to treat is defined as the number of patients who need to be treated for 1 patient to benefit

Number needed to harm is defined as the number of patients who need to be exposed to a risk factor for 1 patient to be harmed
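A minimal sketch tying these measures together, using the standard relationships NNT = 1 / absolute risk reduction and NNH = 1 / attributable risk. The function names and all counts and rates below are hypothetical.

```python
def cohort_measures(a, b, c, d):
    """2x2 cohort table: a = exposed with disease, b = exposed without,
    c = unexposed with disease, d = unexposed without."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    attributable_risk = risk_exposed - risk_unexposed
    return {
        "relative_risk": risk_exposed / risk_unexposed,
        "attributable_risk": attributable_risk,
        "number_needed_to_harm": 1 / attributable_risk,
    }


def treatment_measures(control_event_rate, treated_event_rate):
    """Risk reduction measures for a treatment that lowers the event rate."""
    arr = control_event_rate - treated_event_rate
    return {
        "absolute_risk_reduction": arr,
        "relative_risk_reduction": arr / control_event_rate,
        "number_needed_to_treat": 1 / arr,
    }


# Hypothetical cohort: risk 30/100 in exposed vs. 10/100 in unexposed.
harm = cohort_measures(a=30, b=70, c=10, d=90)
# Hypothetical trial: event rate 20% with placebo vs. 15% with treatment.
benefit = treatment_measures(control_event_rate=0.20, treated_event_rate=0.15)

for name, value in {**harm, **benefit}.items():
    print(f"{name}: {value:.2f}")
```

In practice NNT and NNH are rounded up to the next whole patient.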

t-Test checks the difference between the means of 2 groups.

ANOVA checks the difference between the means of 3 or more groups

Chi-square checks the difference between 2 or more percentages or proportions of categorical outcomes; used for frequency data rather than for comparison of means.

A physician is studying the effects of drug A and drug B on cognitive performance in Alzheimer patients. She administers a memory test to two groups of subjects (those taking drug A and those taking drug B) and compares their mean scores. Which of the following statistical tests would be most appropriate for this purpose?

A. ANOVA

B. Chi-square test

C. Linear regression analysis

D. t-Test

E. Multiple linear regression

The t-Test is used to compare two means derived from two samples.

Pearson Correlation Coefficient (r)

r is always between -1 and +1

The closer the absolute value of r is to 1, the stronger the linear correlation between the 2 variables.

Positive r value means a positive correlation

Negative r value means a negative correlation

The coefficient of determination (r²) is what is usually reported (e.g. on graphs)
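To make the properties of r concrete, here is a sketch that computes the Pearson correlation coefficient from its definition (covariance divided by the product of the standard deviations). The function name and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical data: perfectly linear in each direction.
r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])

print(f"r = {r_positive:.2f}, r^2 = {r_positive ** 2:.2f}")
print(f"r = {r_negative:.2f}, r^2 = {r_negative ** 2:.2f}")
```

Both examples give the same r², which is why r² alone does not show the direction of the correlation.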

Type I (α) errors and Type II (β) errors indicate that you accepted the wrong hypothesis.

Type I (α) error

• “False-positive” error:

• You accepted your hypothesis (the alternative hypothesis) and rejected the null hypothesis when the null hypothesis is actually true

• α is the probability of making a type I error; a result is declared significant when the p-value falls below α

Type II (β) error

• “False-negative” error:

• You fail to reject the null hypothesis when it is actually wrong

• β is the probability of making a type II error.

• Power = 1 - β

A study with greater power has less type II error

The power is the probability of rejecting the null hypothesis when it is in fact false (This is what we want to happen)

Conventionally, a study should have a power of 0.8 (or a β of 0.2) to be accepted.

Important: Increasing the sample size is the most practical and important way of increasing the power of a statistical test, i.e., there is power in numbers.

A medical resident decides to test the hypothesis that people with Alzheimer’s have elevated serum sodium levels. The Type I error of this study was 0.078. What does this analysis represent for the study?

A. Determines the power of a study to detect a significant change

B. Probability of Type I error is known as β

C. Represents the probability of incorrectly rejecting the null hypothesis

D. Most studies used a probability of error level of 0.10 to determine the significance

E. It is equal to 1- β

α should be less than 0.05 to be acceptable

USMLE Step 1 Qbook, Fifth edition

USMLERx Qbank 2015

First Aid 2015 edition

USMLE Step I Secrets

…You should probably start running…

In a city with a population of 1 million, 10,000 individuals have SLE. There are 1,000 new cases of SLE each year and 200 deaths caused by the disease. There are 2,500 deaths per year from all causes. Assuming no net emigration or immigration to the city, the incidence of SLE in this city is given by which of the following expressions?

A. 800/990,000

B. 800/1,000,000

C. 1,000/990,000

D. 1,000/1,000,000

E. 2,500/1,000,000

F. 10,000/1,000,000


Answer: C. 1,000/990,000

Don’t forget to subtract the prevalent cases of SLE! They are not part of the population at risk of becoming new cases.

Researchers are developing a screening test for awesomeness which has a sensitivity of 95% and a specificity of 90%. If the prevalence of awesomeness is 10%, which of the following is the best estimate for the probability that a person who tests negative for awesomeness is actually not awesome at all?

A. 45%

B. 50%

C. 85%

D. 90%

E. 95%

F. 99%


Answer: F. 99%

                 Awesome                  Not Awesome       Total
Test positive    95 (sn = 95%)            90                185
Test negative    5                        810 (sp = 90%)    815
Total            100 (prevalence = 10%)   900               1000 (start with a nice round number)

Negative Predictive Value = TN / (TN + FN) = 810/815 = 99%
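The 2x2 table can be rebuilt from sensitivity, specificity, and prevalence alone. The sketch below assumes a notional cohort of 1,000 people, the same "nice round number" used on the slide; the function name is hypothetical.

```python
def two_by_two(sensitivity, specificity, prevalence, cohort_size=1000):
    """Fill in a 2x2 table for a notional cohort of the given size."""
    diseased = cohort_size * prevalence
    healthy = cohort_size - diseased
    true_pos = diseased * sensitivity
    false_neg = diseased - true_pos
    true_neg = healthy * specificity
    false_pos = healthy - true_neg
    return true_pos, false_pos, false_neg, true_neg


tp, fp, fn, tn = two_by_two(sensitivity=0.95, specificity=0.90, prevalence=0.10)

npv = tn / (tn + fn)   # probability that a negative result is a true negative
ppv = tp / (tp + fp)   # probability that a positive result is a true positive

print(f"TP={tp:.0f} FP={fp:.0f} FN={fn:.0f} TN={tn:.0f}")
print(f"NPV = {npv:.1%}, PPV = {ppv:.1%}")
```

The cohort size cancels out of both ratios, so any starting number gives the same predictive values.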

A study is conducted to evaluate the average number of pizza slices consumed by medical students during their first year. Results of 100 students surveyed show an average number of pizza slices of 110 with a standard deviation of 20. Which of the following is the best estimate for the 95% confidence interval for the mean in this sample?

A. 70 to 130

B. 70 to 150

C. 85 to 115

D. 90 to 130

E. 105 to 115

F. 106 to 114


Answer: F. 106 to 114

CI = sample mean ± Z × (SD/√n), where the Z-score for a 95% CI = 1.96 ≈ 2

= 110 ± 2 × (20/√100)

= 110 ± 2 × (20/10)

= 110 ± 2 × 2

= 110 ± 4

= (106, 114)
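The same calculation as a short sketch, using Z = 1.96 rather than the rounded value of 2, so the bounds come out slightly inside 106 and 114. The function name is hypothetical.

```python
import math


def confidence_interval(mean, sd, n, z=1.96):
    """CI for a mean: mean +/- z * (SD / sqrt(n))."""
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin


low, high = confidence_interval(mean=110, sd=20, n=100)
print(f"95% CI = ({low:.2f}, {high:.2f})")
```

Note that the interval depends on the standard error (SD/√n), not on the SD alone, so a larger sample narrows the interval.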

A screening test used to detect cervical cancer has a sensitivity of 96%, a specificity of 90%, a positive predictive value of 92%, and a negative predictive value of 95%. A recent study on the impact of Gardasil suggests that the prevalence of cervical cancer has declined. Given this information, how will this impact the results of the screening test?

A. Decrease the sensitivity

B. Decrease the specificity

C. Increase the negative predictive value

D. Increase the positive predictive value

E. Increase the sensitivity

F. Increase the specificity


Answer: C. Increase the negative predictive value

A change in prevalence is a change in the population, not the screening exam; therefore you can eliminate answers A, B, E, and F because sensitivity and specificity pertain to qualities of the TEST and not the population.

If the prevalence of a disease goes down, the tested population contains more true negatives and fewer true positives, so the NPV increases and the PPV decreases.
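The effect of prevalence can be seen by holding the test fixed and varying only the population. The sketch below uses the sensitivity and specificity from the question; the three prevalence values and the function name are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same test (sensitivity 96%, specificity 90%), falling prevalence.
results = [predictive_values(0.96, 0.90, p) for p in (0.50, 0.20, 0.05)]

for prevalence, (ppv, npv) in zip((0.50, 0.20, 0.05), results):
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Sensitivity and specificity are inputs here and never change, which matches the reasoning used to eliminate answers A, B, E, and F.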
