Alaa Elmaoued Nancy Nguyen - Department of Pathology · Years of potential life lost (YPLL) Measure...
Alaa Elmaoued
Nancy Nguyen
Epidemiology/population health
Incidence vs. prevalence
Measures of health status
Survival analysis interpretation
Composite health status indicators
Population pyramids and impact of demographic changes
Disease surveillance and outbreak investigation
Communicable disease transmission
Points of intervention
Study design, types and selection of studies
Descriptive studies
Analytical studies: observational vs. interventional
Systematic reviews and meta-analysis
Obtaining and describing samples
Methods to handle noncompliance
Qualitative analysis
Study interpretation
Bias, confounding, and threats to validity
Internal vs. external validity
Statistical vs. clinical significance
www.usmle.org
Rates: crude and adjusted
Crude = overall (e.g. crude mortality rate)
Adjusted = stratified by different categories (e.g. Age-adjusted mortality rates)
Mortality
Standardized mortality ratio (SMR) = (observed # of deaths per yr / expected # of deaths per yr) x 100
If the SMR = 100, the # of observed deaths equals the # of expected deaths
Population attributable risk (PAR) = Incidence in the total population – incidence in the nonexposed group
Population attributable risk percent (PAR%) = [(incidence in the total population-incidence in the nonexposed group)/incidence in the total population] x 100
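A minimal Python sketch of the SMR and PAR formulas above. The function names and the sample numbers are illustrative assumptions, not values from the slides.

```python
def smr(observed_deaths, expected_deaths):
    """Standardized mortality ratio; 100 means observed deaths equal expected deaths."""
    return observed_deaths / expected_deaths * 100


def par(incidence_total, incidence_unexposed):
    """Population attributable risk: total-population incidence minus nonexposed incidence."""
    return incidence_total - incidence_unexposed


def par_percent(incidence_total, incidence_unexposed):
    """PAR expressed as a percentage of the incidence in the total population."""
    return par(incidence_total, incidence_unexposed) / incidence_total * 100


# Hypothetical inputs, chosen only for illustration.
example_smr = smr(observed_deaths=150, expected_deaths=100)
example_par = par(incidence_total=0.05, incidence_unexposed=0.02)
example_par_pct = par_percent(incidence_total=0.05, incidence_unexposed=0.02)

print(f"SMR  = {example_smr:.0f}")
print(f"PAR  = {example_par:.3f}")
print(f"PAR% = {example_par_pct:.0f}%")
```

An SMR above 100 in this sketch means more deaths were observed than expected.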
Reproductive rates
Maternal mortality
death of a woman while pregnant or within 42 days of termination of pregnancy, irrespective of the duration and site of the pregnancy, from any cause related to or aggravated by the pregnancy or its management but not from accidental or incidental causes
Denominator is usually reported per 100,000 registered live births
Neonatal mortality
Death of a live-born baby within the first 28 days of life (early neonatal mortality: within the first 7 days)
Per 1,000 live births
Infant mortality
Death of a child less than 1 year of age
Per 1,000 live births
Under-5 mortality
Y-axis represents the proportion of survivors and X-axis represents time moving forward
Generally used to assess survival with death as the defining “event” but can also be used for other health outcomes such as fertility
Data is used to define the intervals rather than having a predetermined interval
Makes full use of the data and is more accurate
Accounts for some loss to follow-up
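The points above (intervals defined by the data, proportion surviving on the Y-axis, allowance for loss to follow-up) match the Kaplan-Meier product-limit approach. The slides do not name the method, so treat that identification as an assumption. A minimal sketch with hypothetical follow-up data:

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each subject
    events: 1 if the subject died at that time, 0 if censored (lost to follow-up)
    Returns a list of (time, proportion surviving) at each time a death occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # The curve only steps down when a death actually occurs.
            survival *= (at_risk - deaths) / at_risk
            curve.append((t, survival))
        # Censored subjects leave the risk set without lowering the curve.
        at_risk -= leaving
    return curve


# Hypothetical data: five subjects, two of them censored.
curve = kaplan_meier(times=[1, 2, 2, 3, 4], events=[1, 0, 1, 1, 0])
for t, s in curve:
    print(f"t = {t}: proportion surviving = {s:.2f}")
```

Censored subjects still count in the denominator up to the time they drop out, which is how the method accounts for some loss to follow-up.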
Years of potential life lost (YPLL)
Measure of premature mortality or early death (i.e. people who die younger have a greater loss of future productive years than people who die at an older age)
Based on life expectancy of the population
Quality-adjusted life years (QALY)
Measure of the quality of remaining life years
Used to evaluate different healthcare interventions
Quality of life is based on a scale from 0 to 1 where 0 is death and 1 is the best possible health state
Disability-adjusted life years (DALY)
Years of life lost to premature death AND years lived with a disability of specified severity and duration
Measure of overall disease burden that combines mortality and morbidity
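A rough Python sketch of the three composite indicators above. The reference age of 75, the disability weight, and all sample values are assumptions for illustration; the slides only say that YPLL is based on the population's life expectancy.

```python
def ypll(ages_at_death, reference_age=75):
    """Years of potential life lost: years short of the reference age, summed over deaths."""
    return sum(max(reference_age - age, 0) for age in ages_at_death)


def qaly(years, quality_weight):
    """Quality-adjusted life years: years lived times quality of life (0 = death, 1 = best health)."""
    return years * quality_weight


def daly(years_of_life_lost, years_with_disability, disability_weight):
    """Disability-adjusted life years: years lost to early death plus weighted years with disability."""
    return years_of_life_lost + years_with_disability * disability_weight


# Hypothetical examples.
print(ypll([30, 60, 80]))   # deaths at 30 and 60 contribute; the death at 80 adds nothing
print(qaly(years=10, quality_weight=0.5))
print(daly(years_of_life_lost=10, years_with_disability=20, disability_weight=0.25))
```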
1. Define the outbreak and validate the existence of an outbreak
2. Examine the distribution of cases by time and place
3. Look for combinations (interactions) of relevant variables
4. Develop hypotheses based on: existing knowledge (if any), analogy to diseases of known etiology, findings from investigation of the outbreak
5. Test hypotheses
6. Recommend control measures
7. Prepare a written report of the investigation and the findings
8. Communicate findings to those involved in policy development and implementation and to the public
Attack rate = # of people at risk in whom a certain illness develops/ total # of people at risk
Herd immunity
Reportable diseases: What types of diseases are reportable/notifiable?
Level | Definition | Example
Primary | Preventing the initial development of a disease | Immunization
Secondary | Early detection of existing disease to reduce severity and complications | Screening for cancer
Tertiary | Reducing the impact of the disease | Rehabilitation for stroke
The physical examination records of the entire incoming freshman class of 1935 at the University of Minnesota were examined in 1977 to see if their recorded height and weight at the time of admission to the university was related to the development of coronary heart disease by 1986. This is an example of:
A. A cross-sectional study
B. A case-control study
C. A concurrent cohort study
D. A retrospective cohort study
E. An experimental study
Residents of three villages with three different types of water supply were asked to participate in a survey to identify cholera carriers. Because several cholera deaths had occurred recently, virtually everyone present at the time underwent examination. The proportion of residents in each village who were carriers was computed and compared. What is the proper classification for this study?
A. Cross-sectional study
B. Case-control study
C. Concurrent cohort study
D. Nonconcurrent cohort study
E. Experimental study
A case control study is characterized by all of the following except:
A. It is relatively inexpensive compared with most other epidemiologic study designs
B. Patients with the disease (cases) are compared with persons without the disease (controls)
C. Incidence rates may be computed directly
D. Assessment of past exposure may be biased
E. Definition of cases may be difficult
Cross-sectional study
AKA prevalence study
Both exposure and disease outcome are determined simultaneously
Cannot establish temporal relationship between the exposure and onset of disease
Case series/Case report
Case report = one person
Case series = more than one
Evaluates subjects with known exposure with similar treatment OR for exposure and outcome simultaneously
No hypothesis testing
Vulnerable to selection bias (select certain patients)
No control/comparison group = low internal validity
Vulnerable to Hawthorne effect
Ecological study
Based on aggregate or group data, not on individuals (e.g. cause of death in different countries)
Selection of subjects is based on exposure
Groups are followed to compare incidence of disease or other health outcomes
Prospective aka concurrent aka longitudinal cohort study
Retrospective aka nonconcurrent aka historical cohort study
Good for evaluating temporal/causal association
Bad for rare diseases
Expensive and time-consuming
Problems with loss-to-follow-up
Selection of subjects is based on disease or other health outcome
Groups are evaluated to compare past exposure
Incident cases are preferred over prevalent cases (prevalent cases reflect survival with the disease, not just its development)
Matching
Group = frequency match
Individual = each case matched to a control
Relatively inexpensive and does not require as much time
Susceptible to recall bias
Good for rare diseases
Bad for rare exposures
Randomized Controlled Trial
Essentially the Gold Standard
Unethical in a lot of cases!
Double-blind
Placebo-controlled
Community intervention
Systematic Review
A research study that aims to provide an exhaustive summary of current literature relevant to a research question.
Crucial to EBM
Meta-analysis
A statistical technique used to combine the results of all eligible studies in a systematic review into a single quantitative estimate or summary effect size
Effect sizes measure the strength of the relationship between two variables, thereby providing information about the magnitude of the intervention effect
Heterogeneity is a value calculated to determine if individual studies are similar enough to compare (prefer non-significant findings for heterogeneity)
Publication bias is particularly problematic for systematic reviews because not all studies are published, depending on the significance and direction of effects detected.
[Figure: forest plot. Each square represents the result from an individual study; the horizontal line through it is its confidence interval; the center line marks 1.0 (no association); the overall result from the meta-analysis is also shown.]
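One common way to combine study results into a single summary estimate is fixed-effect inverse-variance pooling of log odds ratios. The slides do not say which pooling method they have in mind, so the sketch below is an assumption, and the study values are hypothetical.

```python
import math


def pooled_odds_ratio(studies, z=1.96):
    """Fixed-effect inverse-variance pooling.

    studies: list of (odds_ratio, ci_lower, ci_upper), each with a 95% CI.
    Returns (pooled_or, pooled_ci_lower, pooled_ci_upper).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for odds_ratio, ci_lower, ci_upper in studies:
        log_or = math.log(odds_ratio)
        # Recover the standard error from the width of the CI on the log scale.
        se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
        weight = 1 / se ** 2   # more precise studies get more weight
        weighted_sum += weight * log_or
        total_weight += weight
    pooled_log_or = weighted_sum / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return (
        math.exp(pooled_log_or),
        math.exp(pooled_log_or - z * pooled_se),
        math.exp(pooled_log_or + z * pooled_se),
    )


# Hypothetical studies: (odds ratio, lower 95% limit, upper 95% limit).
studies = [(2.0, 1.0, 4.0), (2.0, 1.0, 4.0)]
pooled, lower, upper = pooled_odds_ratio(studies)
print(f"Pooled OR = {pooled:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Each of these two hypothetical studies has a confidence interval that touches 1.0, but the pooled interval is narrower and excludes it.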
Any systematic error in the design, conduct, or analysis of a study that results in a mistaken estimate of an exposure’s effect on the risk of disease
Selection bias
Error introduced when the study population does not represent the target population
Can be introduced at any stage of a research study
Information bias
Occurs during data collection and can lead to misclassification
Sampling bias or non-random sampling bias: a selection procedure that yields a non-representative sample in which a parameter estimate differs from the value existing in the target population
Example: random telephone sampling, which systematically excludes households without telephones
Ascertainment bias
Healthcare access bias
Survivor treatment selection bias
Recall bias
If the presence of disease influences the perception of its causes or the search for exposure to the putative cause
Common in case-control studies where participants are aware of their disease status, but can also occur in cohort studies
Ecologic fallacy
When results from a group-level (ecological) analysis are used to make inferences at the individual level
Hawthorne effect
When individuals modify how they react or behave in response to their awareness of being observed
An extraneous variable that correlates (directly or inversely) with both the dependent variable and the independent variable
Example: Drinking coffee and pancreatic cancer
Confounding is not an error in the study but can be considered a true phenomenon that is identified in a study and must be understood
One approach is to stratify…
If the association is explained entirely by the confounder, then stratifying the data by the confounding variable will yield a measure of association of 1.0 within each stratum
If you know of a possible confounder during the design phase of your study, you can match cases to controls based on the confounding variable
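A small sketch of stratification for the coffee and pancreatic cancer example. Smoking as the confounder and every count below are hypothetical, made up so that the crude association disappears within each stratum.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (exposed_cases / unexposed_cases) / (exposed_controls / unexposed_controls)


# Hypothetical counts of coffee drinkers vs. non-drinkers, split by smoking status.
# Keys: a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls.
smokers = dict(a=80, b=40, c=20, d=10)
nonsmokers = dict(a=10, b=20, c=40, d=80)

# Crude table: ignore smoking and add the two strata together.
crude = {k: smokers[k] + nonsmokers[k] for k in "abcd"}

or_crude = odds_ratio(crude["a"], crude["b"], crude["c"], crude["d"])
or_smokers = odds_ratio(smokers["a"], smokers["b"], smokers["c"], smokers["d"])
or_nonsmokers = odds_ratio(nonsmokers["a"], nonsmokers["b"], nonsmokers["c"], nonsmokers["d"])

print(f"Crude OR            = {or_crude:.2f}")
print(f"OR among smokers    = {or_smokers:.2f}")
print(f"OR among nonsmokers = {or_nonsmokers:.2f}")
```

In this made-up data the crude odds ratio is well above 1.0, yet it is 1.0 in both strata, which is the pattern expected when a confounder fully explains the association.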
Internal validity
The extent to which a study is able to make causal conclusions based on the design and ability to reduce systematic error
Essentially how well you designed your study (confounding = red flag!)
External validity
Whether the findings of a study can be generalized to the rest of the population
Example: hospital cohorts
Sensitivity and Specificity
Positive and Negative Predictive Values
Incidence and Prevalence
Odds Ratio
Relative Risk
Attributable Risk
Relative Risk Reduction
Absolute Risk Reduction
Number Needed to Treat
Number Needed to Harm
t-Test
ANOVA
Chi-square
Pearson Correlation Coefficient
Error types
Incidence RATE = Number of new cases / Population at risk
Incidence looks at new cases in a time period
Prevalence = Number of total existing cases / Population at risk
Prevalence = incidence x duration of disease
Chronic disease with long duration has a high prevalence
A disease with short duration has low prevalence, which approximately equals the incidence of disease
Smithville has a stable population of 100,000 and 2000 individuals in this community have been diagnosed with disease X. Although 300 individuals in Smithville die each year from all causes, 100 of those die from disease X. There are 50 new cases of the disease each year.
The annual incidence of this disease is represented by which of the following?
The incidence is represented by the number of new cases of the disease in a given period divided by the susceptible population. Because the 2000 people with the disease are no longer susceptible, they must be subtracted from the total population; thus the incidence is 50/98,000.
A research group is studying sickle cell disease in a geographically isolated community of 6000 people. A genetic analysis is performed on every community member. At the beginning of the year, it is determined that 10% are homozygous for hemoglobin S and therefore have sickle cell disease, and 30% of the community is heterozygous for the mutant allele. Over the course of the year, 100 infants are born, six of whom are diagnosed with sickle cell disease. Of 80 people who die during the year, three had sickle cell disease.
Which of the following is the current prevalence of sickle cell disease in this population?
Prevalence is the total number of cases in a population divided by the total population at risk of the disease. Multiply the initial population (6000) by the initial prevalence (10%), yielding 600 cases. Over the course of the year, there was a net gain of 3 patients with sickle cell disease (6 born with it, 3 died), bringing the new total to 603. Likewise, the new population at risk is 6020, a net gain of 20 people (100 births, 80 deaths). Therefore, the current prevalence is 603/6020.
Be Sensitive to Positive people
Sensitivity is how well a test identifies those who have the disease
Sensitivity = True Positives / (True Positives + False Negatives) OR = 1 – false-negative rate
SN-N-OUT
A highly sensitive test Rules Out the disease if it is negative
β-Thalassemia major results from a homozygous genotype that leads to complete absence of both the β-globin chains. A study subjected 100,000 participants to an intrauterine screening test; 87 tested positive for β-thalassemia major, and the remaining 99,913 tested negative. In 7 of those 87 cases the results were shown to be false positive. Ultimately, 100 of those originally screened were found to actually have the disease.
Which of the following is the correct sensitivity of the intrauterine screening test?
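The arithmetic for this question can be sketched in Python using only the counts given in the stem; the variable names are illustrative.

```python
tested_positive = 87
false_positives = 7
total_with_disease = 100

true_positives = tested_positive - false_positives       # 80
false_negatives = total_with_disease - true_positives    # 20

sensitivity = true_positives / (true_positives + false_negatives)
print(f"Sensitivity = {sensitivity:.0%}")
```

The denominator is everyone who truly has the disease (100), not everyone who tested positive (87).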
Proportion of positive test results that are truly positive
If the test result is positive in this patient, what is the probability that this patient truly has the disease?
PPV = TP / (TP + FP)
PPV is directly related to prevalence
High prevalence means high PPV
Investigators studying cardiovascular disease discover a new serum protein marker that is correlated with the presence of ruptured atherosclerotic plaques. It is hoped that this serum marker could be used as a screening test to identify whether a person has had a recent MI. In a phase III clinical trial of 1400 subjects, the investigators find that of the 500 subjects who had an MI, 400 tested positive for the serum marker, whereas 850 subjects who did not have an MI tested negative for the marker.
If this marker were used to screen patients for recent MI, what is the probability that a person will have had an MI given a positive serum protein analysis?
The question is asking to calculate the positive predictive value of the test, i.e., the probability that a person with a positive serum marker on the screening test will indeed have had a recent MI.
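A short Python sketch of the PPV calculation, filling in the 2x2 table from the counts in the question stem; the variable names are illustrative.

```python
total_subjects = 1400
had_mi = 500
true_positives = 400     # had an MI and tested positive
true_negatives = 850     # no MI and tested negative

no_mi = total_subjects - had_mi              # 900
false_positives = no_mi - true_negatives     # 50

ppv = true_positives / (true_positives + false_positives)
print(f"PPV = {true_positives}/{true_positives + false_positives} = {ppv:.1%}")
```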
Specificity is the proportion of people without the disease who test negative
SP-P-IN
A highly specific test rules in the disease when it is positive.
Specificity = True Negatives / (True Negatives + False Positives) OR = 1 – false-positive rate
Proportion of negative test results that are true negative
If the test is negative, what is the probability that this patient does not have the disease?
NPV = True Negatives / All people who tested negative (TN + FN)
NPV is inversely correlated with prevalence
High prevalence = Low NPV
How to determine whether a certain disease is associated with a certain exposure
To determine whether an association exists, we can use data from case-control and cohort studies
Used in Case-control studies
Odds that group with disease (cases) was exposed to a risk factor (a/c) divided by the odds that group without the disease (controls) was exposed (b/d)
Researchers are investigating the relationship between cell phone use and brain cancer. Of 50 brain cancer patients, 30 admitted to using a cell phone for 10 years or more. Of 400 healthy participants in the study, 250 were found to have used a cell phone for 10 years or more.
Which of the following is an appropriate conclusion to draw from this study regarding cell phone use and brain cancer?
The clinical study described is a case-control study. Case-control studies compare those with the disease (the cases) to those without the disease (the controls). The odds ratio is then calculated as OR = (odds of exposure in disease group) / (odds of exposure in control group) = [30/(50-30)] / [250/(400-250)] = 9/10.
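The same odds ratio can be checked with exact fractions in Python. The variable names are illustrative; the counts come from the question stem.

```python
from fractions import Fraction

cases_exposed = 30
cases_total = 50
controls_exposed = 250
controls_total = 400

odds_in_cases = Fraction(cases_exposed, cases_total - cases_exposed)                # 30/20
odds_in_controls = Fraction(controls_exposed, controls_total - controls_exposed)    # 250/150

odds_ratio = odds_in_cases / odds_in_controls
print(f"OR = {odds_ratio} = {float(odds_ratio):.2f}")
```

An odds ratio just below 1.0 gives no evidence that long-term cell phone use is associated with increased odds of brain cancer in this sample.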
Used in cohort studies
Risk of developing disease in the exposed group divided by risk in the unexposed group.
Defined as the difference in risk between exposed and unexposed groups, or the proportion of disease occurrences that are attributable to the exposure
Number needed to treat is defined as the number of patients who need to be treated for 1 patient to benefit
Number needed to harm is defined as the number of patients who need to be exposed to a risk factor for 1 patient to be harmed
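A minimal sketch of these risk measures, using the standard relationships NNT = 1/absolute risk reduction and NNH = 1/attributable risk. All counts are hypothetical and the function names are illustrative.

```python
def risk(events, total):
    """Proportion of a group that develops the outcome."""
    return events / total


def relative_risk(risk_exposed, risk_unexposed):
    return risk_exposed / risk_unexposed


def risk_difference(risk_a, risk_b):
    return risk_a - risk_b


def number_needed(absolute_difference):
    """NNT when given an absolute risk reduction, NNH when given an attributable risk."""
    return 1 / absolute_difference


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop disease.
rr = relative_risk(risk(30, 100), risk(10, 100))
ar = risk_difference(risk(30, 100), risk(10, 100))   # attributable risk
nnh = number_needed(ar)

# Hypothetical trial: 20 of 100 controls and 10 of 100 treated patients have the event.
arr = risk_difference(risk(20, 100), risk(10, 100))  # absolute risk reduction
rrr = arr / risk(20, 100)                            # relative risk reduction
nnt = number_needed(arr)

print(f"RR = {rr:.1f}, AR = {ar:.2f}, NNH = {nnh:.0f}")
print(f"ARR = {arr:.2f}, RRR = {rrr:.0%}, NNT = {nnt:.0f}")
```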
t-Test checks the difference between the means of 2 groups.
ANOVA checks the difference between the means of 3 or more groups
Chi-square checks the difference between 2 or more percentages or proportions of categorical outcomes; used for frequency data rather than for comparison of means.
A physician is studying the effects of drug A and drug B on cognitive performance in Alzheimer patients. She administers a memory test to two groups of subjects (those taking drug A and those taking drug B) and compares their mean scores. Which of the following statistical tests would be most appropriate for this purpose?
A. ANOVA
B. Chi-square test
C. Linear regression analysis
D. t-Test
E. Multiple linear regression
The t-Test is used to compare two means derived from two samples.
r is always between -1 and +1
The closer the absolute value of r is to 1, the stronger the linear correlation between the 2 variables.
Positive r value means a positive correlation
Negative r value means a negative correlation
The coefficient of determination (r²) is what is usually reported (e.g. on graphs)
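A small sketch of how r and r² can be computed by hand in Python. The paired data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical paired measurements.
xs = [1, 2, 3, 4, 5]
r_perfect = pearson_r(xs, [2, 4, 6, 8, 10])    # y rises exactly with x
r_inverse = pearson_r(xs, [10, 8, 6, 4, 2])    # y falls exactly as x rises
r_noisy = pearson_r(xs, [1, 3, 2, 5, 4])       # positive but imperfect

print(f"r = {r_perfect:.2f}, r squared = {r_perfect ** 2:.2f}")
print(f"r = {r_inverse:.2f}, r squared = {r_inverse ** 2:.2f}")
print(f"r = {r_noisy:.2f}, r squared = {r_noisy ** 2:.2f}")
```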
Type I (α) errors and Type II (β) errors indicate that you accepted the wrong hypothesis.
Type I (α) error
• “False-positive” error:
• You accepted your hypothesis (alternative hypothesis) rather than the null hypothesis, when the null hypothesis is actually true
• α is the probability of making a type I error; the p-value is judged against this preset α level
Type II (β) error
• “False-negative” error:
• You fail to reject the null hypothesis when it is actually wrong
• β is the probability of making a type II error.
• Power = 1 – β
A study with greater power has less type II error
The power is the probability of rejecting the null hypothesis when it is in fact false (This is what we want to happen)
Conventionally, a study should have a power of 0.8 (or a β of 0.2) to be accepted.
Important: Increasing the sample size is the most practical and important way of increasing the power of a statistical test, i.e., there is power in numbers.
A medical resident decides to test the hypothesis that people with Alzheimer’s have elevated serum sodium levels. The Type I error of this study was 0.078. What does this analysis represent for the study?
A. Determines the power of a study to detect a significant change
B. Probability of Type I error is known as β
C. Represents the probability of incorrectly rejecting the null hypothesis
D. Most studies used a probability of error level of 0.10 to determine the significance
E. It is equal to 1- β
Conventionally, α is set at 0.05 (results with p < 0.05 are considered statistically significant)
In a city with a population of 1 million, 10,000 individuals have SLE. There are 1,000 new cases of SLE each year and 200 deaths caused by the disease. There are 2,500 deaths per year from all causes. Assuming no net emigration or immigration to the city, the incidence of SLE in this city is given by which of the following expressions?
A. 800/990,000
B. 800/1,000,000
C. 1,000/990,000
D. 1,000/1,000,000
E. 2,500/1,000,000
F. 10,000/1,000,000
Don’t forget to subtract the prevalent cases of SLE! They are not part of the population at risk of becoming new cases
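The subtraction can be sketched in Python with the numbers from the question; the variable names are illustrative. The result corresponds to choice C.

```python
from fractions import Fraction

population = 1_000_000
prevalent_cases = 10_000
new_cases_per_year = 1_000

# People who already have SLE cannot become new cases.
population_at_risk = population - prevalent_cases    # 990,000

incidence = Fraction(new_cases_per_year, population_at_risk)
print(f"Incidence = {new_cases_per_year:,}/{population_at_risk:,} per year")
```

The deaths (200 from SLE, 2,500 from all causes) are distractors and do not enter the incidence calculation.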
Researchers are developing a screening test for awesomeness which has a sensitivity of 95% and a specificity of 90%. If the prevalence of awesomeness is 10%, which of the following is the best estimate for the probability that a person who tests negative for awesomeness is actually not awesome at all?
A. 45%
B. 50%
C. 85%
D. 90%
E. 95%
F. 99%
Test result | Awesome | Not Awesome | Total
Positive | 95 (sn = 95%) | 90 | 185
Negative | 5 | 810 (sp = 90%) | 815
Total | 100 (prevalence = 10%) | 900 | 1000 (start with a nice round number)
Negative Predictive Value = TN / (TN + FN) = 810/815 ≈ 99%
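The same 2x2 table can be built in Python from sensitivity, specificity, and prevalence. This is a sketch; the function name and the default population of 1000 follow the "nice round number" approach and are otherwise arbitrary.

```python
def two_by_two(sensitivity, specificity, prevalence, n=1000):
    """Return (TP, FP, FN, TN) counts for a population of n people."""
    with_condition = n * prevalence
    without_condition = n - with_condition
    tp = with_condition * sensitivity
    fn = with_condition - tp
    tn = without_condition * specificity
    fp = without_condition - tn
    return tp, fp, fn, tn


tp, fp, fn, tn = two_by_two(sensitivity=0.95, specificity=0.90, prevalence=0.10)
npv = tn / (tn + fn)

print(f"TP = {tp:.0f}, FP = {fp:.0f}, FN = {fn:.0f}, TN = {tn:.0f}")
print(f"NPV = {npv:.1%}")
```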
A study is conducted to evaluate the average number of pizza slices consumed by medical students during their first year. Results of 100 students surveyed show an average number of pizza slices of 110 with a standard deviation of 20. Which of the following is the best estimate for the 95% confidence interval for the mean in this sample?
A. 70 to 130
B. 70 to 150
C. 85 to 115
D. 90 to 130
E. 105 to 115
F. 106 to 114
CI = sample mean ± Z x (SD/√n)
Z-score for 95% CI = 1.96 ≈ 2
CI = 110 ± 2 (20/√100)
= 110 ± 2 (20/10)
= 110 ± 2 (2)
= 110 ± 4
= (106, 114)
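A quick Python check of the confidence interval, using Z = 1.96 rather than the rounded value of 2; the function name is illustrative. The result is slightly narrower than (106, 114) but points to the same answer choice.

```python
import math


def confidence_interval(mean, sd, n, z=1.96):
    """Confidence interval for a sample mean: mean ± z * (sd / sqrt(n))."""
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin


low, high = confidence_interval(mean=110, sd=20, n=100)
print(f"95% CI = ({low:.2f}, {high:.2f})")
```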
A screening test used to detect cervical cancer has a sensitivity of 96%, a specificity of 90%, a positive predictive value of 92%, and a negative predictive value of 95%. A recent study on the impact of Gardasil suggests that the prevalence of cervical cancer has declined. Given this information, how will this impact the results of the screening test?
A. Decrease the sensitivity
B. Decrease the specificity
C. Increase the negative predictive value
D. Increase the positive predictive value
E. Increase the sensitivity
F. Increase the specificity
A change in prevalence is a change in the population, not the screening exam; therefore you can eliminate answers A, B, E, and F because sensitivity and specificity pertain to qualities of the TEST and not the population
If the prevalence of a disease goes down, then you can expect more true negatives and fewer true positives… thus the NPV increases and the PPV decreases
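The effect of falling prevalence can be sketched in Python by holding the test's sensitivity (96%) and specificity (90%) fixed. The three prevalence values are hypothetical and chosen only to show the trend.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


results = {}
for prevalence in (0.50, 0.10, 0.01):
    ppv, npv = predictive_values(sensitivity=0.96, specificity=0.90, prevalence=prevalence)
    results[prevalence] = (ppv, npv)
    print(f"prevalence = {prevalence:.0%}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Sensitivity and specificity are inputs here and never change; only the predictive values move as prevalence falls.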