Post on 02-Apr-2018
7/27/2019 1_2 biostatistics
1.2: Biostatistics 1.2.1
history prospectively over a period of time. The purpose is to
determine which characteristics, exposures, or risk factors are
associated with a given outcome. Unlike cross-sectional or
case-control studies, however, the outcome of interest in a cohort
study occurs in the future, after the subject is enrolled.
In the cardiovascular literature, one of the most prominent
cohort studies is the Framingham study of cardiovascular risk
factors, which started in 1948, when more than 5,000 individuals
from the same Massachusetts town were enrolled. The cohort
was then followed with various examinations every two years to
determine the association of various risk factors with
cardiovascular diseases.
Case Series
A case series is a descriptive account of a collection of patients,
in which each case shares some characteristic of interest. A case
series can be the first step in identifying a new disease process,
describing a novel physical or imaging finding, or reporting on
a novel treatment method. Case series reports can serve as a
catalyst to other studies.
Case-Control Studies
Case-control studies are retrospective studies that start with
individuals who already have a disease or trait of interest (i.e.,
the cases), then match them with control subjects who lack that
disease or trait. The studies then attempt to look back at events,
exposures, and characteristics to see whether any difference
exists between the two groups. The idea is to find a risk factor that
is present in the history of the cases, but not the controls.
Cross-Sectional Studies
Cross-sectional studies are descriptive studies about the
characteristics of a group of individuals at a single point in time. These
studies describe what is happening right now in a group of
people. Cross-sectional studies can be used to establish norms
(e.g., for a new biomarker), evaluate the usefulness of a new
diagnostic procedure, or poll individuals about their attitudes
(e.g., towards health care).
Introduction
One of the strengths of the field of cardiology is its strong
evidence base. Cardiology is known for its large clinical trials,
which provide a large amount of new information about
treatments and practices. A well-qualified cardiologist must
understand biostatistics to help decide whether results presented in
the literature can be believed and should be applied to their
treatment of patients.
The purpose of this module is to provide a basic foundation
in biostatistics so that the reader can better evaluate clinical
literature. The focus is on the interpretation of research
methods, rather than on calculations and computational details. This
module emphasizes the biostatistics methods that the
cardiovascular specialist is most likely to encounter in modern medical
literature.
An additional resource is the American Heart Association
Scientific Statement that reviews the appropriate statistical evaluation
of novel markers of cardiovascular risk. It provides an excellent
summary and explanation of some of the most frequently used
biostatistics within the field of cardiovascular medicine.1
Study Designs
Medical research study designs fall into two major categories:
1) observational and 2) interventional. In observational studies,
subjects are observed but no medical intervention is performed.
The observations may be performed prospectively (i.e.,
forward-looking cohort studies), retrospectively (i.e., backward-looking
case-control studies), or simultaneously (i.e., cross-sectional
studies). Interventional studies, or clinical trials, evaluate the
effects of an intervention on outcomes and are considered to
provide a stronger level of evidence than observational studies.
Understanding how a study is designed is essential to
understanding the conclusions that can be drawn from it.
Cohort Study
A cohort study is an observational study that enrolls a group of
subjects with something in common and follows their natural
Chapter 1: General Principles
1.2: Biostatistics
Lori B. Daniels, MD, MAS, FACC
Consulting Fees/Honoraria: Roche Diagnostics, Alere, Inc.; Research Grants: Roche Diagnostics.
Learner Objectives
Upon completion of this module, the reader will be able to:
1. Correctly identify the study design used in a given medical study, and list its uses.
2. Describe the p value and interpret its meaning and relationship to hypothesis testing.
3. Calculate sensitivity, specificity, and positive and negative predictive values for a diagnostic test.
4. Compare various methods to account for confounding variables in clinical studies, including multivariable regression and propensity analysis.
5. Recognize how survival analysis differs from other regression analyses and identify when survival analysis should be used.
mean, and 99.7% lie within 3 SDs of the mean. Even if the
distribution is not bell-shaped, at least 75% of the values will
always fall within 2 SDs of the mean.
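The percentages quoted for the bell-shaped case can be checked numerically. The following is a minimal sketch, assuming an exactly normal distribution and using Python's standard library; the function names are illustrative only. The distribution-free 75% figure is the bound usually attributed to Chebyshev's inequality, which this module does not name explicitly.

```python
from statistics import NormalDist


def normal_fraction_within(k: float) -> float:
    """Fraction of a normal distribution lying within k SDs of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)


def chebyshev_lower_bound(k: float) -> float:
    """Minimum fraction within k SDs of the mean for any distribution (k > 1)."""
    return 1 - 1 / k**2


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"normal, within {k} SD: {normal_fraction_within(k):.4f}")
    print(f"any distribution, within 2 SD: at least {chebyshev_lower_bound(2):.2f}")
```

The distribution-free bound is deliberately conservative: it guarantees 75% within 2 SDs, whereas a truly bell-shaped distribution places about 95% of values in that interval.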
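The mean and SD are also useful for determining whether a set
of observations is skewed, when only summary statistics are provided.
If the mean is smaller than 2 SDs, the data are probably skewed
(for measurements that cannot be negative, a bell-shaped
distribution would otherwise have to extend below zero).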
Hypothesis Testing
The purpose of a hypothesis test is to permit generalizations
about a population based upon observations made in a sample
from that population. When making comparisons between
two groups (e.g., a group that received some therapy vs. a
group that received a placebo), the hypothesis being tested is
that some difference exists between the two groups. The null
hypothesis, which must be rejected in order to claim a
difference, is that the two groups are equal.
Errors in Hypothesis Testing
Erroneous conclusions can arise from hypothesis tests in two
ways. A type I error is analogous to a false-positive diagnostic
test. A type I error incorrectly concludes significance (and rejects
the null hypothesis) when the result is not really significant. A
type II error is analogous to a false-negative diagnostic test.
A type II error incorrectly concludes no significance when the
result is, in fact, significant. The probability of making a type II
error is known as beta, or β.
The significance level of a test is also known as alpha, or α. This is
the probability of making a type I error (i.e., incorrectly concluding
significance). For many statistical tests, the p value can be
compared to the significance level to either detect a statistically
significant difference (i.e., reject the null hypothesis), or to conclude
that the null hypothesis cannot be rejected at that significance
level. For most studies, a significance level of 0.05 is chosen.
Power
The power of a statistical test is its ability to detect significance
when a result is indeed significant. By analogy with diagnostic
testing, the power of a statistical test corresponds to the
sensitivity of a diagnostic test, or the ability to detect a disease that is
present. Investigators want the statistical test to be sensitive to
detecting significance when it should be detected, and to minimize
the risk of a type II error. Power can be calculated as 1 − β,
or 1 minus the probability of making a type II error.
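To make the relationship between alpha, beta, and power concrete, here is a minimal sketch. It assumes a two-sided z-test comparing two group means with a known, equal SD, which is a simplification chosen for illustration rather than a method prescribed by this module. The function name and the numbers (a true difference of 5, an SD of 10, 64 subjects per group) are hypothetical.

```python
from statistics import NormalDist


def z_test_power(true_diff: float, sd: float, n_per_group: int,
                 alpha: float = 0.05) -> float:
    """Power of a two-sided z-test for a difference in two means (known, equal SD)."""
    z = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5   # standard error of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)    # critical value for the chosen alpha
    shift = abs(true_diff) / se          # true difference in standard-error units
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    power = z_test_power(true_diff=5.0, sd=10.0, n_per_group=64)  # hypothetical inputs
    beta = 1 - power                                              # type II error probability
    print(f"power = {power:.3f}, beta = {beta:.3f}")              # power is roughly 0.81
```

When the true difference is set to zero, the function returns alpha itself: with no real effect, the only way to "detect" one is a type I error.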
P Values
The p value is the probability of obtaining a result at least as
extreme as the one observed, if the null hypothesis is true (i.e.,
the groups being compared are equal). It is often loosely
described as the probability that the observed result is due to
chance alone, but strictly it is calculated under the assumption
that the null hypothesis holds. After a statistical test has been
performed, if its p value is less than α (often set at 0.05), the
null hypothesis is rejected.
Importantly, a significant p value does not provide absolute
proof that a difference between groups exists; rather, a p value
of 0.05 or less means that if the groups do not differ, results as
extreme as those observed would happen only 1 in 20 times or
Clinical Trial
A clinical trial is a study undertaken to determine whether a
particular procedure or treatment can improve an outcome for
a selected group of individuals. In controlled clinical trials, the
intervention being tested is compared with another procedure
or drug, generally a placebo or the current standard of care.
Randomization assigns subjects to either the active treatment or
the placebo group by chance, thereby eliminating bias in patient
assignment and allowing patient characteristics to be evenly
distributed between groups.
In double-blind studies, neither the study investigator nor the
subject knows whether the subject is in the treatment group or the
control group, thus reducing potential bias. The most robust
clinical trial design is considered to be the randomized,
double-blind, placebo-controlled trial, because it can provide evidence
of causation (i.e., the best indication that any effects seen are
due to the intervention).
Descriptive Statistics
Measures of Central Tendency
The correct measures to use for describing a population depend
on the type of data being analyzed. The mean measures the
middle of a distribution of a numerical variable, if that variable
has a normal (i.e., bell-shaped) distribution in the population
being studied. The mean, also called the arithmetic mean, is
the average of the observations. The mean value is sensitive to
extreme values, especially in small sample sizes, so it is not used
for skewed data.
Instead, the median is used to measure the middle of a
distribution of numerical variables that are skewed. Medians are also
used for ordinal data, which are data that have an inherent
order among categories (e.g., New York Heart Association
classification for heart failure severity). The median is the point
at which half the observations are larger and half are smaller.
Unlike the mean, it is unaffected by extreme values.
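The different behavior of the mean and median in the presence of an extreme value can be seen in a small example. This is only an illustrative sketch: the lengths of stay are invented data, not taken from any study.

```python
from statistics import mean, median

# Hypothetical hospital lengths of stay in days; the final value is an outlier.
stays_with_outlier = [2, 3, 3, 4, 5, 6, 30]
stays_without_outlier = stays_with_outlier[:-1]

if __name__ == "__main__":
    for label, data in (("without outlier", stays_without_outlier),
                        ("with outlier", stays_with_outlier)):
        print(f"{label}: mean = {mean(data):.2f}, median = {median(data)}")
```

Adding the single 30-day stay roughly doubles the mean (from about 3.8 to about 7.6 days), while the median moves only from 3.5 to 4 days.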
Measures of Variation
Range: The range is the simplest measure of spread and is
defined as the highest observed value minus the lowest observed
value. One disadvantage of the range is that it tends to increase
as the number of observations increases, since extreme values
are more likely to occur with a greater number of data points.
Consequently, reporting percentile values such as the 25th and
75th percentiles, or the 5th and 95th percentiles, is often
preferred. The interquartile range (i.e., the difference between the
75th and 25th percentiles) is often used in conjunction with the
median, to describe a set of skewed observations.
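The range and interquartile range described above can be computed as follows. This is a minimal sketch using invented data and Python's standard library. The helper name `spread_summary` is hypothetical, and the quartiles use the default method of `statistics.quantiles`; other software may use slightly different percentile conventions and return slightly different values.

```python
from statistics import median, quantiles


def spread_summary(values):
    """Return the range, median, quartiles, and interquartile range of the data."""
    ordered = sorted(values)
    q1, _, q3 = quantiles(ordered, n=4)  # 25th, 50th, and 75th percentile cut points
    return {
        "range": ordered[-1] - ordered[0],
        "median": median(ordered),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


if __name__ == "__main__":
    # Hypothetical skewed biomarker values (arbitrary units).
    print(spread_summary([2, 4, 5, 7, 9, 12, 40]))
```

For these values the range (38) is dominated by the single extreme observation, whereas the interquartile range (8) describes the spread of the central half of the data.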
Standard deviation: The most commonly used measure of
dispersion is the standard deviation (SD), a measure of the spread
of data about the mean. The SD is calculated as the square root
of the variance, and the variance is the average of the squares
of the deviations from the mean. If the distribution of
observations is bell-shaped, then approximately 68% of observations
are within 1 SD of the mean, 95% are within 2 SDs of the
event. The NNT is the reciprocal of the ARR (i.e., NNT = 1/ARR).
The relative risk reduction (RRR) is also frequently presented and
is the amount of risk reduction relative to the baseline risk. It is
calculated as ARR divided by the baseline event rate (i.e., divided
by the incidence in those without the exposure).
Example: A new antiplatelet agent is being tested for its ability
to decrease the incidence of myocardial infarction (MI) at 60
days. One thousand patients are randomized to either the new
drug or to a placebo, resulting in 500 people in each group.
After 60 days, 15 patients in the active treatment group and
25 patients in the placebo group have experienced the primary
outcome (i.e., MI). What is the NNT with this new medication,
to prevent one MI? What is the RRR?
Answer: The incidence of MI in the treatment group was
15/500 = 0.03. The incidence in the placebo group was 25/500 =
0.05. Therefore, the ARR = 0.05 − 0.03 = 0.02, or 2%. The NNT
= 1/ARR = 1/0.02 = 50. Therefore, 50 patients would need
to be treated with the new medication for 60 days to prevent
one MI. The RRR = 0.02/0.05 = 0.40, or 40%.
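The arithmetic in this worked example can be reproduced with a few lines of code. The sketch below uses the counts from the example; the function name `risk_summary` is illustrative, and exact fractions are used only to avoid floating-point rounding in the reciprocal.

```python
from fractions import Fraction


def risk_summary(events_treated, n_treated, events_control, n_control):
    """Return (ARR, NNT, RRR) as exact fractions from the counts in two groups."""
    risk_treated = Fraction(events_treated, n_treated)
    risk_control = Fraction(events_control, n_control)
    arr = risk_control - risk_treated  # absolute risk reduction
    nnt = 1 / arr                      # number needed to treat (undefined if ARR is 0)
    rrr = arr / risk_control           # risk reduction relative to the baseline risk
    return arr, nnt, rrr


if __name__ == "__main__":
    arr, nnt, rrr = risk_summary(15, 500, 25, 500)
    print(float(arr), float(nnt), float(rrr))  # 0.02 50.0 0.4
```

The same 40% RRR would result from a baseline risk of 0.5% falling to 0.3%, but the NNT would then be 500 rather than 50, which is why the ARR and NNT are worth examining alongside the RRR.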
Assessing Yield of Diagnostic Tests
An important part of cardiology is evaluating the accuracy of
diagnostic tests. Even with advanced diagnostic technology such
as cardiac CT scans, nuclear stress tests, and electrophysiology
(EP) studies for diagnosing inducible arrhythmias, the
possibility of false-positive and false-negative test results exists. The
accuracy of a diagnostic test depends on both its sensitivity and
its specificity.
Sensitivity
Sensitivity is the probability of a positive test result in patients
who have the condition. It is calculated as: true positives / (true
positives + false negatives) (Table 1). Tests with higher sensitivity
mean lower chances of missing the disease. A very sensitive test,
when negative, rules out disease. A helpful mnemonic for this is
SNOUT: SeNsitive test = good for rule OUT.
Specificity
The specificity of a test is the test's ability to identify individuals
who do not have disease. More precisely, specificity is the
probability of a negative test result in a patient who does not have
the condition being measured. Specificity is calculated as: true
negatives / (true negatives + false positives) (Table 1). Tests with
higher specificity mean that fewer normal people are
misdiagnosed as having the disease. A very specific test, when positive,
rules in disease. A helpful mnemonic for this is SPIN: SPecific
test = rule IN.
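The two formulas above translate directly into code. The following is a minimal sketch; the counts are hypothetical and do not come from Table 1 or from any real diagnostic study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Probability of a positive test among patients who have the condition."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Probability of a negative test among patients who do not have the condition."""
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Hypothetical stress-test results: 100 patients with disease, 200 without.
    tp, fn = 90, 10   # diseased patients: test positive / test negative
    tn, fp = 160, 40  # disease-free patients: test negative / test positive
    print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # 0.90
    print(f"specificity = {specificity(tn, fp):.2f}")  # 0.80
```

Each measure is computed within one disease group only, so neither depends on how common the disease is in the population tested.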
Positive and Negative Predictive Values
Performance of a diagnostic test can also be assessed by the
positive predictive value (PPV) and the negative predictive value
(NPV). The PPV is the probability that a patient whose test is
positive actually has the disease. It is calculated as: true positives /
(true positives + false positives). The NPV is the probability that
a patient whose test is negative does not have the disease. It is
less. Similarly, failure to detect a significant difference does not
mean that a difference does not exist.
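As a rough illustration of how a p value is obtained and compared with the significance level, the sketch below applies a two-sided, pooled two-proportion z-test to the counts from the antiplatelet example in this module (15 of 500 vs. 25 of 500). The choice of test is an assumption made for illustration; the module does not specify how that trial would be analyzed, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(events_a: int, n_a: int, events_b: int, n_b: int) -> float:
    """Two-sided p value from a pooled two-proportion z-test (normal approximation)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)  # event rate if the groups are equal
    # Standard error under the null hypothesis (zero if there are no events at all).
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Probability of a result at least this extreme, in either direction, under the null.
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    p = two_proportion_p_value(15, 500, 25, 500)
    alpha = 0.05
    print(f"p = {p:.3f}; reject null at alpha = {alpha}: {p < alpha}")
```

Under these assumptions the p value is roughly 0.11, so the null hypothesis would not be rejected at the 0.05 level even though the observed relative risk reduction is 40%. A true difference may still exist, but a trial of this size has limited power to detect it.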
The p value has often been subject to misinterpretation. The p
value is not the probability that the null hypothesis is true. It also
does not indicate the size or importance of the observed effect.
Even an effect that is highly statistically significant (e.g., p