Statistics in Clinical Research for Residents
-
Upload
iyanasiana -
Transcript of Statistics in Clinical Research for Residents
-
7/27/2019 Statistics in Clinical Research for Residents
1/74
Statistical Methods
in Clinical Research
-
Overview
Data types
Summarizing data using descriptive statistics
Standard error
Confidence Intervals
-
Overview
P values
One- vs two-tailed tests
Alpha and beta errors
Sample size considerations and power analysis
Statistics for comparing 2 or more groups with continuous data
Non-parametric tests
-
Overview
Regression and Correlation
Risk Ratios and Odds Ratios
Survival Analysis
Cox Regression
-
Types of Data
Discrete data: limited number of choices
  Binary: two choices (yes/no)
    Dead or alive
    Disease-free or not
  Categorical (nominal): more than two choices, not ordered
    Race
  Ordinal: more than two choices, ordered
    Age group
    Stages of a cancer
    Likert scale for response, e.g. strongly agree, agree, neither agree nor disagree, etc.
-
Types of Data
Continuous data
  Theoretically infinite possible values (within physiologic limits), including fractional values
    Height, age, weight
  Can be interval
    Interval between measures has meaning.
    Ratio of two interval data points has no meaning (temperature in Celsius, day of the year).
  Can be ratio
    Ratio of the measures has meaning
    Weight, height
-
Histogram
Continuous data
No segmentation of data into groups
-
Frequency Distribution
Segmentation of data into groups
Discrete or continuous data
-
Box and Whiskers Plots
-
Box and Whisker Plots
Popular in Epidemiologic Studies
Useful for presenting comparative data graphically
-
Numeric Descriptive Statistics
Measures of central tendency of data
Mean
Median
Mode
Measures of variability of data
Standard Deviation
Interquartile range
-
Numeric Descriptive Statistics
-
Sample Mean
Most commonly used measure of central tendency
Best applied in normally distributed continuous data.
Not applicable in categorical data
Definition: sum of all the values in a sample, divided by the number of values.
-
Sample Median
Used to indicate the average in a skewed population
Often reported with the mean
  If the mean and the median are about the same, the distribution is roughly symmetric, which is consistent with (but does not prove) a normal distribution.
It is the middle value from an ordered listing of the values
  If there is an odd number of values, it is the middle value
  If there is an even number of values, it is the average of the two middle values.
It is the 50th percentile, so it always lies within the interquartile range
-
Sample Mode
Infrequently reported as a value in studies.
Is the most common value
More frequently used to describe the distribution of data
  Uni-modal, bi-modal, etc.
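The three measures of central tendency can be computed with Python's standard `statistics` module. This is only a minimal sketch: the length-of-stay values are made up for illustration and do not come from the slides.

```python
import statistics

# Hypothetical lengths of stay in days (illustrative, right-skewed data).
stay_days = [2, 3, 3, 4, 5, 6, 7, 9, 21]

mean_stay = statistics.mean(stay_days)      # sum of values / number of values
median_stay = statistics.median(stay_days)  # middle value of the ordered list
mode_stay = statistics.mode(stay_days)      # most common value

print(f"mean={mean_stay:.2f} median={median_stay} mode={mode_stay}")
```

The single long stay of 21 days pulls the mean above the median, which is why the median is usually the better summary for skewed data.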
-
Interquartile Range
Is the range of data from the 25th percentile to the 75th percentile
Common component of a box and whiskers plot
  It is the box, and the line across the box is the median or middle value
  Rarely, the mean will also be displayed.
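As a rough illustration, the quartiles that define the box, together with the standard deviation, can be computed with the standard `statistics` module. The data are hypothetical, and the choice of `method="inclusive"` is an assumption: other quartile conventions give slightly different cut points.

```python
import statistics

# Hypothetical lengths of stay in days (illustrative data only).
stay_days = [2, 3, 3, 4, 5, 6, 7, 9, 21]

# Quartiles: 25th percentile, median, 75th percentile.
q1, q2, q3 = statistics.quantiles(stay_days, n=4, method="inclusive")
iqr = q3 - q1                      # height of the box in a box plot
sd = statistics.stdev(stay_days)   # sample standard deviation (n - 1 denominator)

print(f"Q1={q1} median={q2} Q3={q3} IQR={iqr} SD={sd:.2f}")
```

The outlying value of 21 inflates the standard deviation but leaves the interquartile range untouched.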
-
Standard Error
A fundamental goal of statistical analysis is to estimate a parameter of a population based on a sample
The values of a specific variable from a sample are an estimate of the entire population of individuals who might have been eligible for the study.
The standard error is a measure of the precision of a sample in estimating the population parameter.
-
Standard Error
Standard error of the mean
  Standard deviation / square root of (sample size)
  (if sample greater than 60)
Standard error of the proportion
  Square root of (proportion × (1 - proportion) / n)
Important: dependent on sample size
  The larger the sample, the smaller the standard error.
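The two standard error formulas can be sketched in a few lines of Python. The function names and the example numbers are illustrative assumptions, not part of the slides.

```python
import math

def se_mean(sd, n):
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)

def se_proportion(p, n):
    """Standard error of a proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Same SD, four times the sample size: the standard error is halved.
se_100 = se_mean(10, 100)
se_400 = se_mean(10, 400)
se_p = se_proportion(0.2, 100)

print(se_100, se_400, round(se_p, 4))
```

Because the sample size sits under a square root, quadrupling the sample only halves the standard error.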
-
Clarification
Standard deviation measures the variability or spread of the data in an individual sample.
Standard error measures the precision of the estimate of a population parameter provided by the sample mean or proportion.
-
Standard Error
Significance:
  Is the basis of confidence intervals
  A 95% confidence interval is defined by
    Sample mean (or proportion) ± 1.96 × standard error
  Since standard error is inversely related to the square root of the sample size:
    The larger the study (sample size), the narrower the confidence intervals and the greater the precision of the estimate.
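A short sketch of the 95% confidence interval rule, estimate ± 1.96 × standard error, showing how the interval narrows as the sample grows. The mean of 120 and SD of 15 are hypothetical values chosen for illustration.

```python
import math

def ci95(estimate, se):
    """95% confidence interval: estimate +/- 1.96 * standard error."""
    margin = 1.96 * se
    return estimate - margin, estimate + margin

# Hypothetical sample: mean 120, SD 15, at three sample sizes.
widths = []
for n in (25, 100, 400):
    se = 15 / math.sqrt(n)
    low, high = ci95(120, se)
    widths.append(high - low)
    print(f"n={n}: {low:.2f} to {high:.2f}")
```

Each fourfold increase in sample size halves the width of the interval.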
-
Confidence Intervals
May be used to assess a single point estimate such as mean or proportion.
Most commonly used in assessing the estimate of the difference between two groups.
-
Confidence Intervals
Commonly reported in studies to provide an estimate of the precision
of the mean.
-
Confidence Intervals
-
P Values
The probability of obtaining a result at least as extreme as the one observed by chance alone, assuming that the null hypothesis is true
Typically, an estimate that has a P value of 0.05 or less is considered to be statistically significant, or unlikely to occur due to chance alone.
The P value threshold used is arbitrary
  P value of 0.05 equals 1 in 20 chance
  P value of 0.01 equals 1 in 100 chance
  P value of 0.001 equals 1 in 1000 chance.
-
P Values and Confidence Intervals
P values provide less information than confidence intervals.
  A P value provides only the probability that a result this extreme would arise by chance
  A P value could be statistically significant but of limited clinical significance.
    A very large study might find that a difference of 0.1 on a VAS scale of 0 to 10 is statistically significant, but it may be of no clinical significance
    A large study might find many significant findings during multivariable analyses.
"A large study dooms you to statistical significance"
  Anonymous Statistician
-
P Values and Confidence Intervals
Confidence intervals provide a range of plausible values of the population mean
  For most tests of a difference, if the confidence interval includes 0, then it is not significant.
  Ratios: if the CI includes 1, then it is not significant
  Intervals constructed this way contain the true population value 95% of the time.
If a confidence interval range is very wide, then plausible values might range from very low to very high.
  Example: a relative risk of 4 might have a confidence interval of 1.05 to 9, suggesting that although the estimate is for a fourfold risk (a 300% increase), an increase anywhere from 5% to 800% is plausible.
-
Errors
Type I error
  Claiming a difference between two samples when in fact there is none.
  Remember there is variability among samples: they might seem to come from different populations but they may not.
  Also called the α error.
  Typically 0.05 is used
-
Errors
Type II error
  Claiming there is no difference between two samples when in fact there is.
  Also called a β error.
  The probability of not making a Type II error is 1 - β, which is called the power of the test.
  Hidden error because it can't be detected without a proper power analysis
-
Errors
                               Test result
Truth                          Null hypothesis (H0)    Alternative hypothesis (H1)
Null hypothesis (H0)           No error                Type I
Alternative hypothesis (H1)    Type II                 No error
-
Sample Size Calculation
Also called power analysis.
When designing a study, one needs to determine how large a study is needed.
Power is the ability of a study to avoid a Type II error.
Sample size calculation yields the number of study subjects needed, given a certain desired power to detect a difference and a certain level of P value that will be considered significant.
Many studies are completed without a proper estimate of appropriate study size.
This may lead to a negative study outcome in error.
-
Sample Size Calculation
Depends on:
  Level of Type I error: 0.05 typical
  Level of Type II error: 0.20 typical
  One-sided vs two-sided: nearly always two-sided
  Inherent variability of population
    Usually estimated from preliminary data
  The difference that would be meaningful between the two assessment arms.
-
One-sided vs. Two-sided
Most tests should be framed as a two-sided test.
  When comparing two samples, we usually cannot be sure which is going to be better.
  You never know which direction study results will go.
For routine medical research, use only two-sided tests.
-
Sample Size for Proportions
Stata input: proportion 1 = .2, proportion 2 = .3, α = .05, power (1 - β) = .8.
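For readers without Stata, the same inputs can be run through the usual normal-approximation formula for comparing two proportions. This is a sketch under stated assumptions: the function name is hypothetical, the test is two-sided with equal group sizes, and no continuity correction is applied, so software that applies one will report a somewhat larger number.

```python
import math
from statistics import NormalDist

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Subjects needed per group to detect p1 vs p2 (two-sided, no continuity correction)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the Type I error
    z_beta = NormalDist().inv_cdf(power)           # critical value for power = 1 - beta
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

n_prop = n_per_group_proportions(0.2, 0.3)
print(n_prop)
```

Halving the difference to be detected roughly quadruples the required sample size, because the difference appears squared in the denominator.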
-
Sample Size for Continuous Data
Stata input: mean 1 = 20, mean 2 = 30, α = .05, power (1 - β) = .8, std. dev. 10.
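The same inputs can be checked by hand with the standard normal-approximation formula for comparing two means, n per group = 2 × ((z for α + z for power) × SD / difference)². The sketch below assumes a two-sided test, equal group sizes and a common standard deviation; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group_means(mean1, mean2, sd, alpha=0.05, power=0.80):
    """Subjects needed per group to detect mean1 vs mean2 (two-sided, common SD)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    effect = abs(mean1 - mean2) / sd               # difference in SD units
    return math.ceil(2 * ((z_alpha + z_beta) / effect) ** 2)

n_cont = n_per_group_means(20, 30, sd=10)
print(n_cont)
```

With a difference of one full standard deviation, only a small study is needed; a noisier population (larger SD) raises the requirement quickly.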
-
Statistical Tests
Parametric tests
  Continuous data normally distributed
Non-parametric tests
  Continuous data not normally distributed
  Categorical or ordinal data
-
Choosing a test for comparing the averages of 2 or more samples of scores of experiments with one treatment factor

Data        Between subjects (independent samples)    Within subjects (related samples)
2 samples
Interval    Independent t-test                        Paired t-test
Ordinal     Wilcoxon-Mann-Whitney test                Wilcoxon signed ranks test, Sign test
Nominal     Chi-square test                           McNemar test
> 2 samples
Interval    One-way ANOVA                             Repeated measures ANOVA
Ordinal     Kruskal-Wallis test                       Friedman test
Nominal     Chi-square test                           Cochran's Q test (dichotomous data only)
-
Scheme for choosing a one-sample test

Data        Question         Test
Nominal     2 categories     Binomial test
            >2 categories    Chi-square test
Ordinal     Randomness       Runs test
            Distribution     Kolmogorov-Smirnov test
Interval    Mean             t-test
            Distribution     Kolmogorov-Smirnov test
-
Comparison of 2 Sample Means
Student's t test
  Assumes normally distributed continuous data.
  T value = (difference between means) / (standard error of the difference)
  T value then looked up in a table to determine significance
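The T value can be worked out directly from its definition. The sketch below assumes the classic pooled-variance (equal variance) form of Student's t test for two independent samples; the function name and the data are illustrative only.

```python
import math
import statistics

def students_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for two independent samples, pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(sample_a)
                  + (n_b - 1) * statistics.variance(sample_b)) / df
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))  # standard error of the difference
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / se_diff
    return t, df

# Hypothetical scores in two groups.
t_value, df = students_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(t_value, df)
```

With 8 degrees of freedom the two-sided 5% critical value from a t table is 2.306, so a T value of magnitude 2.0 would not reach significance in this small example.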
-
Paired T Tests
Uses the change before and after intervention in a single individual
Reduces the degree of variability between the groups
Given the same number of patients, has greater power to detect a difference between groups
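A paired t test reduces to a one-sample test on the within-patient differences, which is where the gain in power comes from. A minimal sketch, with hypothetical blood pressure readings and a hypothetical function name:

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, degrees of freedom) computed from within-subject differences."""
    diffs = [b - a for b, a in zip(before, after)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean change
    return statistics.mean(diffs) / se, len(diffs) - 1

# Hypothetical systolic pressures for five patients, before and after treatment.
before = [140, 150, 160, 155, 145]
after = [135, 140, 150, 150, 140]
t_paired, df_paired = paired_t(before, after)
print(round(t_paired, 3), df_paired)
```

The spread between patients (140 to 160) never enters the calculation; only the spread of the individual changes does.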
-
Analysis of Variance
Used to determine if two or more samples are from the same population (the null hypothesis).
  If two samples, is the same as the t test.
  Usually used for 3 or more samples.
If it appears they are not from the same population, can't tell which sample is different.
  Would need to do pair-wise tests.
-
Non-parametric Tests
Testing proportions
  (Pearson's) Chi-Squared (χ²) Test
  Fisher's Exact Test
Testing ordinal variables
  Mann-Whitney U Test
  Kruskal-Wallis One-way ANOVA
Testing ordinal paired variables
  Sign Test
  Wilcoxon Signed-Rank Test
-
Use of Non-parametric Tests
Use for categorical, ordinal or non-normally distributed continuous data
May check both parametric and non-parametric tests to check for congruity
Most non-parametric tests are based on ranks or other non-value related methods
Interpretation: Is the P value significant?
-
(Pearson's) Chi-Squared (χ²) Test
Used to compare observed proportions of an event with those expected.
Used with nominal data (better/worse; dead/alive)
If there is a substantial difference between observed and expected, then it is likely that the null hypothesis is rejected.
Often presented as a 2 × 2 table
-
Chi-Squared (χ²) Test
Chi-Squared (χ²) formula: χ² = Σ (observed - expected)² / expected, summed over all cells
Not applicable in small samples
  If the expected count in any cell is fewer than 5, use Fisher's exact test
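For a 2 × 2 table the statistic can be computed by hand: the expected count in each cell is (row total × column total) / grand total, and χ² sums (observed - expected)² / expected over the four cells. The sketch below uses hypothetical counts and a hypothetical function name, and applies no continuity correction; the closed-form P value holds only for 1 degree of freedom.

```python
import math

def chi_squared_2x2(table):
    """Pearson chi-squared statistic and P value for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    total = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical trial: 20 of 100 events in one arm, 30 of 100 in the other.
chi2, p_value = chi_squared_2x2([[20, 80], [30, 70]])
print(round(chi2, 3), round(p_value, 3))
```

Here every expected count is at least 25, so the approximation is reasonable; with sparse cells Fisher's exact test would be the safer choice.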
-
Correlation
Assesses the linear relationship between two variables
  Example: height and weight
Strength of the association is described by a correlation coefficient, r
  r = 0 to .2: low, probably meaningless
  r = .2 to .4: low, possible importance
  r = .4 to .6: moderate correlation
  r = .6 to .8: high correlation
  r = .8 to 1: very high correlation
Can be positive or negative
Pearson's, Spearman correlation coefficient
Tells nothing about causation
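Pearson's r is the covariation of the two variables divided by the product of their individual variations. A minimal sketch with made-up height and weight values (the function name and data are assumptions for illustration):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights (cm) and weights (kg) for five people.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 58, 69, 80, 88]
r = pearson_r(heights_cm, weights_kg)
print(round(r, 3))
```

On the scale above this would count as a very high correlation; reversing one variable flips the sign but not the strength.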
-
Correlation
Source: Harris and Taylor. Medical Statistics Made Easy
-
Correlation
Perfect Correlation
Source: Altman. Practical Statistics for Medical Research
-
Correlation
Source: Altman. Practical Statistics for Medical Research
Correlation Coefficient 0 Correlation Coefficient .3
-
Correlation
Source: Altman. Practical Statistics for Medical Research
Correlation Coefficient -.5 Correlation Coefficient .7
-
Regression
Based on fitting a line to data
  Provides a regression coefficient, which is the slope of the line
    Y = ax + b
Used to predict a dependent variable's value based on the value of an independent variable.
  Very helpful: in analysis of height and weight, for a known height, one can predict weight.
Much more useful than correlation
  Allows prediction of values of Y rather than just whether there is a relationship between two variables.
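The line Y = ax + b can be fitted by ordinary least squares in a few lines. This is a sketch with hypothetical height and weight data; the function name is illustrative, and real analyses would normally use a statistics package.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for Y = a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx                 # regression coefficient (slope)
    b = mean_y - a * mean_x       # intercept
    return a, b

# Hypothetical heights (cm) and weights (kg) for five people.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 58, 69, 80, 88]
a, b = fit_line(heights_cm, weights_kg)

predicted_weight = a * 175 + b    # predicted weight for a height of 175 cm
print(round(a, 2), round(b, 1), round(predicted_weight, 1))
```

The slope says how many kilograms of weight go with each extra centimetre of height in this sample, which is information a correlation coefficient alone does not give.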
-
Regression
Types of regression
  Linear: uses continuous data to predict a continuous outcome
  Logistic: uses continuous data to predict the probability of a dichotomous outcome
  Poisson regression: counts or rates of rare events.
  Cox proportional hazards regression: survival analysis.
-
Multiple Regression Models
Determining the association between two variables while controlling for the values of others.
Example: Uterine Fibroids
  Both age and race impact the incidence of fibroids.
  Multiple regression allows one to test the impact of age on the incidence while controlling for race (and all other factors)
-
Multiple Regression Models
In published papers, the multivariable models are more powerful than univariable models and take precedence.
  Therefore we discount the univariable model, as it does not control for confounding variables.
E.g.: coronary disease is potentially affected by age, HTN, smoking status, gender and many other factors.
  If assessing whether height is a factor:
    If it is significant on univariable analysis, but not on multivariable analysis, these other factors confounded the analysis.
-
Risk Ratios
Risk is the probability that an event will happen.
  Number of events divided by the number of people at risk.
Risks are compared by creating a ratio
  Example: risk of colon cancer in those exposed to a factor vs those unexposed
  Risk of colon cancer in exposed divided by the risk in those unexposed.
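As a worked illustration of the definition, the sketch below computes a risk ratio from cohort counts. The counts and function names are hypothetical and are not taken from any study.

```python
def risk(events, at_risk):
    """Risk: number of events divided by the number of people at risk."""
    return events / at_risk

def risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Risk in the exposed divided by risk in the unexposed."""
    return risk(events_exposed, n_exposed) / risk(events_unexposed, n_unexposed)

# Hypothetical cohort: 30 cancers among 1000 exposed, 10 among 1000 unexposed.
rr = risk_ratio(30, 1000, 10, 1000)
print(round(rr, 2))
```

A value of about 3 means the exposed group had three times the risk of the unexposed group in this made-up cohort.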
-
Risk Ratios
Typically used in cohort studies
  Prospective observational studies comparing groups with various exposures.
Allows exploration of the probability that certain factors are associated with outcomes of interest
  For example: association of smoking with lung cancer
Usually require large and long-term studies to determine risks and risk ratios.
-
Interpreting Risk Ratios
A risk ratio of 1 indicates no difference in risk
A risk ratio of greater than 1 indicates increased risk
A risk ratio of less than 1 indicates decreased risk
95% confidence intervals are usually presented
  Must not include 1 for the estimate to be statistically significant.
  Example: risk ratio of 3.1 (95% CI 0.97 to 9.41) includes 1, thus would not be statistically significant.
-
Odds Ratios
An odds ratio is the odds of an event in one group divided by the odds of the event in another group.
  Odds are calculated as the number of times an event happens divided by the number of times it does not happen.
  The odds of heads vs tails are 1:1, or 1.
-
Odds Ratios
Are calculated from case control studies
  Case control: patients with a condition (often rare) are compared to a group of selected controls for exposure to one or more potential etiologic factors.
  Cannot calculate risk from these studies, as that requires the observation of the natural occurrence of an event over time in exposed and unexposed patients (prospective cohort study).
  Instead we can calculate the odds for each group.
-
Comparing Risk and Odds Ratios
For rare events, ratios very similar
  If 5 of 100 people have a complication:
    The odds are 5/95 or .0526.
    The risk is 5/100 or .05.
If more common events, ratios begin to differ
  If 30 of 100 people have a complication:
    The odds are 30/70 or .43
    The risk is 30/100 or .30
Very common events, ratios very different
  Male versus female births
    The odds are .5/.5 or 1
    The risk is .5/1 or .5
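The three comparisons above can be reproduced with two one-line functions; the function names are illustrative.

```python
def risk(events, total):
    """Risk: events divided by everyone at risk."""
    return events / total

def odds(events, total):
    """Odds: events divided by non-events."""
    return events / (total - events)

# 5, 30 and 50 events per 100 people, as in the examples above.
for events in (5, 30, 50):
    print(events, round(risk(events, 100), 4), round(odds(events, 100), 4))
```

The gap between the two measures widens as the event becomes more common, which is why an odds ratio approximates a risk ratio only for rare outcomes.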
-
Risk reduction
Absolute risk reduction: amount that the risk is reduced.
Relative risk reduction: proportion or percentage reduction.
Example:
  Death rate without treatment: 10 per 1000
  Death rate with treatment: 5 per 1000
  ARR = 5 per 1000
  RRR = 50%
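The example above works out as follows in code; the function name is an illustrative assumption.

```python
def risk_reduction(control_risk, treated_risk):
    """Return (absolute risk reduction, relative risk reduction)."""
    arr = control_risk - treated_risk   # absolute difference in risk
    rrr = arr / control_risk            # reduction as a fraction of the untreated risk
    return arr, rrr

# Death rates from the example: 10 per 1000 untreated, 5 per 1000 treated.
arr, rrr = risk_reduction(10 / 1000, 5 / 1000)
print(f"ARR = {arr * 1000:.0f} per 1000, RRR = {rrr:.0%}")
```

The same 50% relative reduction would correspond to a much larger absolute benefit if the untreated death rate were 100 per 1000, which is why both figures are worth reporting.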
-
Survival Analysis
Evaluation of time to an event (death, recurrence, recovery).
Provides means of handling censored data
  Patients who do not reach the event by the end of the study or who are lost to follow-up
Most common type is Kaplan-Meier analysis
  Curves presented as stepwise change from baseline
  There are no fixed intervals of follow-up: survival proportion recalculated after each event.
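The "recalculated after each event" step can be sketched as a small product-limit (Kaplan-Meier) calculation. The follow-up times, the function name and the convention that a patient censored at a given time is still counted as at risk at that time are assumptions for illustration.

```python
def kaplan_meier(times, events):
    """Return [(time, survival proportion)] at each time where an event occurs.

    times:  follow-up time for each patient
    events: 1 if the event occurred at that time, 0 if the patient was censored
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for ti in times if ti >= t)   # still under observation at t
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        if deaths:
            survival *= 1 - deaths / at_risk          # step down only when an event occurs
            curve.append((t, survival))
    return curve

# Hypothetical follow-up in months; 0 marks a censored patient.
months = [2, 3, 3, 5, 8]
event = [1, 1, 0, 1, 0]
curve = kaplan_meier(months, event)
for t, s in curve:
    print(t, round(s, 2))
```

Censored patients never cause a step, but they shrink the number at risk, so later events produce larger drops in the curve.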
-
Survival Analysis
Source: Altman. Practical Statistics for Medical Research
-
Kaplan-Meier Curve
Source: Wikipedia
-
Kaplan-Meier Analysis
Provides a graphical means of comparing the
outcomes of two groups that vary by intervention or
other factor.
Survival rates can be measured directly from curve.
Difference between curves can be tested for statistical
significance.
-
Cox Regression Model
AKA: Proportional Hazards Survival Model.
Used to investigate relationship between an event (death, recurrence) occurring over time and possible explanatory factors.
Reported result: Hazard ratio (HR).
  Ratio of the hazard in one group divided by the hazard in another.
Interpreted same as risk ratios and odds ratios
  HR = 1: no effect
  HR > 1: increased risk
  HR < 1: decreased risk
-
What do you mean??
-
Summary
Understanding basic statistical concepts is central to
understanding the medical literature.
Not important to understand the basis of the tests or
the underlying math.
Need to know when a test should be used and how to
interpret its results
-