Statistical Methods in Clinical Trials: Categorical Data
Types of Data
• Quantitative
 – Continuous: blood pressure, time to event
 – Discrete: number of relapses
• Qualitative
 – Ordered categorical: pain level
 – Categorical: sex
Types of data analysis (inference)
• Parametric vs non-parametric
• Frequentist vs Bayesian
• Model based vs data driven
Categorical data
• In an RCT, endpoints and surrogate endpoints can be categorical or ordered categorical variables. In the simplest cases we have binary responses (e.g. responders vs non-responders). In outcomes research it is common to use many ordered categories (no improvement, moderate improvement, high improvement).
• Example: Binary outcomes:
– Remission
– Mortality
– Presence/absence of an AE
– Responder/non-responder according to some pre-defined criteria
– Success/Failure
Two proportions
• Sometimes we want to compare the proportion of successes in two separate groups. For this purpose we take two samples of sizes n1 and n2. We let yi1 and pi1 = yi1/ni be the observed number and the proportion of successes in the ith group. The difference in population proportions of successes and its large-sample variance can be estimated by
$$d = p_{11} - p_{21}, \qquad \widehat{\mathrm{Var}}(d) = \frac{p_{11}(1 - p_{11})}{n_1} + \frac{p_{21}(1 - p_{21})}{n_2}.$$
Two proportions (continued)
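• Assume we want to test the null hypothesis that there is no difference between the proportions of success in the two groups. Under the null hypothesis, we can estimate the common proportion by
$$p = \frac{y_{11} + y_{21}}{n_1 + n_2}.$$
• The large-sample variance of d under the null hypothesis is then estimated by
$$\widehat{\mathrm{Var}}_0(d) = p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right),$$
• leading to the test statistic
$$z = \frac{d}{\sqrt{\widehat{\mathrm{Var}}_0(d)}},$$
which is approximately standard normal under the null hypothesis.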
Example
NINDS trial in acute ischemic stroke
Treatment   n     Responders*
rt-PA       312   147 (47.1%)
Placebo     312   122 (39.1%)
* Early improvement defined on a neurological scale.
Point estimate: 0.080 (s.e.=0.0397)
95% CI: (0.003 ; 0.158)
p-value: 0.043
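A minimal Python sketch of the two-proportion calculations for the NINDS example, using only the standard library. It assumes the unpooled variance for the confidence interval and the pooled variance under H0 for the z-test; on our reading, the quoted s.e. of 0.0397 is the pooled one, while the interval (0.003; 0.158) corresponds to the unpooled one (about 0.0395). The function name `two_proportions` is ours, for illustration only.

```python
import math

def two_proportions(y1, n1, y2, n2, z_crit=1.96):
    """Difference in proportions with a Wald CI and a pooled z-test (illustrative helper)."""
    p1, p2 = y1 / n1, y2 / n2
    d = p1 - p2
    # Unpooled large-sample variance, used for the confidence interval
    se_d = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (d - z_crit * se_d, d + z_crit * se_d)
    # Pooled variance under H0: p1 = p2, used for the test statistic
    p = (y1 + y2) / (n1 + n2)
    se0 = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = d / se0
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return d, se0, ci, z, p_value

# NINDS: 147/312 responders on rt-PA, 122/312 on placebo
d, se0, ci, z, p_value = two_proportions(147, 312, 122, 312)
print(f"d={d:.3f} se0={se0:.4f} CI=({ci[0]:.3f}; {ci[1]:.3f}) z={z:.3f} p={p_value:.3f}")
```

Note that z² from this sketch equals the Pearson chi-square value (4.084) reported for the same data further on.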
Two proportions (chi-square)
• The problem of comparing two proportions can sometimes be formulated as a problem of independence!
• Assume we have two groups as above (treatment and placebo). Assume further that the subjects were randomized to these groups.
• We can then test for independence between belonging to a certain group and the clinical endpoint (success or failure).
• The data can be organized in the form of a contingency table in which the marginal totals and the total number of subjects are considered as fixed.
2 x 2 contingency table (rows: treatment; columns: response)
             Failure   Success   Total
Drug           165       147      312
Placebo        190       122      312
Total          355       269    N = 624
Hypergeometric distribution
• An urn contains W white balls and R red balls: N = W + R.
• n balls are drawn at random without replacement.
• Y is the number of white balls drawn (successes).
• Y follows the hypergeometric distribution with parameters (N, W, n):
$$P(Y = y) = \frac{\binom{W}{y}\binom{R}{n - y}}{\binom{N}{n}}.$$
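Contingency tables
• N subjects in total
• y.1 of these are special (successes)
• y1. are drawn at random
• Y11 is the number of successes among these y1.
• Y11 is HG(N, y.1, y1.); in general
$$P(Y_{11} = y_{11}) = \frac{\binom{y_{\cdot 1}}{y_{11}}\binom{N - y_{\cdot 1}}{y_{1\cdot} - y_{11}}}{\binom{N}{y_{1\cdot}}}$$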
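The hypergeometric model for Y11 can be checked numerically. This hedged sketch enumerates the exact probabilities with Python's `math.comb`, using the NINDS margins (N = 624, y.1 = 269 responders, y1. = 312 rt-PA patients), and compares the resulting mean and variance with the standard closed forms m11 = y1.y.1/N and v11 = y1.y2.y.1y.2/(N²(N − 1)). The helper name `hg_pmf` is hypothetical.

```python
from math import comb

def hg_pmf(k, N, W, n):
    """P(Y = k) for Y ~ HG(N, W, n): n draws without replacement from N items, W of them special."""
    return comb(W, k) * comb(N - W, n - k) / comb(N, n)

# NINDS margins: N patients, y_dot1 responders, y1_dot patients in the rt-PA row
N, y_dot1, y1_dot = 624, 269, 312
support = range(max(0, y1_dot + y_dot1 - N), min(y1_dot, y_dot1) + 1)
probs = {k: hg_pmf(k, N, y_dot1, y1_dot) for k in support}

# Moments computed directly from the exact distribution
mean = sum(k * p for k, p in probs.items())
var = sum((k - mean) ** 2 * p for k, p in probs.items())

# Closed-form hypergeometric moments (used by the randomization chi-square)
m11 = y1_dot * y_dot1 / N
v11 = y1_dot * (N - y1_dot) * y_dot1 * (N - y_dot1) / (N ** 2 * (N - 1))
print(round(mean, 4), round(m11, 4), round(var, 4), round(v11, 4))
```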
Contingency tables
• The null hypothesis of independence is tested using the (Pearson) chi-square statistic,
• which, under the null hypothesis, is approximately chi-square distributed with one degree of freedom, provided the sample sizes in the two groups are large (over 30) and the expected frequency in each cell is non-negligible (over 5).
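Contingency tables
• For moderate sample sizes we use Fisher's exact test, which calculates the desired probabilities from the exact hypergeometric distribution. The variance of Y11 can then be calculated:
$$v_{11} = \mathrm{Var}(Y_{11}) = \frac{y_{1\cdot}\, y_{2\cdot}\, y_{\cdot 1}\, y_{\cdot 2}}{N^2 (N - 1)}.$$
• Using this variance and the expectation m11 = y1. y.1 / N, we obtain the randomization chi-square statistic
$$Q_{MH} = \frac{(y_{11} - m_{11})^2}{v_{11}} = \frac{N - 1}{N}\, Q_P,$$
where Q_P is the Pearson chi-square statistic. With fixed margins only one cell is allowed to vary. Randomization is crucial for this approach.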
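As a sketch (standard library only; function names are ours), Fisher's exact test and the randomization chi-square statistic can be computed for the NINDS table oriented as in the SAS output further on (placebo row first, non-responders first). The two-sided p-value here sums all tables that are no more probable than the observed one, which we assume matches the SAS convention; the results should agree with the reported values 0.982, 0.026, 0.052 and 4.077.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the 2 x 2 table [[a, b], [c, d]] with all margins fixed.

    The (1,1) cell is hypergeometric. Returns P(Y <= a), P(Y >= a) and a
    two-sided p-value that sums every table no more probable than the observed one.
    """
    r1, c1, n = a + b, a + c, a + b + c + d
    support = range(max(0, r1 + c1 - n), min(r1, c1) + 1)
    pmf = {k: comb(c1, k) * comb(n - c1, r1 - k) / comb(n, r1) for k in support}
    left = sum(p for k, p in pmf.items() if k <= a)
    right = sum(p for k, p in pmf.items() if k >= a)
    # The small relative tolerance keeps tables whose probabilities tie numerically
    two_sided = sum(p for p in pmf.values() if p <= pmf[a] * (1 + 1e-7))
    return left, right, two_sided

def randomization_chi2(a, b, c, d):
    """Q = (y11 - m11)^2 / v11 with the exact hypergeometric mean and variance of the (1,1) cell."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    m11 = r1 * c1 / n
    v11 = r1 * r2 * c1 * c2 / (n ** 2 * (n - 1))
    return (a - m11) ** 2 / v11

# Rows: placebo, rt-PA; columns: non-responder, responder
left, right, two_sided = fisher_exact_2x2(190, 122, 165, 147)
q = randomization_chi2(190, 122, 165, 147)
print(f"left={left:.3f} right={right:.3f} two-sided={two_sided:.3f} Q={q:.3f}")
```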
The (Pearson) Chi-square test
The test statistic is
$$Q_P = \sum_i \sum_j \frac{(y_{ij} - m_{ij})^2}{m_{ij}}$$
where yij = observed frequencies and mij = expected frequencies (under independence). The test statistic approximately follows a chi-square distribution, with (r − 1)(c − 1) degrees of freedom for an r × c table.
Example 5
Chi-square test for a 2 × 2 table
Examining the independence between two treatments and a classification into responder/non-responder is equivalent to comparing the proportion of responders in the two groups.
NINDS again
Observed frequencies
             Non-resp   Responder   Total
rt-PA          165         147        312
Placebo        190         122        312
Total          355         269        624

Expected frequencies
             Non-resp   Responder   Total
rt-PA         177.5       134.5       312
Placebo       177.5       134.5       312
Total          355         269        624
SAS output:
TABLE OF GRP BY Y
Frequency
Row Pct    | nonresp | resp    | Total
-----------+---------+---------+------
placebo    |   190   |   122   |  312
           |  60.90  |  39.10  |
-----------+---------+---------+------
rt-PA      |   165   |   147   |  312
           |  52.88  |  47.12  |
-----------+---------+---------+------
Total          355       269      624

STATISTICS FOR TABLE OF GRP BY Y
Statistic                      DF    Value    Prob
--------------------------------------------------
Chi-Square                      1    4.084   0.043
Likelihood Ratio Chi-Square     1    4.089   0.043
Continuity Adj. Chi-Square      1    3.764   0.052
Mantel-Haenszel Chi-Square      1    4.077   0.043
Fisher's Exact Test (Left)                   0.982
                    (Right)                  0.026
                    (2-Tail)                 0.052
Phi Coefficient                      0.081
Contingency Coefficient              0.081
Cramer's V                           0.081
Sample Size = 624
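The remaining chi-square-type lines of the SAS output can be reproduced from the observed and expected frequencies. A minimal sketch, assuming the textbook definitions of the likelihood-ratio (G²), Yates continuity-corrected, phi and contingency-coefficient statistics; the phi coefficient is computed unsigned here, and the helper `p_df1` is our own name.

```python
import math

# Rows: placebo, rt-PA; columns: non-responder, responder (as in the SAS table)
table = [[190, 122], [165, 147]]
rows = [sum(r) for r in table]
cols = [sum(c) for c in zip(*table)]
n = sum(rows)
# Pairs of (observed, expected under independence) for the four cells
cells = [(table[i][j], rows[i] * cols[j] / n) for i in range(2) for j in range(2)]

pearson = sum((o - e) ** 2 / e for o, e in cells)
# Likelihood-ratio statistic G^2 = 2 * sum O * ln(O / E)
lr = 2 * sum(o * math.log(o / e) for o, e in cells)
# Yates continuity correction: reduce each |O - E| by 0.5 before squaring
yates = sum((abs(o - e) - 0.5) ** 2 / e for o, e in cells)
# Phi (unsigned here) and the contingency coefficient
phi = math.sqrt(pearson / n)
contingency = math.sqrt(pearson / (pearson + n))

def p_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))

for name, stat in [("Chi-Square", pearson), ("Likelihood Ratio", lr), ("Continuity Adj.", yates)]:
    print(f"{name:18s} {stat:.3f}  p={p_df1(stat):.3f}")
```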
Odds, Odds Ratios and Relative Risks
The odds of success in group i is estimated by
$$\mathrm{odds}_i = \frac{p_{i1}}{1 - p_{i1}}.$$
The odds ratio of success between the two groups is estimated by
$$OR = \frac{p_{11}/(1 - p_{11})}{p_{21}/(1 - p_{21})}.$$
Define the risk of success in the ith group as the proportion of cases with success. The relative risk between the two groups is estimated by
$$RR = \frac{p_{11}}{p_{21}}.$$
Absolute Risk = AR = p11 – p21
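A small sketch of these effect measures for the NINDS data (success = early improvement, group 1 = rt-PA). The function name is ours. The odds ratio, about 1.387, is the same value as the odds ratio reported by the logistic regression fit of the same data later in this section.

```python
def effect_measures(y11, n1, y21, n2):
    """Odds, odds ratio, relative risk and absolute risk for two groups (illustrative helper)."""
    p1, p2 = y11 / n1, y21 / n2
    odds1, odds2 = p1 / (1 - p1), p2 / (1 - p2)
    return {
        "odds1": odds1,
        "odds2": odds2,
        "odds_ratio": odds1 / odds2,
        "relative_risk": p1 / p2,
        "absolute_risk": p1 - p2,
    }

m = effect_measures(147, 312, 122, 312)
for key, value in m.items():
    print(f"{key:14s} {value:.4f}")
```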
Categorical data
• Nominal
 – E.g. patient residence at end of follow-up (hospital, nursing home, own home, etc.)
• Ordinal (ordered)
 – E.g. some global rating:
  • Normal, not at all ill
  • Borderline mentally ill
  • Mildly ill
  • Moderately ill
  • Markedly ill
  • Severely ill
  • Among the most extremely ill patients
Categorical data & chi-square test
                           Other factor
                    A       B       C       D       E       Total
One       i         niA     niB     niC     niD     niE     ni
factor    ii        niiA    niiB    niiC    niiD    niiE    nii
          iii       niiiA   niiiB   niiiC   niiiD   niiiE   niii
Total               nA      nB      nC      nD      nE      n
The chi-square test is useful for detection of a general
association between treatment and categorical response
(in either the nominal or ordinal scale), but it cannot identify
a particular relationship, e.g. a location shift.
Nominal categorical data
                         Disease category
                   dip   snip   fup   bop   other   Total
Treatment    A      33    15     34    26      8      116
group        B      28    18     34    20     14      114
Total               61    33     68    46     22      230
Chi-square test: χ² = 3.084, df = 4, p = 0.544
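The Pearson statistic and its p-value can be reproduced for both the NINDS 2 × 2 table and the disease-category table above. This sketch uses only the standard library: for integer degrees of freedom the chi-square upper tail has a closed form, built here by the recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(−x/2) / Γ(k/2 + 1). Function names are ours.

```python
import math

def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-square variable with integer df (closed form)."""
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    # Recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q

def pearson_chi2(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, y in enumerate(row):
            m = rows[i] * cols[j] / n  # expected frequency under independence
            stat += (y - m) ** 2 / m
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df, chi2_sf(stat, df)

ninds = [[165, 147], [190, 122]]
nominal = [[33, 15, 34, 26, 8], [28, 18, 34, 20, 14]]
for name, t in [("NINDS", ninds), ("disease category", nominal)]:
    stat, df, p = pearson_chi2(t)
    print(f"{name}: chi2={stat:.3f} df={df} p={p:.3f}")
```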
Ordered categorical data
• Here we assume two groups, one receiving the drug and one placebo. The response is assumed to be ordered categorical with J categories.
• The null hypothesis is that the distribution of subjects in response categories is the same for both groups.
• Again, randomization and the hypergeometric distribution lead to the same chi-square test statistic, but this time with (J − 1) df. Moreover, the same relationship, Q_MH = ((N − 1)/N) Q_P, exists between the randomization and Pearson versions of the chi-square statistic.
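The Mantel-Haenszel statistic
The aim here is to combine data from several (H) strata when comparing two groups, drug and placebo. The expected frequency m_h11 and the variance v_h11 of each stratum (computed from the hypergeometric distribution with fixed margins, as for a single table) are used to define the Mantel-Haenszel statistic
$$Q_{MH} = \frac{\left[\sum_{h=1}^{H} (y_{h11} - m_{h11})\right]^2}{\sum_{h=1}^{H} v_{h11}},$$
which is approximately chi-square distributed with one df.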
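A sketch of the stratified statistic, with each stratum given as a 2 × 2 table of counts. No stratified data are given in this section, so the usage below applies it to the single NINDS table, where it reduces to the randomization chi-square (4.077 in the SAS output). A real analysis would pass one table per stratum (e.g. per centre); how strata are defined is an assumption left to the user, and the function name is ours.

```python
def mantel_haenszel(strata):
    """Mantel-Haenszel chi-square (1 df) for a list of 2 x 2 tables [[a, b], [c, d]].

    Each stratum contributes its observed (1,1) count minus the hypergeometric
    expectation m_h11, and its hypergeometric variance v_h11 (fixed margins).
    """
    num, var = 0.0, 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        r1, r2, c1, c2 = a + b, c + d, a + c, b + d
        num += a - r1 * c1 / n
        var += r1 * r2 * c1 * c2 / (n ** 2 * (n - 1))
    return num ** 2 / var

# Rows: rt-PA, placebo; columns: responder, non-responder
ninds = [[147, 165], [122, 190]]
print(f"Q_MH = {mantel_haenszel([ninds]):.3f}")
```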
Logistic regression
• Logistic regression is part of a category of statistical models called generalized linear models (GLM). This broad class of models includes ordinary regression and ANOVA, as well as ANCOVA and loglinear models. An excellent treatment of generalized linear models is presented in Agresti (1996).
• Logistic regression allows one to predict a discrete outcome, such as group membership, from a set of variables that may be continuous, discrete, dichotomous, or a mix of any of these. Generally, the dependent or response variable is dichotomous, such as presence/absence or success/failure.
Multiple logistic regression
• More than one independent variable
 – Dichotomous, ordinal, nominal, continuous, …
$$\ln\left(\frac{P}{1 - P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$$
• Interpretation of βi
 – Increase in log-odds for a one-unit increase in xi with all the other xi held constant
 – Measures the association between xi and the log-odds, adjusted for all other xi
Fitting the equation to the data
• Linear regression: least squares or maximum likelihood
• Logistic regression: maximum likelihood
• Likelihood function
 – The parameter estimates β̂ are the values that maximize it
 – In practice it is easier to work with the log-likelihood
Statistical testing
• Question
– Does model including given independent
variable provide more information about
dependent variable than model without this
variable?
• Three tests
– Likelihood ratio statistic (LRS)
– Wald test
– Score test
Likelihood ratio statistic
• Compares two nested models
 log(odds) = α + β1x1 + β2x2 + β3x3 (model 1)
 log(odds) = α + β1x1 + β2x2 (model 2)
• LR statistic
$$-2\log\frac{L(\text{model 2})}{L(\text{model 1})} = -2\log L(\text{model 2}) - \left(-2\log L(\text{model 1})\right)$$
• Under the null hypothesis, the LR statistic is approximately χ² with df = number of extra parameters in model 1.
Example 6
Fitting a Logistic regression model to the
NINDS data, using only one covariate
(treatment group).
NINDS again
Observed frequencies
             Non-resp   Responder   Total
rt-PA          165         147        312
Placebo        190         122        312
Total          355         269        624

SAS output:
The LOGISTIC Procedure

Response Profile
Ordered Value   Binary Outcome   Count
1               EVENT            269
2               NO EVENT         355

Model Fitting Information and Testing Global Null Hypothesis BETA=0
              Intercept   Intercept and
Criterion     Only        Covariates      Chi-Square for Covariates
AIC           855.157     853.069         .
SC            859.593     861.941         .
-2 LOG L      853.157     849.069         4.089 with 1 DF (p=0.0432)
Score         .           .               4.084 with 1 DF (p=0.0433)

Analysis of Maximum Likelihood Estimates
               Parameter  Standard  Wald        Pr >        Standardized  Odds
Variable  DF   Estimate   Error     Chi-Square  Chi-Square  Estimate      Ratio
INTERCPT  1    -0.4430    0.1160    14.5805     0.0001      .             .
GRP       1     0.3275    0.1622     4.0743     0.0435      0.090350      1.387
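The SAS estimates can be reproduced with a direct Newton-Raphson maximization of the binomial log-likelihood. This is a minimal sketch for one covariate with grouped data, assuming the coding GRP = 1 for rt-PA and 0 for placebo (consistent with the intercept equalling the placebo log-odds); the function names are hypothetical. It also forms the likelihood ratio statistic from the two −2 log L values and the Wald statistic for GRP.

```python
import math

def _score_and_information(groups, a, b):
    """Score vector and Fisher information of logit(p) = a + b * x for grouped binomial data."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y, n in groups:
        p = 1 / (1 + math.exp(-(a + b * x)))
        w = n * p * (1 - p)
        g0 += y - n * p
        g1 += (y - n * p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return g0, g1, h00, h01, h11

def fit_logistic(groups, iters=25):
    """Newton-Raphson maximum likelihood fit; groups is a list of (x, events, trials)."""
    a = b = 0.0
    for _ in range(iters):
        g0, g1, h00, h01, h11 = _score_and_information(groups, a, b)
        det = h00 * h11 - h01 ** 2
        # Newton step: (a, b) += I^{-1} U for the 2 x 2 information matrix I
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    # Standard errors from the inverse information at the final estimates
    _, _, h00, h01, h11 = _score_and_information(groups, a, b)
    det = h00 * h11 - h01 ** 2
    se_a, se_b = math.sqrt(h11 / det), math.sqrt(h00 / det)
    # -2 log L without the binomial coefficients, as for individual-level data
    loglik = 0.0
    for x, y, n in groups:
        p = 1 / (1 + math.exp(-(a + b * x)))
        loglik += y * math.log(p) + (n - y) * math.log(1 - p)
    return a, b, se_a, se_b, -2 * loglik

# x = 1 for rt-PA, 0 for placebo; events = responders
groups = [(0, 122, 312), (1, 147, 312)]
a, b, se_a, se_b, m2ll = fit_logistic(groups)

# Intercept-only model: the MLE of p is the overall response rate
p0 = 269 / 624
m2ll_null = -2 * (269 * math.log(p0) + 355 * math.log(1 - p0))
lr_stat = m2ll_null - m2ll   # likelihood ratio statistic, 1 df
wald = (b / se_b) ** 2       # Wald chi-square for GRP
print(f"a={a:.4f} b={b:.4f} se_b={se_b:.4f} OR={math.exp(b):.3f} LR={lr_stat:.3f} Wald={wald:.4f}")
```

With a single binary covariate the fitted probabilities equal the observed response rates, so the LR statistic coincides with the likelihood ratio chi-square (4.089) of the contingency-table analysis.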
[Slide image: David Brennan, CEO of AstraZeneca]
4 measures of association (effect)
– Quite often we are interested in risk and probability only as a way to measure association or effect: cure is associated with drug = the drug has an effect.
– This can be done in different ways:
 1. Relative risk (prospective studies)
 2. Odds ratio (prospective or retrospective studies)
 3. Absolute risk (prospective studies)
 4. Number needed to treat (prospective studies)
Absolute Risk
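• Difference between the proportions of outcomes in 2 groups, 1 and 2. Estimated absolute risk:
$$\widehat{AR} = \hat p_1 - \hat p_2, \qquad \hat p_1 = \frac{n_{11}}{n_{1\cdot}}, \quad \hat p_2 = \frac{n_{21}}{n_{2\cdot}}$$
• 95% confidence interval for the population absolute risk:
$$\widehat{AR} \pm 1.96\sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_{1\cdot}} + \frac{\hat p_2(1 - \hat p_2)}{n_{2\cdot}}}$$
• AR ≠ 0: association (positive or negative); AR = 0: no association.
Example:
Group             Cured   Not cured   Total
New drug           800       200       1000
Standard drug      646       354       1000
Total             1446       554       2000
$$\widehat{AR} = 0.800 - 0.646 = 0.154, \qquad 95\%\ \text{CI} \approx (0.115,\ 0.193)$$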
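A sketch of the absolute-risk calculation with the large-sample interval, applied to the cure data above (new drug versus standard drug); the function name is ours.

```python
import math

def absolute_risk(cured1, n1, cured2, n2, z=1.96):
    """Estimated absolute risk p1 - p2 with a large-sample (Wald) 95% confidence interval."""
    p1, p2 = cured1 / n1, cured2 / n2
    ar = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return ar, (ar - z * se, ar + z * se)

ar, (lo, hi) = absolute_risk(800, 1000, 646, 1000)   # new drug vs standard drug
nnt = 1 / ar  # number needed to treat; interpretable here because the interval excludes 0
print(f"AR={ar:.3f} 95% CI=({lo:.3f}, {hi:.3f}) NNT={nnt:.1f}")
```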
Number Needed to Treat (NNT)
• Assume n subjects take one treatment and n subjects take a second treatment. Let X1 and X2 be the numbers of successful treatments in the two groups, and p1 and p2 the probabilities of success in the two groups. Assume further that we can use the binomial distribution. Then the average difference between the two groups and the number needed to treat can be calculated according to
$$X_i \sim \mathrm{Bin}(n, p_i), \qquad E[X_i] = n p_i, \qquad E[X_1] - E[X_2] = n(p_1 - p_2),$$
$$n(p_1 - p_2) = 1 \;\Longrightarrow\; n = \frac{1}{p_1 - p_2}.$$
Number needed to treat
• Definition: The number needed to be treated to prevent 1 event is
calculated as the inverse of the absolute risk difference:
• NNT is frequently used in clinical trials to provide an insight into the
clinical relevance of the effect of treatment under investigation. It is
often claimed that its popularity depends on its simplicity and
intuitive interpretation.
$$NNT = \frac{1}{\widehat{AR}} = \frac{1}{\hat p_1 - \hat p_2} = \frac{1}{0.800 - 0.646} = \frac{1}{0.154} \approx 6.5$$
using the cure data of the absolute-risk example (new drug: 800 of 1000 cured; standard drug: 646 of 1000 cured).
Issues with NNT
• NNT should be reported together with the follow-up period and the unfavourable event avoided.
• NNT presupposes that there is a statistically significant difference (*).
• How large an NNT is good? There is no magic figure (10–500): it depends on whether the intervention is risky surgery, a standard inexpensive drug with no side effects, an active treatment, a preventive treatment, etc.
• Statistical properties? Confidence intervals?
• When AR = 0, NNT becomes infinite!
• The distribution of NNT is complicated because of its behaviour around AR = 0.
• The moments of NNT do not exist.
• Simple calculations with NNT, like $\widehat{NNT} = 1/\widehat{AR} = 1/(\hat p_1 - \hat p_2)$, can give nonsensical results.
Example 8
• In a study, the absolute risk reduction for patients with moderate baseline stroke severity was reported as 16.6%. The number needed to treat is thus 1/0.166, or approximately 6. This benefit was statistically significant: the 95% confidence interval for the absolute risk reduction was [0.9%, 32.2%]. A 95% confidence interval for the number needed to treat is [1/0.322, 1/0.009], or approximately [3.1, 111.1].
• This all seems quite straightforward, but what if we try the
calculation for a non-significant result, for example, for patients with
low baseline stroke severity. The absolute risk reduction was 6.6%
with a 95% confidence interval of [–20.9% , 34.1%]. Naively taking
reciprocals gives a number needed to treat of about 15.2 and an
apparent 95% confidence interval of [-4.8 , 2.9], which does not
seem to include 15.2! Clearly something’s wrong.
To understand the source of the confusion, note first that the
lower limit of the confidence interval for the absolute risk
reduction is negative, because the data do not rule out the
possibility that the treatment is actually harmful for this group of
patients. The reciprocal of this lower limit is –4.8, or a “number
needed to harm” of 4.8.
A better description of positive and negative values of the number needed to treat would be the “number needed to treat for one additional patient to benefit (or be harmed),” or NNTB and NNTH respectively. The 95% confidence interval for the absolute risk reduction thus corresponds to an NNT interval extending from an NNTH of 4.8 at one extreme to an NNTB of 2.9 at the other.
• To understand what such a confidence interval covers, imagine for a
moment that the absolute risk reduction had only just been significant,
with a confidence interval extending from slightly more than 0% to
34.1%.
• The confidence interval for the number needed to treat would now
extend from 2.9 to something approaching infinity.
• This would indicate that, according to the data, for one additional
patient to benefit, a clinician would need to treat at least 2.9 patients
(the reciprocal of 34.1%), but perhaps an extremely large number of
patients.
• Thus, when a confidence interval for an absolute risk reduction
overlaps zero, the corresponding confidence interval for the number
needed to treat includes infinity.
• This explains the confusion in the case of the patients with low baseline
stroke severity: the 95% confidence interval does, after all, contain the
point estimate (see fig. below).
The estimated number needed to treat and its confidence interval can be quoted as NNTB = 15.2 (95% confidence interval NNTH 4.8 to ∞ to NNTB 2.9).
Figure: Confidence intervals for absolute risk reduction and number needed to treat for benefit (NNTB) or harm (NNTH) for patients with low baseline stroke severity. The absolute risk reduction interval [–20.9%, 34.1%] corresponds to NNTH from 4.8 to ∞ and NNTB from 2.9 to ∞.
• In other words, for this group of patients, it could be that,
on average, treating as few as 3 patients would result in
one additional patient benefiting. On the other hand, it
could be that, on average, treating as few as 5 patients
would result in one additional patient being harmed.
• It is important that a nonsignificant number needed to
treat has a confidence interval with 2 parts, one allowing
for the possibility that the treatment is actually harmful,
and the other for the possibility that the treatment is
beneficial.
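The rule for quoting NNT from an absolute-risk-reduction interval can be written as a small function. This sketch (hypothetical names, standard library only) inverts the two confidence limits and passes through infinity when the interval contains 0, reproducing the two cases of Example 8.

```python
def nnt_label(arr):
    """Turn an absolute risk reduction into NNTB (benefit) or NNTH (harm)."""
    if arr == 0:
        return "infinity"
    return f"NNTB {1 / arr:.1f}" if arr > 0 else f"NNTH {1 / -arr:.1f}"

def describe_nnt(arr, lo, hi):
    """Quote NNT with the interval obtained by inverting the ARR confidence limits."""
    if lo > 0:
        # Significant benefit: the upper ARR limit gives the smallest NNTB
        ci = f"{nnt_label(hi)} to {nnt_label(lo)}"
    elif hi < 0:
        # Significant harm: the lower ARR limit gives the smallest NNTH
        ci = f"{nnt_label(lo)} to {nnt_label(hi)}"
    else:
        # The ARR interval touches or contains 0, so the NNT interval passes through infinity
        parts = [nnt_label(lo)] if lo < 0 else []
        parts.append("infinity")
        if hi > 0:
            parts.append(nnt_label(hi))
        ci = " to ".join(parts)
    return f"{nnt_label(arr)} (95% CI {ci})"

print(describe_nnt(0.166, 0.009, 0.322))   # moderate baseline severity
print(describe_nnt(0.066, -0.209, 0.341))  # low baseline severity
```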
Maximum likelihood
• The invariance property for ML estimators cannot apply here for the following reason: for a one-dimensional parameter θ, a function t(θ) of this parameter must have a single-valued inverse in order to have $\widehat{t(\theta)} = t(\hat\theta)$.
• Bimodality and the range of definition make convergence to normality difficult to achieve (slow) for small sample sizes.
Invariance Property of MLEs
If $\hat\theta$ is the MLE of some parameter θ and t(·) is a one-to-one function, then $t(\hat\theta)$ is the MLE of t(θ).
Unbiasedness
• Unbiasedness is a matter of scale: if $\hat\theta$ is unbiased for θ, then $t(\hat\theta)$ will in general be biased for t(θ) unless t is linear.
• Moreover, the singularity at 0 implies that NNT cannot be bias-corrected. Attempts to improve the behaviour of the estimator by reducing the bias will fail.
Testing
• No simple test of no treatment effect can be constructed for the supposedly ”simple” and comprehensible NNT. This is because no treatment effect corresponds to the value ∞ for the parameter, so a z-statistic of the form $(\hat\theta - \theta_0)/\mathrm{SE}$ cannot be formed.
Generalized Mixed Effects Models
Various forms of models and relation between them
• LM: assumptions 1) independence, 2) normality, 3) constant parameters
• GLM: assumption 2) is replaced by an exponential family distribution
• LMM: assumptions 1) and 3) are modified
• GLMM: assumption 2) is replaced by an exponential family distribution, and assumptions 1) and 3) are modified
• Repeated measures / longitudinal data: assumptions 1) and 3) are modified
• Non-linear models
• Estimation: maximum likelihood within classical statistics (observations are random, parameters are unknown constants), or Bayesian statistics
LM – Linear model
GLM – Generalised linear model
LMM – Linear mixed model
GLMM – Generalised linear mixed model
Exponential families
The exponential family comprises a set of flexible distributions covering both continuous and discrete random variables. The members of this family have many important properties, which merits discussing them in a general format. Many of the usual probability distributions are specific members of this family:
Gaussian – Bernoulli – Binomial – von Mises – Gamma – Poisson – Exponential – Beta (on (0, 1)) – Weibull (with known shape parameter), etc.
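To make the "general format" concrete, one common way to write an exponential-family density in generalized linear models is shown below, with the Bernoulli distribution as an instance. The notation θ, φ, a, b, c is ours and is not taken from the slides.

```latex
f(y;\theta,\phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi) \right\},
\qquad \mathrm{E}[Y] = b'(\theta), \qquad \mathrm{Var}[Y] = b''(\theta)\,a(\phi)

\text{Bernoulli: } p^{y}(1-p)^{1-y} = \exp\left\{ y \log\frac{p}{1-p} + \log(1-p) \right\},
\qquad \theta = \log\frac{p}{1-p}, \quad b(\theta) = \log\left(1 + e^{\theta}\right), \quad a(\phi) = 1
```

Here b'(θ) = e^θ/(1 + e^θ) = p and b''(θ) = p(1 − p), the Bernoulli mean and variance; the logit θ is the link used in logistic regression.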
Generalized linear Models:
The Bernoulli distribution
Generalized Linear Models
Generalised Linear Mixed Models
Empirical Bayes estimates
Example 1 (cont’d)
A Bayesian alternative
Infection vs. poverty
• Some studies from 1990 suggested that the risk of CHD is associated with childhood poverty. Since infection with the bacterium H. pylori is also linked to poverty, some researchers suspected H. pylori to be the missing link. In a study where levels of infection were compared between patients and controls, the following results were obtained.
• Using the data below, the chi-square statistic, with value 4.73, yields a p-value of 0.03, which is less than the formal significance level of 0.05.
Infection    CHD    Healthy control
High         60%    39%
Low          40%    61%
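Let us try a Bayesian alternative. Since we have no theoretical reason to believe that the above result is true, we take P(H0) = 0.5. With the Bayes factor BF = P(D | H0)/P(D | H1),
$$P[H_0 \mid D] = \left(1 + \frac{1 - P(H_0)}{P(H_0)} \cdot \frac{1}{BF}\right)^{-1} = \frac{BF}{1 + BF} \quad \text{for } P(H_0) = 0.5.$$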
Berger and Sellke (1987) have shown that for a very wide range of cases, including this one,
$$BF \ge \sqrt{\chi^2}\; e^{(1 - \chi^2)/2}.$$
Using the value 4.73 for the chi-square variable leads to a BF value of at least 0.337.
Reference: M. A. Mendall et al. Relation between H. pylori infection and coronary heart disease. Heart J. (1994).
Conclusion
$$P[H_0 \mid D] \ge \frac{0.337}{1.337} = 0.252$$
Taking another (more or less sceptical) attitude does not change the conclusion that much:
P(H0) = 0.75 ⇒ P[H0 | D] > 0.5
P(H0) = 0.25 ⇒ P[H0 | D] > 0.1
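A sketch of the calculation, assuming the lower bound BF ≥ z·exp((1 − z²)/2) with z = √χ², which reproduces the value 0.337; returning 1 when z ≤ 1 follows the usual statement of this bound and is our assumption here. Function names are hypothetical.

```python
import math

def bf_lower_bound(chi2):
    """Lower bound on BF = P(D|H0)/P(D|H1) from a 1-df chi-square value: z * exp((1 - z^2) / 2) for z > 1."""
    z = math.sqrt(chi2)
    return z * math.exp((1 - z * z) / 2) if z > 1 else 1.0

def posterior_h0(bf, prior_h0):
    """P(H0 | D) from the Bayes factor and the prior probability of H0."""
    post_odds = bf * prior_h0 / (1 - prior_h0)
    return post_odds / (1 + post_odds)

chi2 = 4.73
p_value = math.erfc(math.sqrt(chi2 / 2))   # classical 1-df p-value
bf = bf_lower_bound(chi2)
print(f"p={p_value:.3f} BF>={bf:.3f}")
for prior in (0.5, 0.75, 0.25):
    print(f"P(H0)={prior}: P(H0|D) >= {posterior_h0(bf, prior):.3f}")
```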
Bayesian properties of NNT
• Let D = (x1, x2, n1, n2) represent data from some trial. Assuming independent Beta(αi, βi) prior distributions for the pi leads to a joint posterior distribution of (p1, p2) that is a product of independent Beta distributions, $p_i \mid D \sim \mathrm{Beta}(x_i + \alpha_i,\ n_i - x_i + \beta_i)$, with
$$\mu_i = E[p_i \mid D] = \frac{x_i + \alpha_i}{n_i + \alpha_i + \beta_i}, \qquad \sigma_i^2 = \mathrm{Var}[p_i \mid D] = \frac{\mu_i(1 - \mu_i)}{n_i + \alpha_i + \beta_i + 1}, \qquad i = 1, 2.$$
Apart from mathematical tractability, beta priors offer great flexibility of distributional shape.
• One can obtain the posterior distribution of the difference p = p1 − p2, or that of NNT = 1/p, by simple transformation, using Markov chain Monte Carlo (MCMC) to simulate directly from the posterior distributions. The posterior mean μp and variance σp² of p are respectively given by
$$\mu_p = \mu_1 - \mu_2, \qquad \sigma_p^2 = \sigma_1^2 + \sigma_2^2.$$
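A sketch of these posterior summaries, assuming uniform Beta(1, 1) priors (a hypothetical choice) and the NINDS counts from earlier in the section. Because the posteriors of p1 and p2 are independent Beta distributions, this sketch draws from them directly with `random.Random.betavariate` instead of running a Markov chain; the seed and number of draws are arbitrary.

```python
import math
import random

def beta_posterior_moments(x, n, alpha=1.0, beta=1.0):
    """Posterior mean and variance of p_i under a Beta(alpha, beta) prior with x successes in n trials."""
    a, b = x + alpha, n - x + beta
    mean = a / (a + b)
    return mean, mean * (1 - mean) / (a + b + 1)

# Hypothetical uniform Beta(1, 1) priors, applied to the NINDS counts
m1, v1 = beta_posterior_moments(147, 312)
m2, v2 = beta_posterior_moments(122, 312)
mu_p, sigma_p = m1 - m2, math.sqrt(v1 + v2)   # p1 and p2 are independent a posteriori

# Normal-approximation summaries described in the text
nnt_point = 1 / mu_p
nnt_interval = (1 / (mu_p + 1.96 * sigma_p), 1 / (mu_p - 1.96 * sigma_p))  # requires mu_p - 1.96 sigma_p > 0

# Direct Monte Carlo from the two independent Beta posteriors
rng = random.Random(2024)
draws = sorted(
    rng.betavariate(147 + 1, 312 - 147 + 1) - rng.betavariate(122 + 1, 312 - 122 + 1)
    for _ in range(20000)
)
cred_p = (draws[500], draws[19499])   # empirical 2.5% and 97.5% points of p = p1 - p2
print(f"mu_p={mu_p:.4f} sigma_p={sigma_p:.4f} NNT~{nnt_point:.1f} "
      f"credible p=({cred_p[0]:.4f}, {cred_p[1]:.4f})")
```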
• Asymptotically, p will have a normal posterior distribution with mean μp and variance σp². The common practice is to estimate NNT by 1/μp, and the corresponding interval estimate is given by the 95% credible interval (valid when μp − 1.96σp > 0)
$$\left[(\mu_p + 1.96\,\sigma_p)^{-1},\ (\mu_p - 1.96\,\sigma_p)^{-1}\right].$$
• Making the transformation to y = 1/p = NNT, we find that the asymptotic distribution of Y is given by
$$f(y \mid D) = \frac{1}{\sqrt{2\pi}\,\sigma_p\, y^2} \exp\left\{-\frac{(1/y - \mu_p)^2}{2\sigma_p^2}\right\}.$$
• This density is known as the inverse normal distribution (Johnson et al., 1995, p. 171). It is a special case of the generalized inverse normal family of density functions considered by Robert (1991). The mean and variance of this distribution do not exist.
• However, the distribution has two modes, at
$$\widehat{NNT}_1 = \frac{-\mu_p - \sqrt{\mu_p^2 + 8\sigma_p^2}}{4\sigma_p^2} \quad \text{and} \quad \widehat{NNT}_2 = \frac{-\mu_p + \sqrt{\mu_p^2 + 8\sigma_p^2}}{4\sigma_p^2}.$$
• Thus the point estimate of NNT would be given by NNT2 when there is efficacy and by NNT1 when the control treatment dominates the experimental one. The figure below shows graphs of f(y | D) for different values of μp and σp. We observe from the figure that the pdf based on μp < 0 is a mirror image of that based on μp > 0.
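Setting the derivative of log f(y | D) to zero gives 2σp²y² + μp y − 1 = 0, whose two roots are the modes above. A short numerical check, using illustrative values μp = 0.08 and σp = 0.04 (not taken from any trial), also shows the mirror-image property: changing the sign of μp swaps and negates the modes. Function names are ours.

```python
import math

def inverse_normal_pdf(y, mu_p, sigma_p):
    """Density of Y = 1/p when p ~ N(mu_p, sigma_p^2), for y != 0."""
    z = (1 / y - mu_p) / sigma_p
    return math.exp(-z * z / 2) / (math.sqrt(2 * math.pi) * sigma_p * y * y)

def nnt_modes(mu_p, sigma_p):
    """Roots of 2 sigma^2 y^2 + mu y - 1 = 0, returned as (NNT1 < 0, NNT2 > 0)."""
    root = math.sqrt(mu_p ** 2 + 8 * sigma_p ** 2)
    return (-mu_p - root) / (4 * sigma_p ** 2), (-mu_p + root) / (4 * sigma_p ** 2)

# Illustrative values only
nnt1, nnt2 = nnt_modes(0.08, 0.04)
print(f"NNT1={nnt1:.2f} NNT2={nnt2:.2f} 1/mu_p={1 / 0.08:.2f}")
```

The positive mode (about 9.2 here) differs from the naive estimate 1/μp = 12.5, which is one reason the text distinguishes the two.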