
Statistical Methods in Clinical Trials: Categorical Data


Types of Data

• Quantitative
– Continuous: blood pressure, time to event
– Discrete: number of relapses

• Qualitative
– Ordered categorical: pain level
– Categorical: sex


Types of data analysis (Inference)

• Parametric vs. non-parametric

• Frequentist vs. Bayesian

• Model-based vs. data-driven


Categorical data

• In an RCT, endpoints and surrogate endpoints can be categorical or ordered categorical variables. In the simplest case the response is binary (e.g. responder/non-responder). In outcomes research it is common to use several ordered categories (e.g. no improvement, moderate improvement, high improvement).

• Examples of binary outcomes:

– Remission

– Mortality

– Presence/absence of an AE

– Responder/non-responder according to some pre-defined criteria

– Success/Failure


Two proportions

• Sometimes we want to compare the proportions of successes in two separate groups. For this purpose we take two samples of sizes n1 and n2. Let yi1 be the observed number of successes and pi1 = yi1/ni the observed proportion of successes in the ith group (i = 1, 2). The difference in population proportions of successes and its large-sample variance can be estimated by

$$\hat{d} = p_{11} - p_{21}, \qquad \widehat{\mathrm{Var}}(\hat{d}) = \frac{p_{11}(1-p_{11})}{n_1} + \frac{p_{21}(1-p_{21})}{n_2}.$$


Two proportions (continued)

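• Assume we want to test the null hypothesis that there is no difference between the proportions of successes in the two groups. Under the null hypothesis, the common proportion can be estimated by

$$\hat{p} = \frac{y_{11} + y_{21}}{n_1 + n_2}.$$

• The large-sample variance of the difference is then estimated by

$$\widehat{\mathrm{Var}}_0(p_{11} - p_{21}) = \hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right),$$

• leading to the test statistic

$$z = \frac{p_{11} - p_{21}}{\sqrt{\hat{p}(1-\hat{p})\left(1/n_1 + 1/n_2\right)}},$$

which is approximately standard normal under the null hypothesis.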


Example

NINDS trial in acute ischemic stroke

Treatment    n      Responders*
rt-PA        312    147 (47.1%)
Placebo      312    122 (39.1%)

* Early improvement defined on a neurological scale

Point estimate: 0.080 (s.e.=0.0397)

95% CI: (0.003 ; 0.158)

p-value: 0.043
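The NINDS figures can be checked with a short calculation. The sketch below is only an illustration: the function name and the returned keys are assumptions made here, the confidence interval uses the unpooled large-sample variance, and the test uses the pooled variance under the null hypothesis.

```python
import math

def compare_two_proportions(y1, n1, y2, n2, z_crit=1.96):
    """Large-sample comparison of two independent proportions (illustrative helper)."""
    p1, p2 = y1 / n1, y2 / n2
    diff = p1 - p2
    # Unpooled standard error: used for the confidence interval.
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    # Pooled standard error: used for the test of equal proportions.
    p_pool = (y1 + y2) / (n1 + n2)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = diff / se_pooled
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return {"diff": diff, "se_pooled": se_pooled, "ci": ci, "z": z, "p_value": p_value}

# NINDS: 147/312 responders on rt-PA, 122/312 on placebo.
result = compare_two_proportions(147, 312, 122, 312)
print(result)
```

With these inputs the pooled standard error rounds to the 0.0397 quoted above, while the interval is the one based on the unpooled variance.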


Two proportions (Chi square)

• The problem of comparing two proportions can sometimes be formulated as a problem of independence.

• Assume we have two groups as above (treatment and placebo). Assume further that the subjects were randomized to these groups.

• We can then test for independence between belonging to a certain group and the clinical endpoint (success or failure).

• The data can be organized in the form of a contingency table in which the marginal totals and the total number of subjects are considered as fixed.


2 x 2 contingency table (rows: treatment, columns: response)

            Failure   Success   Total
Drug          165       147      312
Placebo       190       122      312
Total         355       269    N = 624


Hypergeometric distribution

• An urn contains W white balls and R red balls: N = W + R.

• n balls are drawn at random without replacement.

• Y is the number of white balls (successes) among those drawn.

• Y follows the hypergeometric distribution with parameters (N, W, n).


Contingency tables

• N subjects in total

• y.1 of these are special (successes)

• y1. are drawn at random

• Y11 is the number of successes among these y1.

• In general, Y11 is HG(N, y.1, y1.) distributed


Contingency tables

• The null hypothesis of independence is tested using the (Pearson) chi-square statistic

$$\chi^2_P = \sum_i \sum_j \frac{(y_{ij} - m_{ij})^2}{m_{ij}},$$

where yij are the observed and mij the expected cell frequencies,

• which, under the null hypothesis, is chi-square distributed with one degree of freedom, provided the sample sizes in the two groups are large (over 30) and the expected frequency in each cell is non-negligible (over 5).


Contingency tables

• For moderate sample sizes we use Fisher's exact test, in which the desired probabilities are calculated from the exact hypergeometric distribution. The variance of Y11 can then be calculated from the same distribution. To illustrate, consider:

• Using this variance and the expectation m11 we obtain the randomization chi-square statistic. With fixed margins only one cell is free to vary. Randomization is crucial for this approach.
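A minimal sketch of both ideas, applied to the NINDS table. The function names are assumptions made here. The randomization statistic is taken as Q = (y11 - m11)^2 / v11 with the hypergeometric variance v11 = y1. y2. y.1 y.2 / (N^2 (N - 1)), and the two-sided Fisher p-value sums all tables that are no more probable than the observed one, which is one common convention.

```python
from math import comb

def hypergeom_pmf(k, N, W, n):
    """P(Y = k): n draws without replacement from N items, W of which are successes."""
    return comb(W, k) * comb(N - W, n - k) / comb(N, n)

def fisher_exact_2x2(y11, row1, col1, N):
    """Left, right and two-sided exact p-values for cell (1,1) with fixed margins."""
    lo, hi = max(0, row1 + col1 - N), min(row1, col1)
    pmf = {k: hypergeom_pmf(k, N, col1, row1) for k in range(lo, hi + 1)}
    left = sum(p for k, p in pmf.items() if k <= y11)
    right = sum(p for k, p in pmf.items() if k >= y11)
    # Two-sided: tables at most as probable as the observed one (small tolerance for ties).
    two_sided = sum(p for p in pmf.values() if p <= pmf[y11] * (1 + 1e-9))
    return left, right, two_sided

def randomization_chi_square(y11, row1, col1, N):
    """(y11 - m11)^2 / v11 using the hypergeometric mean and variance."""
    row2, col2 = N - row1, N - col1
    m11 = row1 * col1 / N
    v11 = row1 * row2 * col1 * col2 / (N ** 2 * (N - 1))
    return (y11 - m11) ** 2 / v11

# NINDS: 147 successes among the 312 rt-PA patients, 269 successes overall, N = 624.
left, right, two_sided = fisher_exact_2x2(147, 312, 269, 624)
q = randomization_chi_square(147, 312, 269, 624)
print(left, right, two_sided, q)
```

Here the cell of interest is the number of responders on rt-PA, so the right tail is the one-sided probability of seeing at least 147 responders in that group.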


The (Pearson) Chi-square test

The test statistic is:

$$\chi^2_P = \sum_i \sum_j \frac{(y_{ij} - m_{ij})^2}{m_{ij}}$$

where yij = observed frequencies

and mij = expected frequencies (under independence).

The test statistic approximately follows a chi-square distribution.


Example 5

Chi-square test for a 2×2 table

Examining the independence between two treatments and a classification into responder/non-responder is equivalent to comparing the proportion of responders in the two groups.

Observed frequencies (NINDS again)

            non-resp   responder   Total
rt-PA         165        147        312
placebo       190        122        312
Total         355        269

Expected frequencies

            non-resp   responder   Total
rt-PA        177.5      134.5       312
placebo      177.5      134.5       312
Total        355        269
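The expected frequencies and the resulting statistic can be reproduced as follows. This is a sketch with helper names chosen here; the continuity-adjusted version subtracts 0.5 from each absolute deviation (Yates' correction), and the p-value uses the fact that a chi-square variable with one degree of freedom is a squared standard normal.

```python
import math

def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[r * c / n for c in col_tot] for r in row_tot]

def pearson_chi_square(table, continuity=False):
    """Sum of (observed - expected)^2 / expected, optionally continuity-adjusted."""
    stat = 0.0
    for obs_row, exp_row in zip(table, expected_counts(table)):
        for o, e in zip(obs_row, exp_row):
            dev = abs(o - e)
            if continuity:
                dev = max(dev - 0.5, 0.0)
            stat += dev ** 2 / e
    return stat

def p_value_1df(stat):
    """Upper tail of the chi-square distribution with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

ninds = [[165, 147], [190, 122]]  # rows: rt-PA, placebo; columns: non-resp, responder
expected = expected_counts(ninds)
chi2 = pearson_chi_square(ninds)
chi2_adj = pearson_chi_square(ninds, continuity=True)
print(expected, chi2, p_value_1df(chi2), chi2_adj, p_value_1df(chi2_adj))
```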


SAS output

TABLE OF GRP BY Y

Frequency |
Row Pct   | nonresp | resp    | Total
----------+---------+---------+
placebo   |     190 |     122 |   312
          |   60.90 |   39.10 |
----------+---------+---------+
rt-PA     |     165 |     147 |   312
          |   52.88 |   47.12 |
----------+---------+---------+
Total           355       269     624

STATISTICS FOR TABLE OF GRP BY Y

Statistic                       DF    Value    Prob
----------------------------------------------------
Chi-Square                       1    4.084   0.043
Likelihood Ratio Chi-Square      1    4.089   0.043
Continuity Adj. Chi-Square       1    3.764   0.052
Mantel-Haenszel Chi-Square       1    4.077   0.043
Fisher's Exact Test (Left)                    0.982
                    (Right)                   0.026
                    (2-Tail)                  0.052
Phi Coefficient                       0.081
Contingency Coefficient               0.081
Cramer's V                            0.081

Sample Size = 624


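Odds, Odds Ratios and Relative Risks

The odds of success in group i is estimated by

$$\widehat{\mathrm{odds}}_i = \frac{p_{i1}}{1 - p_{i1}} = \frac{y_{i1}}{y_{i2}}$$

The odds ratio of success between the two groups is estimated by

$$\widehat{OR} = \frac{p_{11}/(1 - p_{11})}{p_{21}/(1 - p_{21})} = \frac{y_{11}\, y_{22}}{y_{12}\, y_{21}}$$

Define the risk of success in the ith group as the proportion of cases with success. The relative risk between the two groups is estimated by

$$\widehat{RR} = \frac{p_{11}}{p_{21}}$$

Absolute Risk = AR = p11 – p21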
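For the NINDS table these measures work out as in the sketch below. The function and key names are illustrative assumptions; the arguments are the cell counts in the order success, failure for group 1 and then group 2.

```python
def effect_measures(y11, y12, y21, y22):
    """Odds, odds ratio, relative risk and absolute risk from a 2 x 2 table.

    y11, y12: successes and failures in group 1; y21, y22: the same for group 2.
    """
    p1 = y11 / (y11 + y12)
    p2 = y21 / (y21 + y22)
    return {
        "odds_1": y11 / y12,
        "odds_2": y21 / y22,
        "odds_ratio": (y11 * y22) / (y12 * y21),
        "relative_risk": p1 / p2,
        "absolute_risk": p1 - p2,
    }

# NINDS: rt-PA 147 responders / 165 non-responders, placebo 122 / 190.
measures = effect_measures(147, 165, 122, 190)
print(measures)
```

The odds ratio of about 1.387 agrees with the value reported by the logistic regression output for the same data.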


Categorical data

• Nominal

– E.g. patient residence at end of follow-up (hospital, nursing home, own home, etc.)

• Ordinal (ordered)

– E.g. some global rating:
   • Normal, not at all ill
   • Borderline mentally ill
   • Mildly ill
   • Moderately ill
   • Markedly ill
   • Severely ill
   • Among the most extremely ill patients


Categorical data & Chi-square test

                        Other factor
                 A       B       C       D       E       Total
One      i       niA     niB     niC     niD     niE     ni
factor   ii      niiA    niiB    niiC    niiD    niiE    nii
         iii     niiiA   niiiB   niiiC   niiiD   niiiE   niii
         Total   nA      nB      nC      nD      nE      n

The chi-square test is useful for detecting a general association between treatment and categorical response (on either the nominal or the ordinal scale), but it cannot identify a particular relationship, e.g. a location shift.


Nominal categorical data

                      Disease category
                 dip   snip   fup   bop   other   Total
Treatment   A     33    15     34    26     8      116
group       B     28    18     34    20    14      114
Total             61    33     68    46    22      230

Chi-square test: χ² = 3.084, df = 4, p = 0.544
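The same Pearson statistic extends to any r × c table with (r - 1)(c - 1) degrees of freedom. The sketch below recomputes the result for the disease-category table; the function names are assumptions made here, and the p-value helper only covers an even number of degrees of freedom, for which the chi-square tail probability has a simple closed form.

```python
import math

def chi_square_rxc(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = sum(
        (obs - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_tot)
        for obs, c in zip(row, col_tot)
    )
    return stat, (len(row_tot) - 1) * (len(col_tot) - 1)

def chi_square_sf_even_df(x, df):
    """Upper tail probability of a chi-square variable when df is even."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires an even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

disease = [[33, 15, 34, 26, 8],   # treatment group A
           [28, 18, 34, 20, 14]]  # treatment group B
stat, df = chi_square_rxc(disease)
p = chi_square_sf_even_df(stat, df)
print(stat, df, p)
```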


Ordered categorical data

• Here we assume two groups, one receiving the drug and one receiving placebo. The response is assumed to be ordered categorical with J categories.

• The null hypothesis is that the distribution of subjects over the response categories is the same in both groups.

• Again, the randomization and the HG distribution lead to the same chi-square test statistic, but this time with (J-1) df. Moreover, the same relationship exists between the two versions of the chi-square statistic.


The Mantel-Haenszel statistic

The aim here is to combine data from several (H) strata when comparing two groups, drug and placebo. The expected frequency m_h11 and the variance v_h11 of the cell count y_h11 in each stratum h are used to define the Mantel-Haenszel statistic

$$Q_{MH} = \frac{\left[\sum_{h=1}^{H} (y_{h11} - m_{h11})\right]^2}{\sum_{h=1}^{H} v_{h11}},$$

which is chi-square distributed with one df.
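A minimal sketch of the stratified calculation, assuming the usual form of the statistic: the squared sum over strata of (y_h11 - m_h11), divided by the sum of the hypergeometric variances v_h11. The function name is chosen here, and the two-stratum split of the NINDS data is purely hypothetical and only shows how several strata are passed in.

```python
def mantel_haenszel_chi_square(strata):
    """strata: iterable of 2 x 2 tables given as (y11, y12, y21, y22) per stratum."""
    sum_dev = 0.0
    sum_var = 0.0
    for y11, y12, y21, y22 in strata:
        r1, r2 = y11 + y12, y21 + y22      # row totals (drug, placebo)
        c1, c2 = y11 + y21, y12 + y22      # column totals (success, failure)
        n = r1 + r2
        sum_dev += y11 - r1 * c1 / n       # observed minus expected m_h11
        sum_var += r1 * r2 * c1 * c2 / (n ** 2 * (n - 1))  # variance v_h11
    return sum_dev ** 2 / sum_var

# One stratum: the whole NINDS table (rt-PA 147/165, placebo 122/190).
q_single = mantel_haenszel_chi_square([(147, 165, 122, 190)])

# Hypothetical split of the same patients into two strata (illustration only).
q_split = mantel_haenszel_chi_square([(80, 76, 60, 96), (67, 89, 62, 94)])
print(q_single, q_split)
```

With a single stratum the statistic reduces to the randomization chi-square, which is why the first value matches the Mantel-Haenszel line of the SAS output for the unstratified table.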


Logistic regression

• Logistic regression is part of a category of statistical models called generalized linear models (GLM). This broad class of models includes ordinary regression and ANOVA, as well as multivariate statistics such as ANCOVA and loglinear regression. An excellent treatment of generalized linear models is presented in Agresti (1996).

• Logistic regression allows one to predict a discrete outcome, such as group membership, from a set of variables that may be continuous, discrete, dichotomous, or a mix of any of these. Generally, the dependent or response variable is dichotomous, such as presence/absence or success/failure.


Multiple logistic regression

• More than one independent variable

– Dichotomous, ordinal, nominal, continuous …

$$\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$$

• Interpretation of βi

– Increase in log-odds for a one-unit increase in xi with all the other xi held constant

– Measures the association between xi and the log-odds, adjusted for all other xi


Fitting equation to the data

• Linear regression: least squares or maximum likelihood

• Logistic regression: maximum likelihood

• Likelihood function

– Used to estimate the parameters β

– In practice it is easier to work with the log-likelihood


Statistical testing

• Question

– Does a model including a given independent variable provide more information about the dependent variable than a model without this variable?

• Three tests

– Likelihood ratio statistic (LRS)

– Wald test

– Score test


Likelihood ratio statistic

• Compares two nested models

Log(odds) = α + β1x1 + β2x2 + β3x3 (model 1)

Log(odds) = α + β1x1 + β2x2 (model 2)

• LR statistic

-2 log (likelihood model 2 / likelihood model 1) = [-2 log (likelihood model 2)] - [-2 log (likelihood model 1)]

The LR statistic is χ² distributed with DF = number of extra parameters in the larger model (here model 1).


Example 6

Fitting a logistic regression model to the NINDS data, using only one covariate (treatment group).

Observed frequencies (NINDS again)

            non-resp   responder   Total
rt-PA         165        147        312
placebo       190        122        312
Total         355        269


SAS output

The LOGISTIC Procedure

Response Profile

Ordered   Binary
Value     Outcome     Count
1         EVENT        269
2         NO EVENT     355

Model Fitting Information and Testing Global Null Hypothesis BETA=0

             Intercept   Intercept and
Criterion    Only        Covariates      Chi-Square for Covariates
AIC          855.157     853.069         .
SC           859.593     861.941         .
-2 LOG L     853.157     849.069         4.089 with 1 DF (p=0.0432)
Score        .           .               4.084 with 1 DF (p=0.0433)

Analysis of Maximum Likelihood Estimates

                Parameter   Standard   Wald         Pr >         Standardized   Odds
Variable   DF   Estimate    Error      Chi-Square   Chi-Square   Estimate       Ratio
INTERCPT   1    -0.4430     0.1160     14.5805      0.0001       .              .
GRP        1     0.3275     0.1622      4.0743      0.0435       0.090350       1.387
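With a single binary covariate the maximum likelihood estimates have a closed form, so the output can be checked by hand. The sketch below assumes GRP is coded 1 for rt-PA and 0 for placebo (an assumption that matches the reported intercept); the function and key names are illustrative.

```python
import math

def logistic_single_binary_covariate(ev1, non1, ev0, non0):
    """Closed-form ML fit of logit(P) = a + b*x for a binary x from a 2 x 2 table.

    ev1, non1: events and non-events when x = 1; ev0, non0: the same when x = 0.
    """
    a = math.log(ev0 / non0)                        # log-odds in the reference group
    b = math.log((ev1 / non1) / (ev0 / non0))       # log odds ratio
    se_a = math.sqrt(1 / ev0 + 1 / non0)
    se_b = math.sqrt(1 / ev1 + 1 / non1 + 1 / ev0 + 1 / non0)

    def neg2_log_lik(groups):
        # groups: (events, non-events, fitted probability) for each covariate pattern
        return -2 * sum(e * math.log(p) + n * math.log(1 - p) for e, n, p in groups)

    p_null = (ev1 + ev0) / (ev1 + non1 + ev0 + non0)
    null = neg2_log_lik([(ev1, non1, p_null), (ev0, non0, p_null)])
    full = neg2_log_lik([(ev1, non1, ev1 / (ev1 + non1)), (ev0, non0, ev0 / (ev0 + non0))])
    return {
        "intercept": a, "slope": b, "se_intercept": se_a, "se_slope": se_b,
        "wald_slope": (b / se_b) ** 2, "odds_ratio": math.exp(b),
        "neg2logL_null": null, "neg2logL_full": full, "lr_statistic": null - full,
    }

fit = logistic_single_binary_covariate(147, 165, 122, 190)
print(fit)
```

The likelihood ratio statistic is the difference between the two -2 log-likelihood values, which is the comparison of nested models with and without the treatment term.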


David Brennan CEO of AstraZeneca


?


4 measures of association (effect)

– Quite often we are interested in risk and probability only as a way to measure association or effect: "cure is associated with drug" = "the drug has an effect".

– This can be done in different ways:

1. Relative Risk (prospective studies)

2. Odds Ratio (prospective or retrospective studies)

3. Absolute Risk (prospective studies)

4. Number Needed to Treat (prospective studies)


Absolute Risk

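• Difference between the proportions of outcomes in two groups, 1 and 2. Estimated absolute risk:

$$\widehat{AR} = \hat{\pi}_1 - \hat{\pi}_2, \qquad \hat{\pi}_1 = \frac{n_{11}}{n_{1.}}, \quad \hat{\pi}_2 = \frac{n_{21}}{n_{2.}}$$

• 95% confidence interval for the population absolute risk:

$$\widehat{AR} \pm 1.96\sqrt{\frac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_{1.}} + \frac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_{2.}}}$$

• Interpretation: AR > 0: positive association; AR = 0: no association; AR < 0: negative association.

Group            Cured   Not cured   Total
New drug          800      200        1000
Standard drug     646      354        1000
Total            1446      554        2000

Example: AR = 0.80 - 0.64 = 0.16, 95% CI: (0.115, 0.205)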
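As a check on the worked example, the sketch below applies the large-sample interval directly to the counts in the table. The function name is an assumption made here. Note that the slide's figures appear to use the rounded proportion 0.64 for the standard drug; with the unrounded 646/1000 the estimate is 0.154 and the interval is about (0.115, 0.193).

```python
import math

def absolute_risk_ci(events_1, n_1, events_2, n_2, z_crit=1.96):
    """Difference in proportions with its large-sample confidence interval."""
    p1, p2 = events_1 / n_1, events_2 / n_2
    ar = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n_1 + p2 * (1 - p2) / n_2)
    return ar, (ar - z_crit * se, ar + z_crit * se)

# Cured: 800 of 1000 on the new drug, 646 of 1000 on the standard drug.
ar, (ar_low, ar_high) = absolute_risk_ci(800, 1000, 646, 1000)
print(ar, ar_low, ar_high)
```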


Number Needed to Treat


NNT

• Assume n subjects take one treatment and n subjects take a second treatment. Let X1 and X2 be the numbers of successful treatments in the two groups, and let p1 and p2 denote the probabilities of success. Assume further that the binomial distribution applies. Then the expected difference between the two groups, and the number needed to treat, can be calculated according to

$$X_i \sim \mathrm{Bin}(n, p_i), \qquad E[X_i] = n p_i,$$

$$E[X_1] - E[X_2] = n(p_1 - p_2),$$

$$n(p_1 - p_2) = 1 \;\Rightarrow\; n = \frac{1}{p_1 - p_2}.$$


Number needed to treat

• Definition: The number needed to treat to prevent one event is calculated as the inverse of the absolute risk difference:

$$NNT = \frac{1}{\widehat{AR}} = \frac{1}{\hat{\pi}_1 - \hat{\pi}_2}$$

• NNT is frequently used in clinical trials to provide an insight into the clinical relevance of the effect of the treatment under investigation. It is often claimed that its popularity depends on its simplicity and intuitive interpretation.

Group            Cured   Not cured   Total
New drug          800      200        1000
Standard drug     646      354        1000
Total            1446      554        2000

Example: AR = 0.80 - 0.64 = 0.16, so NNT = 1/0.16 = 6.25


Issues with NNT

• An NNT should be reported together with the follow-up period and the unfavourable event avoided.

• NNT presupposes that there is a statistically significant difference (*).

• What is a good NNT? There is no magic figure (10-500); it depends on the setting: risky surgery vs. a standard inexpensive drug with no side effects, active treatment vs. preventive treatment, etc.

• Statistical properties? Confidence intervals?

• When AR = 0, NNT becomes infinite!

• The distribution of NNT is complicated because of its behaviour around AR = 0;

• The moments of NNT do not exist;

• Simple calculations with NNT can give nonsensical results.

$$\widehat{NNT} = \frac{1}{\widehat{AR}} = \frac{1}{\hat{\pi}_1 - \hat{\pi}_2}$$


Example 8

• A study reported the absolute risk reduction for patients with moderate baseline stroke severity as 16.6%. The number needed to treat is thus 1/0.166, or approximately 6. This benefit was statistically significant: the 95% confidence interval for the absolute risk reduction was [0.9%, 32.2%]. A 95% confidence interval for the number needed to treat is [1/0.322, 1/0.009], or approximately [3.1, 111.1].

• This all seems quite straightforward, but what if we try the calculation for a non-significant result, for example for patients with low baseline stroke severity? The absolute risk reduction was 6.6%, with a 95% confidence interval of [–20.9%, 34.1%]. Naively taking reciprocals gives a number needed to treat of about 15.2 and an apparent 95% confidence interval of [–4.8, 2.9], which does not seem to include 15.2! Clearly something is wrong.


To understand the source of the confusion, note first that the lower limit of the confidence interval for the absolute risk reduction is negative, because the data do not rule out the possibility that the treatment is actually harmful for this group of patients. The reciprocal of this lower limit is –4.8, or a "number needed to harm" of 4.8.

A better description of positive and negative values of the number needed to treat would be the "number needed to treat for one additional patient to benefit (or be harmed)", or NNTB and NNTH respectively. The 95% confidence interval for the absolute risk reduction thus extends from an NNTH of 4.8 at one extreme to an NNTB of 2.9 at the other.


• To understand what such a confidence interval covers, imagine for a moment that the absolute risk reduction had only just been significant, with a confidence interval extending from slightly more than 0% to 34.1%.

• The confidence interval for the number needed to treat would now extend from 2.9 to something approaching infinity.

• This would indicate that, according to the data, for one additional patient to benefit, a clinician would need to treat at least 2.9 patients (the reciprocal of 34.1%), but perhaps an extremely large number of patients.

• Thus, when a confidence interval for an absolute risk reduction overlaps zero, the corresponding confidence interval for the number needed to treat includes infinity.

• This explains the confusion in the case of the patients with low baseline stroke severity: the 95% confidence interval does, after all, contain the point estimate (see fig. below).

The estimated number needed to treat and its confidence interval can be quoted as NNTB = 15.2 (95% confidence interval NNTH 4.8 to ∞ to NNTB 2.9).
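The conversion from a confidence interval for the absolute risk reduction to NNTB/NNTH terms can be sketched as follows. The function names and the output wording are illustrative choices made here; limits are given as proportions, and an interval that straddles zero is reported as passing through infinity.

```python
import math

def reciprocal(x):
    """Reciprocal of the absolute value, with 1/0 taken as infinity."""
    return math.inf if x == 0 else 1.0 / abs(x)

def nnt_interval(arr_lower, arr_upper):
    """Describe the NNT interval implied by a CI for the absolute risk reduction."""
    if arr_lower > arr_upper:
        raise ValueError("lower limit must not exceed upper limit")
    if arr_lower >= 0:   # whole interval on the benefit side
        return f"NNTB {reciprocal(arr_upper):.1f} to NNTB {reciprocal(arr_lower):.1f}"
    if arr_upper <= 0:   # whole interval on the harm side
        return f"NNTH {reciprocal(arr_lower):.1f} to NNTH {reciprocal(arr_upper):.1f}"
    # Interval contains zero: harm at one end, benefit at the other, via infinity.
    return f"NNTH {reciprocal(arr_lower):.1f} to infinity to NNTB {reciprocal(arr_upper):.1f}"

moderate = nnt_interval(0.009, 0.322)   # significant result
low = nnt_interval(-0.209, 0.341)       # non-significant result
point_estimate = f"{1 / 0.066:.1f}"
print(moderate)
print(low, "with point estimate NNTB", point_estimate)
```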


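Figure: Confidence intervals for absolute risk reduction and number needed to treat for benefit (NNTB) or harm (NNTH) for patients with low baseline stroke severity. The interval [–20.9%, 34.1%] for the absolute risk reduction corresponds to the region (–∞, –4.8] ∪ [2.9, ∞) for the number needed to treat.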


• In other words, for this group of patients, it could be that, on average, treating as few as 3 patients would result in one additional patient benefiting. On the other hand, it could be that, on average, treating as few as 5 patients would result in one additional patient being harmed.

• It is important to note that a nonsignificant number needed to treat has a confidence interval with 2 parts, one allowing for the possibility that the treatment is actually harmful, and the other for the possibility that the treatment is beneficial.


Maximum likelihood

• The invariance property of ML estimators cannot be applied here, for the following reason: for a one-dimensional parameter θ, a function t(θ) of this parameter must have a single-valued inverse in order to have

$$\widehat{t(\theta)} = t(\hat{\theta})$$

• Bimodality and the range of definition make convergence to normality difficult to achieve (slow) for small sample sizes.

Invariance property of MLEs

If $\hat{\theta}$ is the MLE of some parameter θ and t(.) is a one-to-one function, then $t(\hat{\theta})$ is the MLE of t(θ).


Unbiasedness

• Unbiasedness is a matter of scale: if $\hat{\theta}$ is unbiased for θ, then $t(\hat{\theta})$ will in general be biased for t(θ) unless t is a linear function.

• Moreover, the singularity at 0 implies that NNT cannot be bias corrected. Attempts to improve the behaviour of the estimator by reducing the bias will fail.


Testing

• No simple test of no treatment effect can be constructed for the supposedly "simple" and comprehensible NNT. This is because no treatment effect corresponds to a value of ∞ for the parameter, so a z-statistic of the form $(\hat{\theta} - \infty)/SE$ is not defined.


Generalized Mixed Effects Models


Various forms of models and relation between them


LM: Assumptions:
1. independence,
2. normality,
3. constant parameters

GLM: assumption 2) is replaced by an exponential family distribution

LMM: assumptions 1) and 3) are modified

GLMM: assumption 2) is replaced by an exponential family distribution, and assumptions 1) and 3) are modified

Repeated measures / longitudinal data: assumptions 1) and 3) are modified

Estimation: maximum likelihood

Classical statistics (observations are random, parameters are unknown constants) vs. Bayesian statistics

LM - Linear model
GLM - Generalised linear model
LMM - Linear mixed model
GLMM - Generalised linear mixed model

Non-linear models


Exponential families

The exponential family comprises a flexible set of distributions covering both continuous and discrete random variables. The members of this family share many important properties, which merits discussing them in a general format. Many of the usual probability distributions are specific members of this family:

Gaussian – Bernoulli – Binomial – von Mises – Gamma – Poisson – Exponential – Beta (on (0, 1)) – Weibull etc.


Generalized linear Models:


The Bernoulli distribution
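As a brief illustration of how the Bernoulli distribution fits the exponential-family template, its probability function with success probability p can be rewritten as below. The symbols θ (natural parameter) and b(θ) are notation introduced here for the sketch, not taken from the slides.

```latex
\begin{align*}
f(y; p) &= p^{y}(1-p)^{1-y}
         = \exp\left\{ y \ln\frac{p}{1-p} + \ln(1-p) \right\}, \qquad y \in \{0, 1\}, \\
\theta  &= \ln\frac{p}{1-p}, \qquad b(\theta) = \ln\left(1 + e^{\theta}\right) = -\ln(1-p), \\
f(y; \theta) &= \exp\{ y\theta - b(\theta) \}.
\end{align*}
```

The natural parameter is the log-odds, which is why the logit is the canonical link for binary responses in a generalized linear model.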


Generalized Linear Models


Generalised Linear Mixed Models


Empirical Bayes estimates


Example 1 (cont’d)


A Bayesian alternative


Infection vs. poverty

• Some studies from the year 1990 suggested that the risk of CHD is associated with childhood poverty. Since infection with the bacterium H. pylori is also linked to poverty, some researchers suspected H. pylori to be the missing link. In a study where levels of infection were considered in patients and controls, the following results were obtained.

• Using the data below, the chi-square statistic, having the value 4.73, yields a p-value of 0.03, which is less than the formal level of significance 0.05.

Level of infection   CHD    Healthy control
High                 60%    39%
Low                  40%    61%


Let us try a Bayesian alternative: since we have no theoretical reason to believe that the above result is true, we take P(H0) = 0.5.

$$P[H_0 \mid D] = \left[1 + \frac{1 - P(H_0)}{P(H_0)} \cdot \frac{1}{BF}\right]^{-1} = \left[1 + \frac{1}{BF}\right]^{-1} = \frac{BF}{1 + BF}$$

Berger and Sellke (1987) have shown that for a very wide range of cases, including this one,

$$BF \ge \sqrt{\chi^2}\; e^{-(\chi^2 - 1)/2}$$

Using the value 4.73 for the chi-square variable leads to a BF value of at least 0.337.

Reference: M. A. Mendall et al., Relation between H. pylori infection and coronary heart disease. Heart J. (1994).


Conclusion

$$P[H_0 \mid D] \ge \frac{0.337}{1 + 0.337} = 0.252$$

Taking another (more or less sceptical) attitude does not change the conclusion that much:

P(H0) = 0.75 => P[H0 | D] > 0.5

P(H0) = 0.25 => P[H0 | D] > 0.1
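The arithmetic behind these bounds can be reproduced with the sketch below. The function names are assumptions made here; the bound is taken as sqrt(chi2) * exp(-(chi2 - 1) / 2), and treating it as 1 when the chi-square value does not exceed 1 is an additional assumption of this sketch.

```python
import math

def min_bayes_factor(chi_square):
    """Lower bound on the Bayes factor in favour of H0 (assumed form, chi_square > 1)."""
    if chi_square <= 1:
        return 1.0  # assumption: the data give no evidence against H0 in this range
    return math.sqrt(chi_square) * math.exp(-(chi_square - 1) / 2)

def posterior_h0(bayes_factor, prior_h0):
    """Posterior probability of H0 from the prior probability and the Bayes factor."""
    posterior_odds = bayes_factor * prior_h0 / (1 - prior_h0)
    return posterior_odds / (1 + posterior_odds)

bf = min_bayes_factor(4.73)
for prior in (0.5, 0.75, 0.25):
    print(prior, round(posterior_h0(bf, prior), 3))
```

Because the Bayes factor is a lower bound, each printed value is a lower bound on the posterior probability of the null hypothesis for that prior.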


Bayesian properties of NNT

• Let D = (x1, x2, n1, n2) represent data from some trial. Assuming independent Beta(αi, βi) prior distributions for the pi leads to a joint posterior distribution of (p1, p2) that is a product of independent Beta distributions. Apart from mathematical tractability, Beta priors offer great flexibility of distributional shape.

• One can obtain the posterior distribution of the difference p = (p1 - p2), or that of NNT = 1/p, by simple transformation, using Markov chain Monte Carlo (MCMC) to simulate directly from the posterior distributions. The posterior mean μp and variance σp² of p are respectively given by

$$\mu_p = \mu_1 - \mu_2 \quad\text{and}\quad \sigma_p^2 = \sum_{i=1}^{2} \frac{\mu_i(1-\mu_i)}{n_i + \alpha_i + \beta_i + 1}, \qquad \mu_i = E[p_i \mid D] = \frac{x_i + \alpha_i}{n_i + \alpha_i + \beta_i}, \quad i = 1, 2.$$
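A minimal simulation sketch of this approach. It assumes uniform Beta(1, 1) priors and reuses the cured counts from the absolute-risk example (800 of 1000 and 646 of 1000); both choices, like the function names, are illustrative assumptions. Because the posteriors are independent Beta distributions, plain Monte Carlo sampling is enough here and no Markov chain is needed.

```python
import random
import statistics

def beta_posterior_moments(x, n, alpha=1.0, beta=1.0):
    """Posterior mean and variance of a proportion under a Beta(alpha, beta) prior."""
    total = n + alpha + beta
    mean = (x + alpha) / total
    return mean, mean * (1 - mean) / (total + 1)

def simulate_difference(x1, n1, x2, n2, alpha=1.0, beta=1.0, draws=20000, seed=1):
    """Draws from the posterior of p = p1 - p2 by sampling each Beta posterior."""
    rng = random.Random(seed)
    return [
        rng.betavariate(alpha + x1, beta + n1 - x1)
        - rng.betavariate(alpha + x2, beta + n2 - x2)
        for _ in range(draws)
    ]

m1, v1 = beta_posterior_moments(800, 1000)
m2, v2 = beta_posterior_moments(646, 1000)
mu_p, var_p = m1 - m2, v1 + v2            # analytic posterior mean and variance of p

diffs = simulate_difference(800, 1000, 646, 1000)
nnt_draws = [1 / d for d in diffs if d != 0]   # transform each draw to NNT = 1/p
print(mu_p, var_p ** 0.5)
print(statistics.mean(diffs), statistics.median(nnt_draws))
```

The median of the transformed draws is used as a summary because the mean of NNT is not a reliable quantity when the posterior of p puts mass near zero.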


• Asymptotically, p will have a Normal posterior distribution with mean μp and variance σp². The common practice is to estimate NNT by 1/μp, and the corresponding interval estimate is given by the 95% credible interval

$$(\mu_p \pm 1.96\,\sigma_p)^{-1}$$

• Making the transformation to y = 1/p = NNT, we find that the asymptotic distribution of Y is given by

$$f(y \mid D) = \frac{1}{\sqrt{2\pi}\,\sigma_p\, y^2} \exp\left\{ -\frac{(1/y - \mu_p)^2}{2\sigma_p^2} \right\}$$

• This density is known as the inverse normal distribution (Johnson et al., 1995, p. 171). It is a special case of the generalized inverse normal family of density functions considered by Robert (1991). The mean and variance of this distribution do not exist.


• However, the distribution has two modes, at

$$\widehat{NNT}_1 = \frac{-\mu_p - \sqrt{\mu_p^2 + 8\sigma_p^2}}{4\sigma_p^2} \quad\text{and}\quad \widehat{NNT}_2 = \frac{-\mu_p + \sqrt{\mu_p^2 + 8\sigma_p^2}}{4\sigma_p^2}.$$

• Thus the point estimate of NNT would be given by NNT2 when there is efficacy and by NNT1 when the control treatment dominates the experimental one. The figure shows graphs of f(y | D) for different values of μp and σp. We observe from the figure that the pdf based on μp < 0 is a mirror image of that based on μp > 0.
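The two modes can be checked numerically. The sketch below assumes the density of Y = 1/p for p ~ N(μp, σp²), whose stationary points solve 2σp²y² + μp·y - 1 = 0; the function names and the values μp = 0.154, σp = 0.0197 are illustrative assumptions.

```python
import math

def inverse_normal_pdf(y, mu, sigma):
    """Density of Y = 1/p when p is Normal(mu, sigma^2); defined for y != 0."""
    return math.exp(-((1 / y - mu) ** 2) / (2 * sigma ** 2)) / (
        math.sqrt(2 * math.pi) * sigma * y ** 2
    )

def inverse_normal_modes(mu, sigma):
    """The two stationary points of the density, one negative and one positive."""
    root = math.sqrt(mu ** 2 + 8 * sigma ** 2)
    return (-mu - root) / (4 * sigma ** 2), (-mu + root) / (4 * sigma ** 2)

mu_p, sigma_p = 0.154, 0.0197          # illustrative posterior mean and sd of p
nnt_1, nnt_2 = inverse_normal_modes(mu_p, sigma_p)
print(nnt_1, nnt_2, 1 / mu_p)
```

With μp > 0 the positive mode is the relevant point estimate; it lies somewhat below 1/μp, and the gap shrinks as σp becomes small relative to μp.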
