Prague 02.10.2008

Overview of the statistical analysis Jonas Ranstam, PhD, National Musculoskeletal Competence Centre, Lund, Sweden

Transcript of Prague 02.10.2008

Page 1: Prague 02.10.2008

Overview of the statistical analysis

Jonas Ranstam, PhD, National Musculoskeletal Competence Centre, Lund, Sweden

Page 2: Prague 02.10.2008

Explanations and points of reference

1. Methodological background
2. International guidelines
3. Multiplicity issues
4. Study population definitions
5. Statistical models

Page 3: Prague 02.10.2008

1. Methodological background

Page 4: Prague 02.10.2008

Clinical research

Before 1948

Unclear validity, unknown statistical precision

- Prof A's patients better than Prof B's
- Small series of patients or even single cases

Page 5: Prague 02.10.2008

Streptomycin in Tuberculosis Trials Committee. Streptomycin treatment of pulmonary tuberculosis. BMJ 1948;2:769-83.

The Control Scheme

Determination of whether a patient would be treated by streptomycin and bed-rest (S case) or by bed-rest alone (C case) was made by reference to a statistical series based on random sampling numbers drawn up for each sex at each centre by Professor Bradford Hill; the details of the series were unknown to any of the investigators or to the co-ordinator and were contained in a set of sealed envelopes, each bearing on the outside only the name of the hospital and a number.

Page 6: Prague 02.10.2008

Clinical research

From 1948

Elimination/reduction of bias, assessment of statistical precision

- Randomization and blinding (intervention studies)
- Effect modeling (observational studies)
- P-values and confidence intervals

Page 7: Prague 02.10.2008

Quantitative principles I

Randomized allocation of patients to treatment groups (and blinding when possible) guarantees that:

1. All differences between treatment groups at baseline are random (not systematic).

Complete absence of baseline imbalance is not the aim. Stratification on prognostic factors is used to make the groups less imbalanced.

2. Treatment effect estimates are unaffected by selection and confounding bias (and with blinding, differential misclassification bias).
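
As an illustration of stratified random allocation, the following minimal Python sketch draws up a separate allocation list for each stratum, in the spirit of the sealed-envelope series described for the streptomycin trial. The permuted blocks of four, the stratum names and the seeds are assumptions made for this example; they are not taken from the trial.

```python
import random


def allocation_list(n_blocks, block=("S", "S", "C", "C"), seed=None):
    """Return a random allocation sequence built from permuted blocks.

    Each block holds equal numbers of S (treatment) and C (control),
    so the two arms stay balanced within the stratum.
    """
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_blocks):
        shuffled = list(block)
        rng.shuffle(shuffled)
        sequence.extend(shuffled)
    return sequence


# Hypothetical strata: one allocation list per sex at each centre.
strata = [(centre, sex) for centre in ("centre_1", "centre_2") for sex in ("F", "M")]
lists = {stratum: allocation_list(3, seed=i) for i, stratum in enumerate(strata)}

for stratum, sequence in lists.items():
    print(stratum, "".join(sequence))
```

In a real trial the lists would be generated and kept by someone independent of the investigators, so that the next allocation cannot be predicted.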

Page 8: Prague 02.10.2008

Quantitative principles II

1. Individual effects vary between subjects. Different samples of subjects will yield different observed mean effects.

2. The subject variation can be estimated using the observations in a random sample.

3. A universal mean effect can be estimated, and the reliability of this estimate can be described with p-values and confidence intervals.
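
The three principles above can be illustrated with a small simulation. This is only a sketch: the population mean effect, the between-subject standard deviation, the sample size and the normal-approximation interval (z = 1.96) are all assumptions chosen for the example.

```python
import random
import statistics


def mean_with_ci(sample, z=1.96):
    """Return the sample mean and an approximate 95% confidence interval."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5  # standard error of the mean
    return mean, (mean - z * se, mean + z * se)


rng = random.Random(2008)
population_mean, subject_sd = 5.0, 10.0  # hypothetical values

# Three random samples from the same population give three different means,
# each with an interval describing its statistical precision.
for i in range(3):
    sample = [rng.gauss(population_mean, subject_sd) for _ in range(50)]
    mean, (lower, upper) = mean_with_ci(sample)
    print(f"sample {i + 1}: mean {mean:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```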

Page 9: Prague 02.10.2008

P-values are often misunderstood

They do

- describe the reliability of findings. P < 0.05 is usually considered reliable.

They do not

- describe clinical relevance (they depend on sample size).

- show that a difference “does not exist” (“n.s.” is absence of evidence, not evidence of absence).
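
To see why a p-value says nothing about clinical relevance, consider the same observed difference in trials of different size. The sketch below assumes a two-sided z-test for a difference between two group means with a known common standard deviation; the difference of 2 units and the SD of 10 are hypothetical numbers.

```python
from statistics import NormalDist


def two_sided_p(difference, sd, n_per_group):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = sd * (2 / n_per_group) ** 0.5  # standard error of the difference
    z = difference / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The effect size is identical in every row; only the sample size changes.
for n in (10, 100, 1000):
    print(f"n per group = {n:4d}: p = {two_sided_p(2.0, 10.0, n):.4f}")
```

The small trial gives "n.s." and the large one a very small p-value, although the observed effect is exactly the same.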

Page 10: Prague 02.10.2008

2. International guidelines

Page 11: Prague 02.10.2008
Page 12: Prague 02.10.2008

ICMJE – the Vancouver group

Results

“Avoid relying solely on statistical hypothesis testing, such as the use of P values, which fails to convey important information about effect size.”

“When possible, quantify findings and present them with appropriate indicators of measurement error or uncertainty (such as confidence intervals).”

Page 13: Prague 02.10.2008

Example: FREE SF36-PCS

Estimated treatment effect difference

Time point   Difference (95% CI)   p-value
Baseline     0.4 (-1.7 to 2.6)     0.7
1 month      5.9 (3.7 to 8.2)      <0.0001
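
The confidence intervals and p-values in this example carry overlapping information. A rough consistency check recovers the standard error from the width of the interval and then the p-value from the estimate. It assumes a normal approximation, which is an assumption of this sketch; the slide does not state how the intervals were computed.

```python
from statistics import NormalDist


def p_from_ci(estimate, lower, upper, level=0.95):
    """Approximate two-sided p-value implied by an estimate and its CI."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    se = (upper - lower) / (2 * z_crit)             # standard error from CI width
    z = estimate / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


print(f"baseline: p = {p_from_ci(0.4, -1.7, 2.6):.2f}")
print(f"1 month:  p = {p_from_ci(5.9, 3.7, 8.2):.1e}")
```

Both results agree with the reported p-values (0.7 and <0.0001), while the intervals additionally show the size of the effect and its uncertainty.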

Page 14: Prague 02.10.2008

P-values vs. confidence intervals

[Figure: confidence intervals shown relative to "0 Effect" and "Clinically significant effect"]

Confidence interval outcome                                          P-value
Statistically and clinically significant effect                     p < 0.05
Statistically, but not necessarily clinically, significant effect   p < 0.05
Inconclusive                                                         n.s.
Neither statistically nor clinically significant effect             n.s.
Statistically significant reversed effect                           p < 0.05

P-value: 2 possible outcomes (bad)
Confidence intervals: 5 possible outcomes (good)
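
The five outcomes can be written as a simple decision rule on the confidence limits. The rule below is one reading of the figure, not a definition from the slide: it assumes that positive values favour the treatment and that the clinically significant effect is a positive threshold chosen in advance. The threshold of 3.0 in the usage lines is purely hypothetical.

```python
def classify(lower, upper, clinical_threshold):
    """Classify a confidence interval (lower, upper) for a treatment effect.

    Assumes positive effects are beneficial and clinical_threshold > 0.
    """
    if upper < 0:
        return "Statistically significant reversed effect"
    if lower > clinical_threshold:
        return "Statistically and clinically significant effect"
    if lower > 0:
        return "Statistically, but not necessarily clinically, significant effect"
    if upper < clinical_threshold:
        return "Neither statistically nor clinically significant effect"
    return "Inconclusive"


# Hypothetical intervals, all judged against a hypothetical threshold of 3.0.
for interval in [(4.0, 8.0), (1.0, 8.0), (-1.0, 5.0), (-1.0, 2.0), (-5.0, -1.0)]:
    print(interval, "->", classify(*interval, clinical_threshold=3.0))
```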

Page 15: Prague 02.10.2008

Clinical trials
International regulatory guidelines

ICH Topic E9 - Statistical Principles for Clinical Trials

EMEA Points to consider:
- baseline covariates
- missing data
- multiplicity issues
- etc.

and similar documents from the FDA

These guidelines can all be found on the internet.

Page 16: Prague 02.10.2008

3. Multiplicity issues

Page 17: Prague 02.10.2008

Multiplicity

Multiplicity of inferences is present in almost all trials. If not properly handled, unsubstantiated claims for effectiveness may be made as a consequence of an inflated rate of false positive conclusions.

Page 18: Prague 02.10.2008

Multiplicity

The chance of at least one false positive finding (FPR) = 1 - (1 - α)^k

where k is the number of performed comparisons and α the significance level (usually 0.05).

k = 1 => FPR = 0.05
k = 2 => FPR = 0.0975
k = 10 => FPR = 0.4013

Bonferroni method: divide the significance level by the number of comparisons. This reduces statistical power and should be avoided.
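
The numbers on this slide can be reproduced with a few lines of Python. The formula treats the k comparisons as independent; the function names are chosen for this sketch only.

```python
def false_positive_rate(k, alpha=0.05):
    """Chance of at least one false positive among k independent comparisons."""
    return 1 - (1 - alpha) ** k


def bonferroni_level(k, alpha=0.05):
    """Per-comparison significance level under the Bonferroni method."""
    return alpha / k


for k in (1, 2, 10):
    adjusted = bonferroni_level(k)
    print(
        f"k = {k:2d}: FPR = {false_positive_rate(k):.4f}, "
        f"Bonferroni level = {adjusted:.4f}, "
        f"FPR after adjustment = {false_positive_rate(k, adjusted):.4f}"
    )
```

With the Bonferroni level the overall false positive rate stays just below 0.05, but each single comparison must now pass a much stricter threshold, which is where the loss of power comes from.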

Page 19: Prague 02.10.2008

Endpoints

Primary: The variable capable of providing the most clinically relevant evidence directly related to the primary objective of the trial.

Secondary: Either measurements supporting the primary endpoint or effects related to secondary objectives.

Page 20: Prague 02.10.2008

Statistical analyses

Confirmatory: The result concerns a primary endpoint and the p-value or confidence interval accounts for potential multiplicity. The result can support a claim of superiority, equivalence or non-inferiority.

Exploratory: All other analyses. The result is either supporting or explanatory, or simply just a new hypothesis.

Page 21: Prague 02.10.2008

4. Study population definitions

Page 22: Prague 02.10.2008

Study populations

Intention-to-treat (ITT) principle: Analyze all randomized subjects according to planned treatment regimen.

Full analysis set (FAS): The set of subjects that is as close as possible to the ideal implied by the ITT-principle.

Per protocol (PP) set: The set of subjects who complied with the protocol sufficiently to ensure that they are likely to exhibit the effects of treatment according to the underlying scientific model.

Page 23: Prague 02.10.2008

FAS vs. PP-set

FAS
+ no selection bias
- misclassification problem (effect dilution)

PP-set
+ no contamination problem
- possible selection bias (confounding)

When the FAS and PP-set lead to essentially the same conclusions, confidence in the trial is supported.

Page 24: Prague 02.10.2008

5. Statistical models

Page 25: Prague 02.10.2008

Fixed and random effects

Fixed effects: when the levels of an effect constitute the entire population about which you are interested.

Random effects: when the levels in your experiment represent only a sample from that population.

Random effects models can be used to analyze data with multiple observations per patient.

Page 26: Prague 02.10.2008

Mixed effects model

If all the effects in a statistical model (ANOVA) are considered random effects, then the model is called a random effects model; likewise, a model with only fixed effects is called a fixed effects model. When some factors are fixed and others are random, the model is called a mixed model.

(R.A. Fisher 1926: Type-1 and type-2 ANOVA)

Page 27: Prague 02.10.2008

Data from 3 subjects: Messrs. Green, Blue and Red

[Figure: Effect plotted against Time (Baseline, 1st visit, 2nd visit)]

Page 28: Prague 02.10.2008

Analysis requirement: FAS

[Figure: Effect plotted against Time (Baseline, 1st visit, 2nd visit)]

Page 29: Prague 02.10.2008

1. Assume independence between subjects' repeated observations and use ANOVA

[Figure: Effect plotted against Time (Baseline, 1st visit, 2nd visit)]

Page 30: Prague 02.10.2008

1. Assume independence between subjects' repeated observations and use ANOVA

[Figure: Effect plotted against Time (Baseline, 1st visit, 2nd visit)]

Bad idea: Within-subject variation is confused with between-subject variation. Statistical precision will be incorrectly calculated.
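
A small simulation shows how the precision goes wrong when repeated observations are treated as independent. The data are simulated under assumptions made for this sketch (30 subjects, 3 visits each, between-subject SD 10, within-subject SD 2); none of these numbers come from the study.

```python
import random
import statistics


def naive_se(data):
    """SE of the overall mean, treating every observation as independent."""
    flat = [y for observations in data.values() for y in observations]
    return statistics.stdev(flat) / len(flat) ** 0.5


def subject_level_se(data):
    """SE of the overall mean, based on one mean value per subject."""
    means = [statistics.mean(observations) for observations in data.values()]
    return statistics.stdev(means) / len(means) ** 0.5


rng = random.Random(1)
data = {}
for i in range(30):
    subject_level = rng.gauss(50, 10)  # between-subject variation
    # Three visits that differ only by within-subject variation.
    data[f"subject_{i}"] = [subject_level + rng.gauss(0, 2) for _ in range(3)]

print(f"naive SE (90 'independent' observations): {naive_se(data):.2f}")
print(f"subject-level SE (30 subjects):           {subject_level_se(data):.2f}")
```

Because observations from the same subject resemble each other, 90 observations from 30 subjects contain much less information than 90 independent observations, and the naive standard error is too small.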

Page 31: Prague 02.10.2008

2. Repeated fixed effects comparisons e.g. Student's t-tests

[Figure: Effect plotted against Time (Baseline, 1st visit, 2nd visit)]

Page 32: Prague 02.10.2008

2. Repeated fixed effects comparisons e.g. Student's t-tests (no FAS)

[Figure: Effect plotted against Time (Baseline, 1st visit, 2nd visit)]

Page 33: Prague 02.10.2008

3. Fixed effects RM-model

[Figure: Effect plotted against Time (Baseline, 1st visit, 2nd visit)]

Page 34: Prague 02.10.2008

3. Fixed effects RM-model (no FAS)

[Figure: Effect plotted against Time (Baseline, 1st visit, 2nd visit)]

Page 35: Prague 02.10.2008

4. Fixed effects RM-model with LOCF

[Figure: Effect plotted against Time (Baseline, 1st visit, 2nd visit)]

Page 36: Prague 02.10.2008

4. Fixed effects RM-model with LOCF

[Figure: Effect plotted against Time (Baseline, 1st visit, 2nd visit)]

LOCF-imputation is not necessarily conservative, and under-estimates variability.

Not the best alternative!
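
For readers unfamiliar with LOCF (last observation carried forward), the imputation itself is very simple. The sketch below uses hypothetical visit values for the three subjects, with `None` marking a missed visit.

```python
def locf(visits):
    """Replace each missing value (None) by the last observed value before it."""
    filled = []
    last = None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical values at baseline, 1st visit and 2nd visit.
observed = {
    "Green": [40, 46, 50],
    "Blue": [38, 43, None],   # dropped out after the 1st visit
    "Red": [42, None, None],  # dropped out after baseline
}

for subject, visits in observed.items():
    print(subject, visits, "->", locf(visits))
```

The carried-forward values are then analyzed as if they had actually been observed, which is why the variability is under-estimated: every imputed subject appears to have no change at all after dropping out.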

Page 37: Prague 02.10.2008

5. Mixed effects (subject random) ANOVA

[Figure: Effect plotted against Time (Baseline, 1st visit, 2nd visit)]

Page 38: Prague 02.10.2008

5. Mixed effects (subject random) ANOVA

[Figure: Effect plotted against Time (Baseline, 1st visit, 2nd visit)]

Within- and between-subject variation are separated in the model. Statistical precision is correctly calculated.

A number of publications reporting Monte Carlo simulation studies show that this is the best alternative, both in terms of precision and validity!
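
What "separating within- and between-subject variation" means can be sketched with a method-of-moments variance decomposition for balanced data. This is a simplified illustration, not the mixed effects ANOVA used in the analysis: it has no treatment or time effect, it requires the same number of observations per subject, and the values for the three subjects are hypothetical.

```python
import statistics


def variance_components(data):
    """Estimate (between-subject, within-subject) variance for balanced data."""
    groups = list(data.values())
    n_subjects = len(groups)
    n_obs = len(groups[0])
    if any(len(group) != n_obs for group in groups):
        raise ValueError("this sketch requires equally many observations per subject")

    subject_means = [statistics.mean(group) for group in groups]
    grand_mean = statistics.mean(subject_means)

    # Mean square within subjects: spread of observations around each subject's mean.
    ss_within = sum(
        (y - mean) ** 2 for group, mean in zip(groups, subject_means) for y in group
    )
    ms_within = ss_within / (n_subjects * (n_obs - 1))

    # Mean square between subjects: spread of subject means around the grand mean.
    ss_between = n_obs * sum((mean - grand_mean) ** 2 for mean in subject_means)
    ms_between = ss_between / (n_subjects - 1)

    var_between = max(0.0, (ms_between - ms_within) / n_obs)
    return var_between, ms_within


# Hypothetical repeated observations for the three subjects.
data = {"Green": [10, 12, 14], "Blue": [20, 22, 24], "Red": [30, 32, 34]}
between, within = variance_components(data)
print(f"between-subject variance: {between:.1f}")
print(f"within-subject variance:  {within:.1f}")
```

A mixed model estimates both components together with the treatment effect, and it can use all available observations from subjects with missing visits.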

Page 39: Prague 02.10.2008

Example: FREE SF36-PCS

Estimated treatment effect difference at 1 month

Analysis       Method             Difference   p-value
ITT-analysis   ME ANOVA           5.5          <0.0001
PP-analysis    FE ANOVA Compl.    5.2          <0.0001
PP-analysis    FE ANOVA LOCF      4.9          <0.0001

Page 40: Prague 02.10.2008

Thank you for your attention!