EPI-820 Evidence-Based Medicine

LECTURE 7: CLINICAL STATISTICAL INFERENCE
Mat Reeves BVSc, PhD

Page 1: EPI-820  Evidence-Based Medicine

EPI-820 Evidence-Based Medicine

LECTURE 7: CLINICAL STATISTICAL INFERENCE

Mat Reeves BVSc, PhD

Page 2: EPI-820  Evidence-Based Medicine

Objectives

• Understand the theoretical underpinnings and the flaws associated with the current approach to clinical statistical testing (the frequentist approach).

• Understand the difference between testing and estimation

• Understand the advantages of the CI and the CI functions.

• Understand the logic of a Bayesian Approach

Page 3: EPI-820  Evidence-Based Medicine

Personal Statistical History…

• Post-DVM:
  - Clueless. Sceptical of the role of statistics
  - Thinks research = the search for P < 0.05

• PhD Era:
  - Increasing obsession with stat methods
  - Lots of tools! SLR, ANOVA, MLR, LR, LL & Cox
  - Thinks statistics = “real science”

• Post-PhD:
  - Healthy scepticism for the way stats are used
  - Stats = methods which have inherent limitations
  - Not a substitute for clear scientific thought or understanding the “scientific method”

Page 4: EPI-820  Evidence-Based Medicine

Review of Significance Tests

Substantive hypothesis: Cows on BST will tend to gain weight

Null hypothesis (Ho): the mean body wt. of cows treated with BST is not different from the mean body wt. of control cows

$\mu_x = \mu_y$

Alternative hypothesis (Ha): the mean body wt. of cows treated with BST is different from the mean body wt. of control cows

$\mu_x \neq \mu_y$

Page 5: EPI-820  Evidence-Based Medicine

Review of Significance Tests

- Logically, if Ho is refuted Ha is confirmed

- investigator seeks to 'nullify' Ho

Experiment:

20 cows randomized to BST (X) or control (Y). Measure weight gain and calculate the mean weight change per group.

Page 6: EPI-820  Evidence-Based Medicine

Review of Significance Tests

Assumptions:

i) Sample statistic (X - Y) is one instance of an infinitely large number of sample statistics obtained from an infinite number of replications of the expt., under the same conditions (frequentist assumption)

ii) Populations are normally distributed, equal variance

iii) The Ho is true

Page 7: EPI-820  Evidence-Based Medicine

Review of Significance Tests (t-test)

$$t = \frac{\bar{X} - \bar{Y}}{S_{\bar{x}-\bar{y}}}$$

Where:

$$S_{\bar{x}-\bar{y}} = \sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right) S^2}$$

$S_{\bar{x}-\bar{y}}$ = standard error of the difference between two independent means.

$S^2$ = estimate of pooled population variance

t ~ N(0, 1) approximately (large samples); df = $(n_1 - 1) + (n_2 - 1)$

- t may take on any value; no value is logically inconsistent with Ho! Smaller t values are more consistent with Ho being true.

- all else equal, larger n’s increase the value of t (higher power).
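The pooled two-sample t statistic defined on this slide can be sketched in a few lines of Python. This is an illustration only: the function name `pooled_t` and the weight-gain figures are hypothetical and are not data from the lecture; the degrees of freedom are taken as (n1 - 1) + (n2 - 1).

```python
import math

def pooled_t(x, y):
    """Two-sample t statistic with a pooled variance estimate."""
    n1, n2 = len(x), len(y)
    mean_x = sum(x) / n1
    mean_y = sum(y) / n2
    # Sums of squared deviations within each group
    ss_x = sum((v - mean_x) ** 2 for v in x)
    ss_y = sum((v - mean_y) ** 2 for v in y)
    df = (n1 - 1) + (n2 - 1)
    s2 = (ss_x + ss_y) / df                      # pooled variance estimate S^2
    se_diff = math.sqrt((1 / n1 + 1 / n2) * s2)  # SE of the difference in means
    return (mean_x - mean_y) / se_diff, df

# Hypothetical weight gains (kg), invented for illustration
bst = [32, 41, 38, 45, 36, 40, 43, 37, 39, 44]
control = [30, 35, 33, 38, 29, 36, 34, 31, 37, 32]
t_stat, df = pooled_t(bst, control)
print(f"t = {t_stat:.2f}, df = {df}")
```

Because the standard error shrinks as n1 and n2 grow, the same mean difference yields a larger t in a larger experiment.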

Page 8: EPI-820  Evidence-Based Medicine

Review of Significance Tests

Large values of t indicate:

i) test assumptions are true, a rare event has occurred

ii) one of the assumptions of the test is false, and by convention it is assumed that the Ho is not true.

- By convention, relative frequency of t where we decide to choose (ii) above as a logical conclusion is set to 5% (alpha level or significance level)

- Expt: t = 2.55, p = 0.02, reject Ho - result is significant
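As a rough check on the experiment's figures, the two-sided p value for t = 2.55 can be computed from the Student t density with the standard library alone. The sketch assumes df = 18, i.e. that the 20 cows were split 10 and 10, which the slides do not state; the helper names are made up for this example.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t_obs, df, steps=2000):
    """P(|T| >= |t_obs|) under Ho, using Simpson's rule on [0, |t_obs|]."""
    b = abs(t_obs)
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # P(0 <= T <= |t_obs|)
    return 1 - 2 * area

p = two_sided_p(2.55, df=18)           # df = 18 is an assumption (10 cows per group)
print(f"p = {p:.3f}, reject Ho at alpha = 0.05: {p < 0.05}")
```

Under that assumption the result agrees with the slide's p = 0.02 after rounding.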

Page 9: EPI-820  Evidence-Based Medicine

Review of Significance Tests

- Type I error (alpha), occurs 5% of the time when Ho is true

- Type II error (beta), occurs β% of the time when Ho is false

- Alpha and beta are inversely related

- Fixing alpha at 5% means Sp is 95%

- Beta is not set 'a priori', hence Se (power) tends to be low

- Scientific caution dictates that we set alpha small

- Scientific ignorance dictates we ignore beta!

Page 10: EPI-820  Evidence-Based Medicine

Alpha and beta are inversely related
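The trade-off can be illustrated numerically. The sketch below assumes a one-sided z test and a true effect equal to 2 standard errors; both choices, and the function name, are hypothetical and made only for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def beta_for_alpha(alpha, effect_in_se=2.0):
    """Type II error of a one-sided z test when the true effect is effect_in_se SEs."""
    critical = Z.inv_cdf(1 - alpha)               # cut-off on the test statistic
    power = 1 - Z.cdf(critical - effect_in_se)    # P(statistic exceeds cut-off | Ha)
    return 1 - power

for alpha in (0.10, 0.05, 0.01, 0.001):
    print(f"alpha = {alpha:<6} beta = {beta_for_alpha(alpha):.3f}")
```

With the effect size and sample size held fixed, every tightening of alpha moves the cut-off outward and raises beta.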

Page 11: EPI-820  Evidence-Based Medicine

Relationship between diagnostic test result and disease status

                         DISEASE
                         PRESENT (D+)    ABSENT (D-)
TEST   POSITIVE (T+)     TP (a)          FP (b)
       NEGATIVE (T-)     FN (c)          TN (d)

Se = P(T+|D+) = a/(a + c)

Sp = P(T-|D-) = d/(b + d)

PVP = a/(a + b)

PVN = d/(c + d)

Page 12: EPI-820  Evidence-Based Medicine

Relationship between significance test results and truth

                         TRUTH
                         Ho False            Ho True
SIGNF.   REJECT Ho       TP: (1 - β)         FP: Type I (α)
TEST     ACCEPT Ho       FN: Type II (β)     TN: (1 - α)

Se = Power (1 - β) = TP/(TP + FN)

Sp = TN/(TN + FP)

PVP = TP/(TP + FP)

PVN = TN/(TN + FN)

Page 13: EPI-820  Evidence-Based Medicine

Power

- Probability of rejecting Ho when Ho is false

- Se = TP/(TP + FN) or (1 - β)

- Power is a function of:

i) Alpha (increase by making Ha one-sided, i.e., $\mu_x > \mu_y$) (consistent with changing the cut-off value)

ii) Reliability (as measured by SE of the difference)

- SE decreases with increasing sample size (= decreased variance)

- Power increases with decreasing SE

iii) Size of treatment effect
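The three determinants of power can be explored with a small calculation. This is a sketch under a normal approximation for a difference in two means; the function name and every input (difference of 5, SD of 10, 20 per group) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def power_two_means(delta, sd, n_per_group, alpha=0.05, one_sided=False):
    """Approximate power to detect a true mean difference delta."""
    se = sd * sqrt(2 / n_per_group)            # SE of the difference in means
    tail = alpha if one_sided else alpha / 2
    critical = Z.inv_cdf(1 - tail)
    # Ignores the negligible chance of rejecting Ho in the wrong direction
    return 1 - Z.cdf(critical - delta / se)

base = power_two_means(delta=5, sd=10, n_per_group=20)
print(f"baseline:          {base:.2f}")
print(f"one-sided Ha:      {power_two_means(5, 10, 20, one_sided=True):.2f}")
print(f"larger n (40):     {power_two_means(5, 10, 40):.2f}")
print(f"larger effect (8): {power_two_means(8, 10, 20):.2f}")
```

Each of the last three lines changes one input at a time, so the increase over the baseline can be attributed to alpha, to the SE, or to the effect size respectively.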

Page 14: EPI-820  Evidence-Based Medicine

The Consequences of Low Power

i) difficult to interpret negative results

- truly no effect

- expt unable to detect true difference

ii) increase proportion of type 1 errors in literature

iii) fail to identify many important associations

iv) low power means low precision (indicated by the confidence interval)

Page 15: EPI-820  Evidence-Based Medicine

Questions?

• What proportion of statistically significant findings published in the literature are false positive (Type 1) errors?

• Which well-known measure does this proportion correspond to, and what elements does it therefore depend on?

Page 16: EPI-820  Evidence-Based Medicine

Hypothetical outcomes of 500 experiments, α = 0.05, Power = 0.50, and 20% prevalence of false Ho’s

                         TRUTH
                         Ho FALSE    Ho TRUE
SIGNF.   REJECT Ho       50          20
TEST     ACCEPT Ho       50          380
         Total           100         400        N = 500

Se = 50%    Sp = 95%

PV+ = 50/70 = 71%

If all signf. results published, 29% are Type 1 errors
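The arithmetic behind this table generalizes directly: the predictive value of a significant result depends on alpha, power, and the prevalence of false null hypotheses. A minimal sketch (the function name is invented for this example):

```python
def positive_predictive_value(alpha, power, prevalence_false_ho):
    """Share of significant results for which Ho really is false."""
    true_pos = power * prevalence_false_ho          # Ho false and rejected
    false_pos = alpha * (1 - prevalence_false_ho)   # Ho true but rejected
    return true_pos / (true_pos + false_pos)

ppv = positive_predictive_value(alpha=0.05, power=0.50, prevalence_false_ho=0.20)
print(f"PV+ = {ppv:.0%}, Type 1 errors among significant results = {1 - ppv:.0%}")
```

With the slide's inputs this gives 71% and 29%. Lowering the power or the prevalence of false Ho's pushes the false-positive share up, just as with a diagnostic test applied in a low-prevalence population.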

Page 17: EPI-820  Evidence-Based Medicine

The P value

- probability of obtaining a value of the test statistic (X) at least as large as the one observed, given the Ho is true

- P (>=X | Ho true)

Common Incorrect Interpretations

- The probability that the results were due to chance!

- It is NOT P (Ho true|Data)!!!

- We can never state the probability of a hypothesis being true! (under the frequentist approach)

Page 18: EPI-820  Evidence-Based Medicine

Criticisms of Significance Tests

i) Decision vs Inference (Neyman-Pearson)

- problem of automatic acceptance or rejection based on an arbitrary cutoff (P= 0.04 vs P=0.06)

- pioneers of modern statistics were interested in producing results that enabled decisions to be made

- results should adjust your degree of belief in a hypothesis rather than forcing you to accept an artificial dichotomy

- "intellectual economy"

Page 19: EPI-820  Evidence-Based Medicine

Criticisms of Significance Tests

ii) Asymmetry of significance tests

- frequently, the experimental data can be found to be consistent with a Ho of no effect or a Ho of a 20% increase

- acceptance of both Ho's given the data leads to 2 very different conclusions!

- asymmetry was recognized by Fisher, hence convention is to identify theory with the Ha but to test the Ho

- Is there an effect? is the wrong question! Should ask: What is the size of the effect?

Page 20: EPI-820  Evidence-Based Medicine

Criticisms of Significance Tests

iii) Corroborative power of significance tests

- Both Fisherian and Neyman-Pearson schools make no assumption about the prior probability of Ho

- Both schools presume Ho is almost always false

- rejection of Ho does nothing to illuminate which of the vast number of Ha’s are supported by the data!

- Failing to reject Ho does not prove Ho is true (Popper: 'we can falsify hypotheses but not confirm them')

Page 21: EPI-820  Evidence-Based Medicine

Criticisms of Significance Tests

iv) Effect size and significance tests

- Cannot infer the size of an effect by inspection of the P value; reporting P < 0.00001 has no scientific merit!

- Test statistics and p values are a function of both effect size and sample size

- Highly significant results may be derived from trivial effects if sample size is large.

- Confidence intervals give plausible range for the unknown popl parameter (signf tests show what the parameter is not!)

Page 22: EPI-820  Evidence-Based Medicine

Relationship between the Size of the Sample and the Size of the P Value

• Example RCT:
  - Intervention: new antibiotic (a/b) for pneumonia.
  - Outcome: Recovery Rate = % of patients in clinical recovery by 5 days

• Facts:
  - Known = Existing drug of choice results in 35% recovery rate at 5 days
  - Unknown = New drug improves recovery rate by 5% (to 40%)

Page 23: EPI-820  Evidence-Based Medicine

P values Generated by RCT by Sample Size

Sample size per group, x (total N = 2x)    P value (Chi-square)

100                                        0.465
500                                        0.103
600                                        0.074
700                                        0.053
800                                        0.039
1000                                       0.021
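The tabulated values can be approximately reproduced with a short calculation. The sketch assumes that the listed sample size is the number per treatment group, that the observed recovery rates are exactly 35% and 40%, and that the chi-square test is uncorrected; none of these is stated on the slide, but together they match the table to within about 0.001.

```python
from math import sqrt, erfc

def chi_square_p(p_control, p_new, n_per_group):
    """Two-sided P for a 2x2 chi-square test (1 df, no continuity correction)."""
    pooled = (p_control + p_new) / 2                  # equal group sizes assumed
    se = sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = abs(p_new - p_control) / se                   # chi-square on 1 df equals z**2
    return erfc(z / sqrt(2))                          # 2 * (1 - Phi(z))

for n in (100, 500, 600, 700, 800, 1000):
    print(f"{n:>5} per group  P = {chi_square_p(0.35, 0.40, n):.3f}")
```

The effect is identical in every row; only the sample size changes, and with it the verdict of the significance test.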

Page 24: EPI-820  Evidence-Based Medicine

Conclusion?

Significance testing should be abandoned and replaced with interval estimation (point estimate and CI)! Why?

- does not carry any decision-making implications

- not couched in pseudo-scientific hypothesis testing language

- gives plausible range for the unknown popl parameter

- gives clue as to sample size (width of the CI)

- avoids danger of inferring a large effect when the result is highly significant

Page 25: EPI-820  Evidence-Based Medicine

Interval estimation

- want an unbiased, precise measure of effect

- view "experimentation" as a measurement exercise

- Point estimate: best estimate of the true effect, given the data (aka MLE) and it indicates the magnitude of effect (but is imprecise)

- Confidence intervals indicate degree of precision of estimate. Represent a set of all possible values for the parameter that are consistent with the data

- width of CI depends on variability and level of confidence (%)

Page 26: EPI-820  Evidence-Based Medicine

Interval estimation

- CIs indicate magnitude and precision.

- CIs are linked to alpha and hypothesis testing: (1 - alpha) = 95%

- 90% CI:

  - 90% of such intervals will include the true unknown popl. parameter (necessary frequentist interpretation)

  - it does not represent a 90% probability of including the true unknown popl. parameter within it
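The frequentist reading of a 90% CI can be made concrete by simulating many replications of the same experiment and counting how often the interval captures the true value. This is a sketch with hypothetical inputs (true mean 10, known SD 2, n = 25) and a fixed random seed so that the run is repeatable.

```python
import random
from statistics import NormalDist

def coverage(true_mean=10.0, sd=2.0, n=25, level=0.90, reps=2000, seed=1):
    """Fraction of simulated CIs (known sd) that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sd / n ** 0.5
    hits = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(true_mean, sd) for _ in range(n)) / n
        # The interval sample_mean +/- half_width covers true_mean iff this holds
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / reps

print(f"share of 90% CIs that include the true mean: {coverage():.3f}")
```

The share settles near 0.90 across replications; any single interval either contains the parameter or does not.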

Page 27: EPI-820  Evidence-Based Medicine

Interval estimation - Example

             OUTCOME
             +      -      Total
TRT A        14     6      20       P(success) = 70%
TRT B        7      13     20       P(success) = 35%

Significance test: P = 0.06 or NS!

Interval estimation of difference: 35% (95% CI = -1%, +71%)
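The slide does not say how the P value and interval were obtained. One construction that happens to reproduce both figures is a continuity-corrected chi-square test together with an interval built from the pooled standard error plus the same correction; the sketch below is offered under that assumption and should not be read as the lecturer's stated method.

```python
from math import sqrt, erfc

x1, n1 = 14, 20   # arm with 70% success
x2, n2 = 7, 20    # arm with 35% success

diff = x1 / n1 - x2 / n2

# Assumed method: pooled SE with a continuity (Yates-type) correction
pooled = (x1 + x2) / (n1 + n2)
se_pooled = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
correction = (1 / n1 + 1 / n2) / 2

z = (abs(diff) - correction) / se_pooled
p_value = erfc(z / sqrt(2))                  # two-sided P

margin = 1.96 * se_pooled + correction
low, high = diff - margin, diff + margin
print(f"difference = {diff:.0%}, P = {p_value:.2f}, 95% CI = {low:.0%} to {high:.0%}")
```

Whatever the exact method, the interval conveys what "NS" hides: the data are compatible with anything from no benefit to a very large one.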

Page 28: EPI-820  Evidence-Based Medicine

Confidence Intervals

- CI are non-uniform, true parameter is more likely to be located centrally than near to limits. Therefore precise location of boundary is irrelevant!

- CI functions

- For a study to be reassuring about a lack of effect, boundaries of CI should be near the null value

- CIs have clear advantages over the p-value but still suffer from the necessary frequentist interpretation (a CI represents one member of a family of CIs produced by an infinite number of replications of the same experiment)

Page 29: EPI-820  Evidence-Based Medicine

[Figure: Study A and Study B plotted relative to the null point, with the axis running toward larger effect]

Which is the more important study?

Page 30: EPI-820  Evidence-Based Medicine

Importance of Beta (Type II error) and Sample Size in RCTs (Freiman et al 1978)

• Reviewed 71 “negative” (P > 0.05) RCTs published from 1960-77

• Assume 25% treatment effect:
  - 94% (N= 67) of trials had < 90% power
  - Only 15% (N= 10) had sufficient evidence to conclude no effect

• Assume 50% treatment effect:
  - 70% (N= 50) of trials had < 90% power
  - Only 32% (N= 16) had sufficient evidence to conclude no effect

Page 31: EPI-820  Evidence-Based Medicine

The P Value Fallacy - Goodman

• Derives from the simultaneous application of the p-value as:
  - A long-run, error-based, deductive tool (Neyman-Pearson frequentist application), and
  - A short-run, evidential and inductive tool (i.e., what is the meaning of this particular result?)

• The p-value was never designed to serve these two conflicting roles

Page 32: EPI-820  Evidence-Based Medicine

The Bayes Factor - Goodman

• Comparison of how well two hypotheses predict the data: Bayes factor = P (Data | Ho) / P (Data | Ha)

• Allows explicitly the incorporation of external evidence (in terms of prior probability/belief)

• Use of Bayesian statistics shows that weight of evidence against the Ho is not as strong as the p-value suggests (Table 2)
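The last point can be illustrated with the minimum Bayes factor, the strongest evidence against Ho that a given p value can represent. The sketch assumes the Gaussian form exp(-Z²/2), where Z is the normal deviate for the two-sided p value; the slide does not give a formula, and the 50% prior is purely illustrative.

```python
from math import exp
from statistics import NormalDist

def minimum_bayes_factor(p_value):
    """Smallest possible Bayes factor for Ho vs Ha (assumed Gaussian form)."""
    z = NormalDist().inv_cdf(1 - p_value / 2)   # Z for a two-sided p value
    return exp(-z * z / 2)

def posterior_prob_ho(prior_prob_ho, bayes_factor):
    """Posterior odds = prior odds x Bayes factor, converted back to a probability."""
    prior_odds = prior_prob_ho / (1 - prior_prob_ho)
    post_odds = prior_odds * bayes_factor
    return post_odds / (1 + post_odds)

for p in (0.05, 0.01, 0.001):
    bf = minimum_bayes_factor(p)
    post = posterior_prob_ho(0.5, bf)           # 50% prior is an illustrative choice
    print(f"P = {p:<6} min BF = {bf:.3f}  P(Ho | data) = {post:.3f}")
```

Under these assumptions, even in the scenario most favourable to Ha, P = 0.05 leaves Ho with a posterior probability of roughly 13% from an even prior, well above the 5% that the p value is often misread to imply.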