Understanding P-values and Confidence Intervals

Thomas B. Newman, MD, MPH

\Clinepi 2004\Understanding P- values and CI 10Nov04

Overview

Introduction and justification
What P-values and Confidence Intervals don’t mean
What they do mean: analogy between diagnostic tests and clinical research
Useful confidence interval tips
– CI for “negative” studies; absolute vs relative risk
– Confidence intervals for small numerators

Why cover this material here?

P-values and confidence intervals are ubiquitous in clinical research
Widely misunderstood and mistaught
Pedagogical argument:
– Is it important?
– Can you handle it?

Example: Douglas Altman Definition of 95% Confidence Intervals*

"A strictly correct definition of a 95% CI is, somewhat opaquely, that 95% of such intervals will contain the true population value.

Little is lost by the less pure interpretation of the CI as the range of values within which we can be 95% sure that the population value lies."

The strictly correct definition: hard to understand

The "less pure" interpretation: wrong!
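The strictly correct definition can be made concrete with a small simulation. This is only a sketch under assumptions that are not in the slides: a normally distributed measurement with a known standard deviation, and made-up values for the true mean, sample size and number of repeated studies.

```python
import random
import statistics


def coverage(true_mean=10.0, sd=2.0, n=25, trials=4000, seed=1):
    """Fraction of 95% CIs (z-intervals, known sd) that contain true_mean.

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    half_width = 1.96 * sd / n ** 0.5
    hits = 0
    for _ in range(trials):
        # One simulated "study": draw n observations and average them.
        sample_mean = statistics.fmean(rng.gauss(true_mean, sd) for _ in range(n))
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / trials


print(f"Share of intervals containing the true mean: {coverage():.3f}")
```

The printed share should be close to 0.95. It describes the long-run behavior of the interval procedure over many repeated studies, which is a different statement from being "95% sure" about any single interval.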

Understanding P-values and confidence intervals is important because
It explains things which otherwise are paradoxical and do not make sense, e.g. need to state hypotheses in advance, correction for multiple hypothesis testing
You will be using them all the time
You are future leaders in clinical research

You can handle it because

We have already covered the important concepts at length earlier in this course
– Prior probability
– Posterior probability
– What you thought before + new information = what you think now
We will support you through the process

Review of traditional statistical significance testing

State null (Ho) and alternative (Ha) hypotheses
Choose α
Calculate value of test statistic from your study
Calculate P-value from test statistic
If P-value < α, reject Ho
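The steps above can be sketched for a 2x2 table. The example uses the International Reflux Study counts that appear later in these slides (2, 28, 18, 68) and an uncorrected Pearson chi-square with 1 degree of freedom. The function names are illustrative, and the choice of test statistic is an assumption made to match the chi2(1) value reported in the Stata output.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    a, b = cases in the exposed / unexposed groups
    c, d = noncases in the exposed / unexposed groups
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))


alpha = 0.05                     # chosen before looking at the data
chi2 = chi2_2x2(2, 28, 18, 68)   # test statistic from the study
p = p_value_1df(chi2)            # P-value from the test statistic
print(f"chi2(1) = {chi2:.2f}, P = {p:.4f}, reject Ho: {p < alpha}")
```

With these counts the statistic and P-value agree with the chi2(1) = 3.17 and Pr>chi2 = 0.0749 shown in the Stata output, so Ho is not rejected at α = 0.05.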

Problem: Traditional statistical significance testing has led to widespread misinterpretation of P-values

What P-values don’t mean

If the P-value is 0.05, that means that there is a 95% probability that…
– The results did not occur by chance
– The null hypothesis is false
– There really is a difference between the groups

Chalk board:

Easy illustration of why non-Bayesian approach is wrong

Analogy with diagnostic tests: 2x2 tables and “false positive confusion”

Extending the analogy to understand a priori vs post hoc hypotheses, multiple hypotheses, etc.

(This is covered step-by-step in the course book.)
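The diagnostic test analogy can be written as a short calculation: treat a "significant" result like a positive test, with power playing the role of sensitivity and α the role of the false positive rate. The prior probabilities and the power of 0.8 below are hypothetical values chosen for illustration, not figures from the course book.

```python
def prob_true_given_significant(prior, power, alpha):
    """Posterior probability that the effect is real, given P < alpha.

    Same arithmetic as the predictive value of a positive diagnostic test:
    true positives / (true positives + false positives).
    """
    true_positives = prior * power
    false_positives = (1 - prior) * alpha
    return true_positives / (true_positives + false_positives)


# Hypothetical priors: a well-motivated a priori hypothesis (0.5),
# a long shot (0.1), and one of many post hoc comparisons (0.01).
for prior in (0.5, 0.1, 0.01):
    posterior = prob_true_given_significant(prior, power=0.8, alpha=0.05)
    print(f"prior {prior:<4} -> posterior {posterior:.2f}")
```

Under these assumptions the same P < 0.05 result leaves a posterior of about 0.94 when the prior is 0.5 but only about 0.14 when the prior is 0.01, which is why a priori and post hoc hypotheses deserve different treatment.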

Bonferroni

Inequality: If we do k different tests, each with significance level alpha, the probability that one or more will be significant by chance alone (i.e., when all the null hypotheses are true) is less than or equal to k*alpha

Correction: If we test k different hypotheses and want our total Type 1 error rate to be no more than alpha, then we should reject H0 only if P < alpha/k
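A quick numerical check of the inequality and the correction. The sketch assumes the k tests are independent (the inequality itself does not need that assumption; independence is used here only so the exact probability has a simple formula), and the values of k are arbitrary.

```python
def familywise_error_independent(k, alpha):
    """P(at least one of k independent tests is significant | all nulls true)."""
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(k, alpha):
    """Per-test P-value threshold that keeps the total Type 1 error <= alpha."""
    return alpha / k


alpha = 0.05
for k in (1, 5, 20):
    exact = familywise_error_independent(k, alpha)
    bound = min(1.0, k * alpha)
    corrected = familywise_error_independent(k, bonferroni_threshold(k, alpha))
    print(f"k={k:>2}: uncorrected {exact:.3f} (bound {bound:.2f}), "
          f"after correction {corrected:.3f}")
```

With 20 independent tests at alpha = 0.05, the chance of at least one false positive is about 0.64, and testing each at 0.05/20 = 0.0025 brings it back just under 0.05.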

Confidence Intervals for negative studies: 5 levels of sophistication

Example 1: Oral amoxicillin to treat possible occult bacteremia in febrile children*
– Randomized, double-blind trial
– 3-36 month old children with T > 39 C (N = 955)
– Treatment: Amox 125 mg tid (< 10 kg) or 250 mg tid (> 10 kg)
– Outcome: major infectious morbidity

*Jaffe et al., New Engl J Med 1987;317:1175-80

Amoxicillin for possible occult bacteremia 2: Results

Overall 27 children (~3%) bacteremic
Of these 27, major infectious morbidity occurred in 3: 2 persistent bacteremia, 1 periorbital cellulitis:
2/19 (10.5%) with amoxicillin vs 1/8 (12.5%) with placebo. (P = 0.9)

Conclusion: “Data do not support routine use of standard doses of amoxicillin…”

5 levels of sophistication

Level 1: P > 0.05 = treatment does not work
Level 2: Look at power for study. (Authors reported power = 0.24 for OR=4. Therefore, study underpowered and negative study uninformative.)

5 levels of sophistication, cont’d

Level 3: Look at 95% CI for RR
RR = 0.84; 95% CI (0.09 to 8.0) (This was level of TBN and RHP letter to the editor, 1987. Note authors calculated OR = 1.2 and 95% CI 0.02 to 30.4)

Level 4: Make sure you do ITT analysis! (Not OK to restrict attention to bacteremic patients!)

So it’s 2/507 vs 1/448; RR = 1.8 (amoxicillin worse); 95% CI (0.16 to 19.4)

Level 5: the clinically relevant quantity is the Absolute Risk Reduction (ARR)!
2/507 (0.4%) with amoxicillin vs 1/448 (0.2%) with placebo
ARR = -0.17% {amoxicillin worse}
95% CI (-0.9% {harm} to +0.5% {benefit})
Therefore, LOWER limit of 95% CI for benefit (i.e., best case) is NNT = 1/0.5% = 200
So this study suggests need to treat >= 200 children to prevent Major Infectious Morbidity in one
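The Level 5 arithmetic can be sketched as follows, using the intention-to-treat counts from the slide. The function name is illustrative, and the interval is a simple large-sample (Wald) interval, which is an assumption; with numerators this small it is only approximate.

```python
import math


def absolute_risk_reduction_ci(events_tx, n_tx, events_ctl, n_ctl, z=1.96):
    """ARR (control risk minus treatment risk) with a Wald 95% CI.

    Positive values mean the treatment helps; negative values mean harm.
    """
    p_tx = events_tx / n_tx
    p_ctl = events_ctl / n_ctl
    arr = p_ctl - p_tx
    se = math.sqrt(p_tx * (1 - p_tx) / n_tx + p_ctl * (1 - p_ctl) / n_ctl)
    return arr, arr - z * se, arr + z * se


# Amoxicillin 2/507 vs placebo 1/448
arr, lower, upper = absolute_risk_reduction_ci(2, 507, 1, 448)
best_case_nnt = 1 / upper  # most optimistic benefit consistent with the data

print(f"ARR = {arr:.2%}; 95% CI ({lower:.2%} to {upper:.2%})")
print(f"Best-case NNT = {best_case_nnt:.0f}")
```

Unrounded, the upper limit is about 0.53%, giving a best-case NNT near 190; the slide rounds the limit to 0.5% and so reports 200.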

Stata output

. csi 2 1 505 447

                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+-----------
           Cases |         2           1  |          3
        Noncases |       505         447  |        952
-----------------+------------------------+-----------
           Total |       507         448  |        955
                 |                        |
            Risk |  .0039448    .0022321  |   .0031414
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |         .0017126       |   -.005278    .0087032
      Risk ratio |         1.767258       |   .1607894    19.42418
 Attr. frac. ex. |         .4341518       |  -5.219315    .9485178
 Attr. frac. pop |         .2894345       |
                 +-------------------------------------------------
                               chi2(1) =     0.22  Pr>chi2 = 0.6369
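The risk ratio line of this output can be reproduced by hand. The sketch below uses the usual large-sample interval on the log scale; that this is the formula behind the numbers shown is an assumption, supported only by the fact that it gives the same limits. The function name is illustrative.

```python
import math


def risk_ratio_ci(a, b, c, d, z=1.959964):
    """Risk ratio with a 95% CI computed on the log scale.

    Arguments follow the csi order:
    a, b = cases in the exposed / unexposed groups
    c, d = noncases in the exposed / unexposed groups
    """
    n1, n0 = a + c, b + d
    rr = (a / n1) / (b / n0)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n0)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Intention-to-treat analysis: 2/507 vs 1/448
print("ITT:        RR = %.2f, 95%% CI (%.2f to %.1f)" % risk_ratio_ci(2, 1, 505, 447))

# Bacteremic children only (Level 3): 2/19 vs 1/8
print("Bacteremic: RR = %.2f, 95%% CI (%.2f to %.1f)" % risk_ratio_ci(2, 1, 17, 7))
```

The same function also recovers the Level 3 result of RR = 0.84 with 95% CI 0.09 to 8.0 for the bacteremic subgroup.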

Example 2: Pyelonephritis and new renal scarring in the International Reflux Study in Children*

RCT of ureteral reimplantation vs prophylactic antibiotics for children with vesicoureteral reflux

Overall result: surgery group fewer episodes of pyelonephritis (8% vs 22%; NNT = 7; P < 0.05) but more new scarring (31% vs 22%; P = .4)

This raises questions about whether new scarring is caused by pyelonephritis

Weiss et al. J Urol 1992; 148:1667-73

Within groups no association between new pyelo and new scarring

                New scarring     N      %
New pyelo             2          20     10
No new pyelo         28          96     29
Total                30         116

RR = 0.34; 95% CI (0.09-1.32)

Weiss, J Urol 1992:148;1672

Trend goes in the OPPOSITE direction

Stata output to get 95% CI:

. csi 2 28 18 68

                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+-----------
           Cases |         2          28  |         30
        Noncases |        18          68  |         86
-----------------+------------------------+-----------
           Total |        20          96  |        116
                 |                        |
            Risk |        .1    .2916667  |   .2586207
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |        -.1916667       |  -.3515216   -.0318118
      Risk ratio |         .3428571       |   .0887727     1.32418
 Prev. frac. ex. |         .6571429       |  -.3241804    .9112273
 Prev. frac. pop |         .1133005       |
                 +-------------------------------------------------
                               chi2(1) =     3.17  Pr>chi2 = 0.0749

Conclusions

No evidence that new pyelonephritis causes scarring
Some evidence that it does not
P-values and confidence intervals are approximate, especially for small sample sizes (and subject to manipulation)

Key concept: calculate 95% CI for ARR for negative studies

Confidence intervals for small numerators

Observed numerator    Approximate numerator for upper limit of 95% CI
        0                              3
        1                              5
        2                              7
        3                              9
        4                             10
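The rounded figures in this table can be compared with an exact calculation. The sketch below assumes the numerator behaves like a Poisson count, which is an assumption not stated on the slide, and finds the upper limit by bisection using only the standard library. The function names are illustrative.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for a Poisson variable with mean mu."""
    return sum(math.exp(-mu) * mu ** i / math.factorial(i) for i in range(k + 1))


def upper_limit(k, tail=0.025):
    """Mean mu at which observing k or fewer events has probability `tail`.

    tail=0.025 gives the upper end of a two-sided 95% CI;
    tail=0.05 gives a one-sided 95% upper bound.
    """
    low, high = 0.0, 50.0
    for _ in range(60):  # bisection: the CDF decreases as mu increases
        mid = (low + high) / 2
        if poisson_cdf(k, mid) > tail:
            low = mid
        else:
            high = mid
    return high


for k in range(5):
    print(f"observed {k}: two-sided upper limit {upper_limit(k):.1f}, "
          f"one-sided {upper_limit(k, tail=0.05):.1f}")
```

Under this assumption the two-sided upper limits are roughly 3.7, 5.6, 7.2, 8.8 and 10.2, close to the table for numerators of 1 to 4. The table's value of 3 for a numerator of 0 matches the one-sided bound (about 3.0), the familiar "rule of 3". To get an upper limit for a proportion, divide the numerator limit by the denominator.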

P-values and Confidence Intervals

Probably won’t cover this, but FYI:
– Usually P < 0.05 means 95% CI excludes null value.
– But both 95% CI and P-values are based on approximations, so this may not be the case
– Illustrated by IRSC slide above
– If you want 95% CI and P-values to agree, use “test-based” confidence intervals – see next slide

Alternative Stata output: Test-based CI

. csi 2 28 18 68, tb

                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+-----------
           Cases |         2          28  |         30
        Noncases |        18          68  |         86
-----------------+------------------------+-----------
           Total |        20          96  |        116
                 |                        |
            Risk |        .1    .2916667  |   .2586207
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |        -.1916667       |  -.4035313    .0201979  (tb)
      Risk ratio |         .3428571       |   .1050114    1.119412  (tb)
 Prev. frac. ex. |         .6571429       |  -.1194122    .8949886  (tb)
 Prev. frac. pop |         .1133005       |
                 +-------------------------------------------------
                               chi2(1) =     3.17  Pr>chi2 = 0.0749
