From diagnostic test to hypothesis test - MICEapps

Post on 24-Jun-2020


Kameshwar Prasad

Professor of Neurology, Former Chief, Neurosciences Centre

All India Institute of Medical Sciences, New Delhi

From diagnostic test to

hypothesis test

Plan

• Revise concepts related to diagnostic

tests

• Learn concepts related to hypothesis

testing

How do I teach sample size

calculation?

Dichotomous Outcome (2 Independent Samples)

• Test H0: p1 = p2 vs. HA: p1 ≠ p2

• Assuming two-sided alternative and equal allocation

$$
n_{\text{per group}} = \frac{\left( z_{1-\alpha/2}\sqrt{2\bar{p}\bar{q}} + z_{1-\beta}\sqrt{p_1 q_1 + p_2 q_2} \right)^2}{\Delta^2}
$$

***Always round up to the nearest integer!

p1, p2 = projected true probabilities of "success" in the two groups

q1 = 1 - p1, q2 = 1 - p2

Δ = p1 - p2

p̄ = (p1 + p2)/2, q̄ = 1 - p̄

z_{1-α/2} is the N(0,1) cutoff corresponding to α

z_{1-β} is the N(0,1) cutoff corresponding to β
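As a minimal sketch, the formula on this slide can be evaluated with Python's standard-library `statistics.NormalDist` for the normal cutoffs. The function name `n_per_group_dichotomous` and its defaults (two-sided alpha of 5%, power of 80%) are illustrative assumptions, not something the slides define.

```python
import math
from statistics import NormalDist


def n_per_group_dichotomous(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for two independent proportions.

    Two-sided test, equal allocation. Hypothetical helper name.
    """
    q1, q2 = 1 - p1, 1 - p2
    delta = p1 - p2
    p_bar = (p1 + p2) / 2
    q_bar = 1 - p_bar
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1-alpha/2}
    z_beta = NormalDist().inv_cdf(power)           # z_{1-beta}
    numerator = (z_alpha * math.sqrt(2 * p_bar * q_bar)
                 + z_beta * math.sqrt(p1 * q1 + p2 * q2)) ** 2
    return math.ceil(numerator / delta ** 2)       # always round up


# Illustrative inputs: 40% versus 20% mortality
print(n_per_group_dichotomous(0.40, 0.20))
```

For these inputs the formula works out to about 81.2, which rounds up to 82 per group.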

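Dichotomous Outcome (2 Independent Samples)

$$
\text{Power} = \Phi\left( \frac{\sqrt{n}\,|\Delta| - z_{1-\alpha/2}\sqrt{2\bar{p}\bar{q}}}{\sqrt{p_1 q_1 + p_2 q_2}} \right)
$$

where Φ is the probability from a standard normal distribution (n = sample size per group)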
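The power expression can be sketched the same way. The helper name `power_dichotomous` and the inputs (82 per group, 40% versus 20%) are illustrative assumptions chosen only to exercise the formula.

```python
import math
from statistics import NormalDist


def power_dichotomous(n, p1, p2, alpha=0.05):
    """Approximate power of a two-sided test of two proportions (n per group)."""
    q1, q2 = 1 - p1, 1 - p2
    p_bar = (p1 + p2) / 2
    q_bar = 1 - p_bar
    delta = abs(p1 - p2)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z = ((math.sqrt(n) * delta - z_alpha * math.sqrt(2 * p_bar * q_bar))
         / math.sqrt(p1 * q1 + p2 * q2))
    return NormalDist().cdf(z)  # Phi(z)


print(round(power_dichotomous(82, 0.40, 0.20), 3))
```

With these inputs the power comes out just above 80%, and it drops below 80% with one fewer subject per group.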

Continuous Outcome(2 Independent Samples)

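• Test H0: μ1 = μ2 vs. HA: μ1 ≠ μ2

• Two-sided alternative and equal allocation

• Assume outcome normally distributed with:

mean μ1 and variance σ1² in Group 1

mean μ2 and variance σ2² in Group 2

$$
n_{\text{per group}} = \frac{\left( z_{1-\alpha/2} + z_{1-\beta} \right)^2 \left( \sigma_1^2 + \sigma_2^2 \right)}{\Delta^2}, \qquad \Delta = \mu_1 - \mu_2
$$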
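A minimal sketch of the continuous-outcome formula, again using `statistics.NormalDist`. The function name and the example inputs (standard deviation of 10 in both groups, difference of 5) are assumptions for illustration.

```python
import math
from statistics import NormalDist


def n_per_group_continuous(sigma1, sigma2, delta, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two means (two-sided, 1:1)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1-alpha/2}
    z_beta = NormalDist().inv_cdf(power)           # z_{1-beta}
    n = (z_alpha + z_beta) ** 2 * (sigma1 ** 2 + sigma2 ** 2) / delta ** 2
    return math.ceil(n)                            # always round up


print(n_per_group_continuous(sigma1=10, sigma2=10, delta=5))
```

For these inputs the formula gives about 62.8, which rounds up to 63 per group.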

For RCTs

Sample size with grade 3 math

Randomized Controlled Trial (RCT)

Two groups of equal size

Parallel groups

• Hypothesis : In patients with hypertensive brain

haemorrhage, surgery reduces 30-day mortality from

40% to 20%.

Best medical management alone : 40% (Pc) to

Surgery + best medical management : 20% (Pe)

RCT : Superiority Hypothesis

Pe = 20%

Pc = 40%

Find out their average = 30% (p)

Find out the difference between Pe & Pc = 20% (d)

RCT : Superiority Hypothesis

Average of Pe & Pc = 30% (p)

Difference between Pe & Pc = 20% (d)

Sample size per group = 16p(100 - p) / (d x d)

= (16 x 30 x 70) / (20 x 20) = 84 per group

Total N = 168

RCT : Superiority Hypothesis

Sample size per group = 16p(100 - p) / (d x d)

This is for a study with two equal parallel groups.

p = average of Pe & Pc

d = difference between Pe and Pc
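The rule of thumb is simple enough to check in a few lines. This is a sketch only: the function name `rule_of_16_dichotomous` is hypothetical, and it takes the two event rates as percentages, as the slides do.

```python
import math


def rule_of_16_dichotomous(pc, pe):
    """Rule-of-thumb per-group size; pc and pe are percentages (0-100)."""
    p = (pc + pe) / 2   # average of the two event rates
    d = abs(pc - pe)    # difference between them
    return math.ceil(16 * p * (100 - p) / (d * d))


n = rule_of_16_dichotomous(pc=40, pe=20)
print(n, 2 * n)  # per group, total
```

For the haemorrhage example (40% versus 20%) this prints 84 and 168, matching the hand calculation.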

Exercise: calculate sample size for the following hypothesis:

Dexamethasone adjunctive therapy reduces mortality from 15% to 5% in children with neonatal meningitis.

Sample size per group = 16p(100 - p) / (d x d)

This is for a study with two equal parallel groups.

Thank You

By now, you have at least one

question?

• Where does the ‘16’ come from?

• Before we address this, I will take your

test…….

Truth revealed by gold standard

             Disease +         Disease -
Test +       True positive     False positive
Test -       False negative    True negative

Hypothetical example of a study with a sample size of 400

Truth revealed by gold standard

             Disease +    Disease -
Test +          190           40
Test -           10          160
Total           200          200

Hypothetical example of a study with a sample size of 400

Truth revealed by gold standard

             Disease +               Disease -
Test +       190 (true positive)     40 (false positive)
Test -       10 (false negative)     160 (true negative)
Total        200                     200

Hypothetical example of a study with a sample size of 400

Truth revealed by gold standard

             Disease +    Disease -
Test +          95%          20%
Test -           5%          80%

Truth revealed by gold standard

             Disease +    Disease -
Test +       TP rate      FP rate
Test -       FN rate      TN rate

What we call rate is actually probability.

Hypothetical example of a study with a sample size of 400

Truth revealed by gold standard

             Disease +                    Disease -
Test +       95% (true positive rate)     20% (false positive rate)
Test -       5% (false negative rate)     80% (true negative rate)

Giving names: ‘terms’

Truth revealed by gold standard

             Disease +                           Disease -
Test +       Sensitivity (true positive rate)    False positive rate
Test -       False negative rate                 Specificity (true negative rate)
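The four rates follow directly from the counts in the hypothetical 400-person study. A minimal sketch, with a hypothetical function name `rates_from_counts`:

```python
def rates_from_counts(tp, fp, fn, tn):
    """Column-wise rates of a 2 by 2 diagnostic table."""
    diseased = tp + fn   # Disease + column total
    healthy = fp + tn    # Disease - column total
    return {
        "sensitivity": tp / diseased,
        "false_negative_rate": fn / diseased,
        "specificity": tn / healthy,
        "false_positive_rate": fp / healthy,
    }


rates = rates_from_counts(tp=190, fp=40, fn=10, tn=160)
for name, value in rates.items():
    print(f"{name}: {value:.0%}")
```

Each column sums to 100%, so the false-negative rate is always 1 minus sensitivity.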

Think of relationship between

false-negative rate and sensitivity

2 by 2 table: sensitivity

             Disease +              Disease -
Test +       a (true positives)
Test -       c (false negatives)

Sensitivity = a / (a + c)

Proportion of people with the disease who have a positive test result.

So, a test with 84% sensitivity means that the test identifies 84 out of 100 people WITH the disease.

Conducting a study is like doing a

diagnostic test

• Want to know (diagnose) the truth

• But there is no ‘gold standard’ to reveal

the truth, which is known only to ‘God’

• Any study, like a diagnostic test, is an

attempt to find the truth

Four Possible Results

• Where do results go wrong?

• Where do errors occur?

                               Truth about new treatment
Study finds that treatment     Works (+)          Does not work (-)
Works (+)                      True positive      False positive
Does not work (-)              False negative     True negative

Four Possible Results

                               Truth about new treatment
Study finds that treatment     Works (+)               Does not work (-)
Works (+)                      True positive           False positive error
Does not work (-)              False negative error    True negative

Four Possible Results

                               Truth about new treatment
Study finds that treatment     Works (+)                          Does not work (-)
Works (+)                      True positive                      Type I error (false positive)
Does not work (-)              Type II error (false negative)     True negative

Types of errors

False positive result: Type I error

False negative result: Type II error

Cannot plan for error free results

Four Possible Results

                               Truth about new treatment
Study finds that treatment     Works (+)                           Does not work (-)
Works (+)                      True positive probability           False positive error probability
Does not work (-)              False negative error probability    True negative probability

Consultation with prof of

biostatistics

• Sir, can you help with sample size

calculation of my thesis (RCT)?

• And, so on….

Four Possible Results

                               Truth about new treatment
Study finds that treatment     Works (+)                                  Does not work (-)
Works (+)                      True positive probability                  False positive error probability (alpha)
Does not work (-)              False negative error probability (beta)    True negative probability

Think of relationship between

false-negative rate and sensitivity

What is the true probability when beta is varying?

What is the other name for true positive probability?

                               Truth about new treatment
Study finds that treatment     Works (+)                                  Does not work (-)
Works (+)                      True positive probability = POWER          False positive error probability (alpha)
Does not work (-)              False negative error probability (beta)    True negative probability

How much risk of error do you want to take?

Type I error rate: 5%

Type II error rate: 5%, 10%, or 20%

Probability of Errors

Probability of FP (Type I ) error: α

Probability of FN (Type II) error: β

Almost always α=5%

Usually β = 20% or 10%.

With 20%, power = ?, With 10%, power = ?

Source for 16

α = 5%, β = 20%, Power = 80%

Allocation ratio = 1:1

Dichotomous Outcome (2 Independent Samples)

• Test H0: p1 = p2 vs. HA: p1 ≠ p2

• Assuming two-sided alternative and equal allocation

$$
n_{\text{per group}} = \frac{\left( z_{1-\alpha/2}\sqrt{2\bar{p}\bar{q}} + z_{1-\beta}\sqrt{p_1 q_1 + p_2 q_2} \right)^2}{\Delta^2}
$$

***Always round up to the nearest integer!

p1, p2 = projected true probabilities of "success" in the two groups

q1 = 1 - p1, q2 = 1 - p2

Δ = p1 - p2

p̄ = (p1 + p2)/2, q̄ = 1 - p̄

z_{1-α/2} is the N(0,1) cutoff corresponding to α

z_{1-β} is the N(0,1) cutoff corresponding to β
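One way to see where the 16 comes from, sketched under the approximation $p_1 q_1 + p_2 q_2 \approx 2\bar{p}\bar{q}$ (reasonable when $p_1$ and $p_2$ are not far apart; this intermediate step is an assumption filled in here, not shown on the slide). Both square roots then share the factor $\sqrt{2\bar{p}\bar{q}}$, and the formula collapses to:

$$
n_{\text{per group}} \approx \frac{2\left( z_{1-\alpha/2} + z_{1-\beta} \right)^2 \bar{p}\bar{q}}{\Delta^2}
$$

With α = 5% and β = 20%, the cutoffs are about 1.96 and 0.84, so

$$
2\,(1.96 + 0.84)^2 = 2 \times 7.84 = 15.68 \approx 16,
\qquad
n_{\text{per group}} \approx \frac{16\,\bar{p}\bar{q}}{\Delta^2}
$$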

ANY QUESTION?

Table: Multiplication factors for frequently used power and alpha*

Power    Factor for alpha = 5%    Factor for alpha = 1%
80%              16                       23
90%              21                       30
95%              26                       36
99%              37                       48

* Rounded to the nearest whole number.
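The table entries can be reproduced as 2 x (z for alpha + z for power) squared, which is the constant that multiplies p̄q̄/Δ² in the simplified formula. The sketch below assumes a two-sided alpha and uses a hypothetical helper name, `multiplication_factor`.

```python
from statistics import NormalDist


def multiplication_factor(alpha, power):
    """2 * (z_{1-alpha/2} + z_{1-beta})^2, before rounding."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2


for power in (0.80, 0.90, 0.95, 0.99):
    row = [round(multiplication_factor(a, power)) for a in (0.05, 0.01)]
    print(f"{power:.0%}", *row)
```

Rounded to whole numbers, the output agrees with every cell of the table.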

Summary (what have we learnt)

• Concepts of hypothesis testing are

similar to those used in studies of

diagnostic tests

• Type I error/ Type II error

• Alpha/ beta

• Power

• ‘God is great’

Thank You

Sample size for a diagnostic test study

• How many cases (Disease +) do you need?

• What is your expected sensitivity (for the test to be useful)? Say 90% (= p)

• Within what range do you want to estimate this? Say +/- 5%

• d = difference between the upper and lower limits of the range = 10%

• Now do the mental math.

Sample size for a diagnostic test study

• How many controls (Disease -) do you need?

• What is your expected specificity (for the test to be useful)? Say 80% (= p)

• Within what range do you want to estimate this? Say +/- 5%

• d = 10%

• Now do the mental math.
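A sketch of the mental math, assuming it refers to the same 16p(100 - p) / (d x d) rule (the slides do not spell this out). The assumption is consistent with a 95% confidence interval: a half-width of d/2 needs roughly 1.96² x p(100 - p) / (d/2)², which is about 16p(100 - p) / d². The function name `n_for_proportion` is hypothetical.

```python
import math


def n_for_proportion(p, d):
    """Subjects needed to estimate a percentage p within a range of total width d."""
    return math.ceil(16 * p * (100 - p) / (d * d))


cases = n_for_proportion(p=90, d=10)     # sensitivity, Disease + group
controls = n_for_proportion(p=80, d=10)  # specificity, Disease - group
print(cases, controls)
```

Under that assumption the two slides work out to 144 cases and 256 controls.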

What kind of outcome measure

does your study have?

• Two category (dichotomous, binary)

• Numerical (continuous)

What kind of outcome measures

are there in these statements?

• Rosiglitazone reduces blood sugar level in diabetes.

• Clopidogrel reduces incidence of myocardial infarction.

• Carotid angioplasty prevents stroke.

• Nifedipine controls BP effectively in hypertensive emergencies.

• Statins control cholesterol level in high-risk individuals.

• Steroids induce remission in SLE.

• Ramipril improves LV ejection fraction.

RCT: Superiority hypothesis, continuous outcome measure

• RCT: Two groups of equal size.

• Formula: 16x. How much effect are you interested in?

• The value of 'x' depends on the size of the effect.

Effect size    x     Sample size
Small          25    16 x 25 = 400 per group
Moderate       4     16 x 4 = 64 per group
Large          2     16 x 2 = 32 per group

Effect Size

• How much is the difference with respect

to its variation

• Difference d

• Variation s.d.

• Effect size = d / s.d.

• Large 0.8, Moderate 0.5, Small 0.2
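Since effect size = d / s.d., the multiplier x can be read as 1 / (effect size)², so the per-group size is 16 / (effect size)². This reading is an inference from the slides' numbers, and the function name below is hypothetical.

```python
import math


def n_per_group_from_effect_size(effect_size):
    """Rule of thumb: 16 * (s.d. / d)^2 = 16 / effect_size^2."""
    # round() guards against floating-point noise before rounding up
    return math.ceil(round(16 / effect_size ** 2, 9))


for label, es in (("Small", 0.2), ("Moderate", 0.5), ("Large", 0.8)):
    print(label, es, n_per_group_from_effect_size(es))
```

Small and moderate effects give 400 and 64 per group. For a large effect the unrounded multiplier is about 1.56, giving 25 per group, whereas the slide rounds x up to 2 and so arrives at 32.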

Example

• You are planning a study to improve

LVEF using stem cells in acute MI

• You expect moderate effect, hence the

sample size will be 16x4 = 64 per group

• What is the general formula?

• n per group = 16 s² / d²

One MI study using stem cells

Variable                       Placebo (N=92)    BMC (N=95)
Global LVEF (%)
  Baseline
    Mean                       46.9±10.4         48.3±9.2
    Median                     47.5              50.6
  4 Mo
    Mean                       49.9±13.0         53.8±10.2
    Median                     53.2              54.7
  Absolute difference
    Mean                       3.0±10            5.5±10
    Median                     4.0               5.0
Death, recurrence of MI, &
any revascularization
procedure                      40/103            23/101

Calculating sample size for LVEF change

• Need two things :

Standard deviation (s)

Difference expected (d)

s = 10%

d = 5%

Sample size = 16 s² / d²

= (16 x 10 x 10) / (5 x 5)

= 64 per group
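The same calculation as a short sketch. The function name `rule_of_16_continuous` is hypothetical, and s and d must be in the same units.

```python
import math


def rule_of_16_continuous(sd, diff):
    """Rule-of-thumb per-group size for a continuous outcome: 16 s^2 / d^2."""
    return math.ceil(16 * sd ** 2 / diff ** 2)


print(rule_of_16_continuous(sd=10, diff=5))
```

With s = 10 and d = 5 this gives 64 per group, as in the hand calculation.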

k      n1       n2       n1+n2
1      n        n        2n
2      0.75n    1.5n     2.25n
3      0.67n    2.0n     2.67n
4      0.62n    2.5n     3.12n
5      0.60n    3.0n     3.60n
10     0.55n    5.5n     6.05n
100    0.50n    50.5n    51.0n

Table: Study sizes necessary to achieve approximately the same power in a trial with two groups, of which one contains k times as many individuals as the other.
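The rows of the table are consistent with n1 = n(k + 1) / (2k) and n2 = k x n1, which keeps 1/n1 + 1/n2 equal to 2/n. That relation is an assumption inferred from the numbers, not a formula stated on the slide, and the function name is hypothetical.

```python
def unequal_allocation(n, k):
    """n: per-group size under 1:1 allocation; k: ratio n2 / n1."""
    n1 = n * (k + 1) / (2 * k)
    n2 = k * n1
    return n1, n2, n1 + n2


for k in (1, 2, 3, 4, 5, 10):
    n1, n2, total = unequal_allocation(1.0, k)
    print(k, round(n1, 3), round(n2, 3), round(total, 3))
```

The total grows with k, so unequal allocation always costs more subjects for the same power.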

Reference for the formula

• Lehr R. Sixteen S-squared over D-squared: a relation for crude sample size estimates. Statistics in Medicine 1992;11:1099-1102.

Disclaimer

• The sample size formula discussed in the given time works only for:

RCTs with a superiority hypothesis and dichotomous outcomes

NOT for case-control, cohort, or cross-sectional studies

NOT for non-inferiority or equivalence hypotheses

Thank You

RCT : Superiority Hypothesis

In %: sample size per group = 16p(100 - p) / (d x d)

In decimals: sample size per group = 16p(1 - p) / (d x d)

p = average of Pe & Pc

d = difference between Pe and Pc