Understanding Statistical Power for Non-Statisticians
WEBINAR | JUNE 28, 2016 | 12:00PM EASTERN | DIA 2016
DALE W. USNER, PH.D. | PRESIDENT | SDC
General Housekeeping
Use mic & speakers or call in
Webinar users submit questions via chat
Dale W. Usner, Ph.D.
20 years in industry
50+ FDA and international regulatory body interactions
Frequent study design support
Therapeutic Expertise: Anti-viral/Anti-infective, Cardiovascular, Gastrointestinal, Oncology/Immunology, Ophthalmology, Surgical, Other
Agenda
What is Statistical Power
How Assumptions Affect Statistical Power and Sample Size
How Power Is Associated With What is Statistically Significant
Q&A
What is Statistical Power?
Clinical/Medical: If I want to compare our Test Product to the Control Product in systolic blood pressure, what sample size will I need?
Statistician: Assuming a Difference in Means of 10 mm Hg and a common Standard Deviation (SD) of 20 mm Hg, 64 subjects per treatment group are required to have 80% power for a 2-sided α = 0.05 test.
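The statistician's answer can be reproduced approximately with a few lines of code. The sketch below is an illustration, not the presenter's method: it assumes the usual normal-approximation formula for comparing two means plus a small-sample adjustment term (z²/4) that is often used to approximate the t-test. The function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(diff, sd, power=0.80, alpha=0.05):
    """Approximate subjects per arm for a 2-sided, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the 2-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    # Normal-approximation sample size per arm
    n_normal = 2 * (z_alpha + z_power) ** 2 * (sd / diff) ** 2
    # Assumption: add z_alpha^2 / 4 as a small-sample adjustment for the t-test
    return ceil(n_normal + z_alpha ** 2 / 4)


for target_power in (0.80, 0.85, 0.90):
    print(target_power, n_per_group(diff=10, sd=20, power=target_power))
```

Under these assumptions the 80% case gives 64 subjects per treatment group, matching the figure quoted above.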
What is Statistical Power?
Next?
What is Statistical Power? Questions
What does it mean to have 80% power?
What does it mean to assume a difference in means of 10 mm Hg? What if the true diff is <10? >10?
What role does the SD play? What if the true SD is >20? What if the true SD is <20?
What happens if the difference observed in the study is <10 mm Hg?
Motivation
Assuming a Difference in Means of 10 mm Hg and a common SD of 20 mm Hg, N = 64 subjects per treatment group are required to have 80% power for a 2-sided α = 0.05 test.
The study will demonstrate a statistically significant result if the difference observed in the study is ≥7.0 mm Hg (assuming the observed SD is 20 mm Hg).
Motivation
With 85% Power (N = 73 subjects / Tx Gp), observed differences ≥6.55 mm Hg would yield statistical significance.
With 90% Power (N = 86 subjects / Tx Gp), observed differences ≥6.03 mm Hg would yield statistical significance.
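The thresholds above can be approximated as the critical value times the standard error of the difference in means. The sketch below assumes a normal critical value and an observed SD of 20 mm Hg; a t-based critical value would be slightly larger, so the results land a little under the 7.0, 6.55, and 6.03 mm Hg quoted on the slides. The function name `min_significant_diff` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def min_significant_diff(n_per_group, sd=20.0, alpha=0.05):
    """Smallest observed difference in means that reaches 2-sided significance.

    Assumption: normal critical value (not the t-distribution) and equal
    group sizes with the same observed SD in both arms.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sd * sqrt(2 / n_per_group)
    return z_crit * standard_error


for n in (64, 73, 86):
    print(n, round(min_significant_diff(n), 2))
```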
Background
Statistical Inference
Statistical Inference: Drawing conclusions about an entire population based on a sample from that population.
Hypotheses (Efficacy)
Superiority
H0: Test Arm is no different from Control Arm
H1: Test Arm is different from (superior to) Control Arm
Non-Inferiority
H0: Test Arm is inferior to Control Arm
H1: Test Arm is non-inferior to Control Arm
Desired Outcome: Reject H0 in favor of H1
Statistical Inference: Coin Flip
H0: Proportion of Heads = Prop Tails = 0.50
H1: Proportion of Heads > Prop Tails
Flip a coin 4 times with result 3H and 1T. 75% Heads, should H0 be rejected? No, the probability of 3 or more heads under H0 is 31.25%.
Flip a coin 40 times with result 30H and 10T. 75% Heads, should H0 be rejected? Yes, the probability of 30 or more heads under H0 is about 0.1%.
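The coin-flip probabilities are one-sided binomial tail probabilities under H0, and they can be checked exactly. The sketch below is a minimal illustration; the function name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(heads, flips, p=0.5):
    """Exact probability of observing `heads` or more heads in `flips` tosses."""
    return sum(
        comb(flips, k) * p ** k * (1 - p) ** (flips - k)
        for k in range(heads, flips + 1)
    )


print(prob_at_least(3, 4))    # 4 flips, 3 or more heads
print(prob_at_least(30, 40))  # 40 flips, 30 or more heads
```

The same 75% heads is unremarkable with 4 flips (tail probability 31.25%) and very unlikely with 40 flips (tail probability on the order of 0.1%), which is why only the larger experiment rejects H0.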
Define α and Power
α (Type I Error) is the probability that the study concludes the Test Arm is different from the Control Arm when the Test Arm truly is no different. This is a regulatory risk.
Power is the probability that the study concludes the Test Arm is different from the Control Arm when the Test Arm truly is different. Falling short of this (1 − power) is the sponsor's risk.
Continuous & Binary Measures
Continuous Measures
Generally testing differences in Means or Medians
Standard Deviation also very important
Binary Measures
Generally testing differences or ratios of proportions
Standard Deviations generally determined by assumed proportions
Power in Pictures
Consider a therapy designed to lower systolic blood pressure (SysBP) by 10 mm Hg more than the current best-selling therapy, which has been shown to lower SysBP to an average of 140 mm Hg.
Assume that each treatment has an SD of 20 mm Hg.
Assume the data follow normal distributions.
Power in Pictures (figures not reproduced; one slide per sample size per treatment group)
Sample Size = 1
Sample Size = 5, Power: 10%
Sample Size = 10, Power: 18%
Sample Size = 30, Power: 47%
Sample Size = 50, Power: 69%
Sample Size = 64, Power: 80%
Sample Size = 73, Power: 85%
Sample Size = 86, Power: 90%
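The power values on these slides can be approximated from the assumed difference (10 mm Hg), SD (20 mm Hg), and sample size per group. The sketch below assumes a normal approximation to the two-sample test, so it runs a point or two higher than the slide values at small sample sizes, where a t-test is less powerful. The function name `approx_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_per_group, diff=10.0, sd=20.0, alpha=0.05):
    """Approximate power of a 2-sided two-sample test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # True difference expressed in standard-error units
    shift = diff / (sd * sqrt(2 / n_per_group))
    # Probability of landing beyond either critical boundary
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


for n in (5, 10, 30, 50, 64, 73, 86):
    print(n, f"{approx_power(n):.0%}")
```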
Standard Powers
80%: If the test product is as efficacious as assumed under H1, 80% of trials should reject H0 in favor of H1 by design. Generally considered to be the lowest targeted power.
85%: 85% of trials should reject H0 in favor of H1 by design.
90%: 90% of trials should reject H0 in favor of H1 by design.
Sponsor Risk
80%: If the test product is as efficacious as assumed under H1, 20% of trials will fail to reject H0 in favor of H1 by design.
85%: 15% of trials will fail to reject H0 in favor of H1 by design.
90%: 10% of trials will fail to reject H0 in favor of H1 by design.
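The statement that a given share of trials should reject H0 "by design" can be illustrated by simulating many trials. The sketch below is a simplified illustration under stated assumptions: normally distributed data, a known SD of 20 mm Hg, a control mean of 140 mm Hg, and a z-test in place of the t-test. The function name `rejection_rate`, the number of simulated trials, and the seed are all hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def rejection_rate(true_diff, n_per_group=64, sd=20.0, alpha=0.05,
                   trials=2000, seed=1):
    """Share of simulated trials that reject H0 with a 2-sided z-test."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sd * sqrt(2 / n_per_group)
    rejections = 0
    for _ in range(trials):
        control = [rng.gauss(140.0, sd) for _ in range(n_per_group)]
        test = [rng.gauss(140.0 - true_diff, sd) for _ in range(n_per_group)]
        z_stat = (fmean(control) - fmean(test)) / standard_error
        if abs(z_stat) > z_crit:
            rejections += 1
    return rejections / trials


power_estimate = rejection_rate(true_diff=10.0)  # H1 true: roughly 80% expected
type1_estimate = rejection_rate(true_diff=0.0)   # H0 true: roughly 5% expected
print(power_estimate, type1_estimate)
```

With 64 subjects per group, roughly four in five simulated trials reject H0 when the true difference is 10 mm Hg, and roughly one in five do not: that remaining share is the sponsor risk described above.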
How Assumptions Affect Statistical Power and Sample Size
Sample Size x Assumed Difference x Power
[Figure: Total Sample Size (0 to 500) vs. Assumed Difference (6 to 14); curves for 80% Power, 85% Power, and 90% Power; SD = 20, 2-sided alpha = 0.05]
% Increase in Sample Size (N) for Increasing Power
Assuming a 2-sided α = 0.05 test, N increases by:
~14% from 80% to 85% power
~34% from 80% to 90% power
Assuming a 2-sided α = 0.10 test, N increases by:
~16% from 80% to 85% power
~38% from 80% to 90% power
Assuming a 2-sided α = 0.20 test, N increases by:
~19% from 80% to 85% power
~46% from 80% to 90% power
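These percentages follow from the fact that, under the normal approximation, N is proportional to the squared sum of the two quantiles (one for α, one for power). The sketch below assumes that approximation; the assumed difference and SD cancel out of the ratio. The function name `pct_increase_in_n` is hypothetical.

```python
from statistics import NormalDist


def pct_increase_in_n(power_from, power_to, alpha):
    """Percent increase in N when raising power at a fixed 2-sided alpha."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    n_from = (z_alpha + z.inv_cdf(power_from)) ** 2
    n_to = (z_alpha + z.inv_cdf(power_to)) ** 2
    return 100 * (n_to / n_from - 1)


for alpha in (0.05, 0.10, 0.20):
    for power_to in (0.85, 0.90):
        print(alpha, power_to, round(pct_increase_in_n(0.80, power_to, alpha), 1))
```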
Sample Size x Assumed Difference x Standard Deviation
[Figure: Total Sample Size (0 to 600) vs. Assumed Difference (6 to 14); curves for SD = 16, SD = 18, SD = 20, SD = 22, and SD = 24; 80% Power, 2-sided alpha = 0.05]
% Increase in Sample Size (N) for Increasing Standard Deviation
Regardless of the α and power, N increases by a factor of (SDhigh / SDlow)^2
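The squared-ratio rule can be read off the standard normal-approximation sample size formula, shown here as a sketch. The symbols are assumptions introduced for illustration: σ is the common SD, δ the assumed difference in means, and z the standard normal quantiles for the chosen α and power. As a worked case, raising the SD from 20 to 24 mm Hg multiplies N by 1.44, a 44% increase.

```latex
n_{\text{per group}} \approx
  \frac{2\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}\,\sigma^{2}}{\delta^{2}}
\quad\Longrightarrow\quad
\frac{n(\sigma_{\text{high}})}{n(\sigma_{\text{low}})}
  = \left(\frac{\sigma_{\text{high}}}{\sigma_{\text{low}}}\right)^{2},
\qquad
\left(\frac{24}{20}\right)^{2} = 1.44
```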
Sample Size x Assumed Difference x Alpha (Type I Error)
[Figure: Total Sample Size (0 to 400) vs. Assumed Difference (6 to 14); curves for 2-sided alpha = 0.20, 0.10, and 0.05; SD = 20, 80% Power]
% Decrease in Sample Size (N) for Increasing Alpha (Type I Error)
Assuming 80% power, N decreases by:
~21% from 2-sided alpha = 0.05 to 0.10
~42% from 2-sided alpha = 0.05 to 0.20
Assuming 85% power, N decreases by:
~20% from 2-sided alpha = 0.05 to 0.10
~40% from 2-sided alpha = 0.05 to 0.20
Assuming 90% power, N decreases by:
~18% from 2-sided alpha = 0.05 to 0.10
~37% from 2-sided alpha = 0.05 to 0.20
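The savings from relaxing alpha can be checked the same way as the cost of raising power: hold the power quantile fixed and change the alpha quantile. The sketch below assumes the normal approximation, so results may differ from the slide values by a fraction of a percentage point. The function name `pct_decrease_in_n` is hypothetical.

```python
from statistics import NormalDist


def pct_decrease_in_n(alpha_from, alpha_to, power):
    """Percent decrease in N when relaxing the 2-sided alpha at fixed power."""
    z = NormalDist()
    z_power = z.inv_cdf(power)
    n_from = (z.inv_cdf(1 - alpha_from / 2) + z_power) ** 2
    n_to = (z.inv_cdf(1 - alpha_to / 2) + z_power) ** 2
    return 100 * (1 - n_to / n_from)


for power in (0.80, 0.85, 0.90):
    for alpha_to in (0.10, 0.20):
        print(power, alpha_to, round(pct_decrease_in_n(0.05, alpha_to, power), 1))
```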
Sample Size x Assumed Difference x Power and Ratio of Randomization
[Figure: Total Sample Size (0 to 600) vs. Assumed Difference (6 to 14); curves for 80% Power, 80% Power 2:1, 85% Power, 85% Power 2:1, and 90% Power; SD = 20, 2-sided alpha = 0.05]
% Increase in Sample Size (N) for Increasing Randomization Ratio
Regardless of the α and power, N increases by:
~4.2% from 1:1 to 3:2 randomization ratio
~12.5% from 1:1 to 2:1 randomization ratio
~33.3% from 1:1 to 3:1 randomization ratio
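The randomization-ratio percentages come from how the variance of the difference in means depends on the split between arms. For a k:1 allocation with a common SD, the total N needed for the same precision is (1 + k)² / (4k) times the 1:1 total. The sketch below assumes equal SDs in both arms; the function name `pct_increase_total_n` is hypothetical.

```python
def pct_increase_total_n(ratio):
    """Percent increase in total N for a ratio:1 allocation versus 1:1.

    Assumption: both arms share the same SD, so the variance of the
    difference in means is proportional to 1/n_test + 1/n_control.
    """
    return 100 * ((1 + ratio) ** 2 / (4 * ratio) - 1)


for label, ratio in (("3:2", 1.5), ("2:1", 2.0), ("3:1", 3.0)):
    print(label, round(pct_increase_total_n(ratio), 1))
```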
% Decrease in Min Obs Diff Required for Significance with Increasing N
Regardless of the α, the Minimum Observed Difference required for Significance decreases by a factor of sqrt(Nlow / Nhigh)
With 80% Power, N = 64 / Tx Group, required diff is 7.00 mm Hg
With 85% Power, N = 73 / Tx Group, required diff is (7.00*sqrt(64/73)) = 6.55 mm Hg
With 90% Power, N = 86 / Tx Group, required diff is (7.00*sqrt(64/86)) = 6.03 mm Hg
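The square-root scaling follows from the form of the significance threshold, sketched here under stated assumptions: c is the critical value of the test (treated as fixed, since it changes very little across these sample sizes), s is the observed SD, and N is the sample size per treatment group. None of these symbols appear on the slides; they are introduced for illustration.

```latex
d_{\min}(N) = c \cdot s \cdot \sqrt{\frac{2}{N}}
\quad\Longrightarrow\quad
\frac{d_{\min}(N_{\text{high}})}{d_{\min}(N_{\text{low}})}
  = \sqrt{\frac{N_{\text{low}}}{N_{\text{high}}}},
\qquad
7.00 \times \sqrt{\frac{64}{73}} \approx 6.55
```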
Effect on Statistical Significance
Increasing Sample Size decreases the minimum difference required to show statistical significance (driven by H0) and therefore increases power.
Required sample size increases with:
Increasing: Power, SD, Randomization Ratio
Decreasing: Alpha, Assumed Difference