Fall 2002 Biostat 511
Statistical Inference - Proportions
One sample
• Confidence intervals
• Hypothesis tests
Two sample
• Confidence intervals
• Hypothesis tests
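Confidence Intervals - Binomial Proportion
Recall that if n is “large” (np > 10 and n(1 - p) > 10), we can approximate the binomial distribution by a normal:
\[ X \sim N\big(np,\ np(1-p)\big) \]
or, equivalently,
\[ \hat p = \frac{X}{n} \sim N\left(p,\ \frac{p(1-p)}{n}\right) \]
This suggests the following confidence interval for p:
\[ \hat p \pm Q_Z(1-\alpha/2)\sqrt{\frac{\hat p(1-\hat p)}{n}} \]
where Q_Z(q) denotes the q-th quantile of the standard normal distribution.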
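A minimal Python sketch of the interval above, using only the standard library (`statistics.NormalDist` supplies Q_Z). The helper names `normal_approx_ok` and `wald_ci` are hypothetical, and checking the np > 10 rule with p̂ in place of the unknown p is an assumption; the counts are the pesticide-plant data from the next example.

```python
from math import sqrt
from statistics import NormalDist


def normal_approx_ok(n, p):
    """Rule of thumb from the slide: np > 10 and n(1 - p) > 10."""
    return n * p > 10 and n * (1 - p) > 10


def wald_ci(x, n, conf=0.95):
    """Normal-approximation CI: p_hat +/- Q_Z(1 - alpha/2) * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = x / n
    alpha = 1 - conf
    q = NormalDist().inv_cdf(1 - alpha / 2)  # Q_Z(1 - alpha/2), about 1.96 at 95%
    half_width = q * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Pesticide-plant data: 66 girls among 114 children.
x, n = 66, 114
print(normal_approx_ok(n, x / n))  # p_hat stands in for the unknown p
lo, hi = wald_ci(x, n)
print(f"95% CI for p: ({lo:.3f}, {hi:.3f})")
```

For these data the interval is roughly (0.488, 0.670).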
Hypothesis Testing - 1-sample Tests: Binomial Proportion
When it is appropriate to use the normal approximation to the binomial, a test of the success probability p, based on a binomial variable X, looks just like the Z-test (see the homework for the case when np is small, so that the normal approximation is not appropriate).
Example:
Suppose that a child is equally likely to be male or female. In a sample of 114 workers at a pesticide plant (each with only one child), we find that 66 of the children are female. Is this evidence that these workers are more likely to have girls?
Define: p = probability that a worker has a daughter
H0 : p = 0.5
HA : p > 0.5
Hypothesis Testing - 1-sample Tests: Binomial Proportion
Data: X = 66, n = 114
X ~ N(np, np(1 - p))
Under H0 : p0 = 0.5, so
\[ Z = \frac{X - np_0}{\sqrt{np_0(1-p_0)}} = \frac{66 - 114(0.5)}{\sqrt{114(0.5)(1-0.5)}} = \frac{9}{5.34} = 1.69 \]
For this one-sided alternative we reject H0 if
\[ Z > Q_Z(1-\alpha) \]
The critical value for a one-sided α = 0.05 test is Q_Z(0.95) = 1.65. Since the test statistic, Z = 1.69, exceeds the critical value, we reject H0.
Repeat this using p̂ in place of X to convince yourself it is equivalent!
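The slide's suggestion can be checked numerically. This sketch (standard-library Python; variable names are illustrative) computes Z from the count X and from p̂ = X/n and shows they agree. It uses unrounded quantiles, so it reports Z ≈ 1.686 against Q_Z(0.95) ≈ 1.645, the same decision as the rounded 1.69 and 1.65. The one-sided p-value is an extra quantity not shown on the slide.

```python
from math import sqrt
from statistics import NormalDist

x, n, p0 = 66, 114, 0.5

# Z from the count X: (X - n p0) / sqrt(n p0 (1 - p0))
z_count = (x - n * p0) / sqrt(n * p0 * (1 - p0))

# Z from the proportion p_hat = X/n: (p_hat - p0) / sqrt(p0 (1 - p0) / n)
p_hat = x / n
z_prop = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

alpha = 0.05
crit = NormalDist().inv_cdf(1 - alpha)   # Q_Z(0.95), about 1.645
p_value = 1 - NormalDist().cdf(z_count)  # one-sided, H_A: p > 0.5

print(round(z_count, 3), round(z_prop, 3))  # same value both ways
print("reject H0:", z_count > crit, "p-value:", round(p_value, 4))
```

Dividing numerator and denominator of the count version by n gives the proportion version, which is why the two agree exactly.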
Confidence interval? (use p̂ instead of p0 in the standard error)
\[ \hat p \pm Q_Z(1-\alpha/2)\sqrt{\frac{\hat p(1-\hat p)}{n}} \]
Hypothesis Testing - 2-Sample Proportions: Motivation
In a study of the morbidity and mortality among pediatric victims of motor vehicle accidents, information was gathered on whether children were wearing seat belts at the time of the accident (Osberg and DiScala, AJPH 1992). Of the 123 children who were wearing a seat belt, 3 died, while of the 290 children found not wearing a seat belt, 13 died.
Q: Does this study show that wearing a seat belt is beneficial?
Let’s construct a hypothesis test to compare the two fatality probabilities.
Hypothesis Testing - 2-Sample Proportions
What are the parameters and the statistics for this question?
p1 = mortality for children when a seat belt is worn
p2 = mortality for children when a seat belt is not worn
X1 = # of recorded deaths among those who wear seat belts
n1 = # of children surveyed from those who wear seat belts
X2 = # of recorded deaths among those who do not wear seat belts
n2 = # of children surveyed from those who do not wear seat belts

                        Group 1         Group 2
Population proportion   p1              p2
Sample proportion       p̂1 = X1/n1      p̂2 = X2/n2
Sample size             n1              n2
Hypothesis Testing - 2-Sample Proportions
The hypothesis that the two populations are the same is addressed by the hypotheses:
H0 : p1 = p2
HA : p1 ≠ p2
A statistic useful for this comparison is the difference in the observed, or sample, proportions (we’ll see some others later):
\[ \hat p_1 - \hat p_2 \]
Q: What is the distribution of this statistic?
A: Approximately normal:
\[ \hat p_1 - \hat p_2 \sim N\left(p_1 - p_2,\ \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}\right) \]
Hypothesis Testing - 2-Sample Proportions
We obtain a standard normal (approximately) if we use:
\[ Z = \frac{(\hat p_1 - \hat p_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}} \]
Then under the null, H0 : p1 = p2 = p0, we obtain
\[ Z = \frac{\hat p_1 - \hat p_2 - 0}{\sqrt{\dfrac{p_0(1-p_0)}{n_1} + \dfrac{p_0(1-p_0)}{n_2}}} \]
However, we still don’t know the common value, p0, in the denominator…
We need to replace p0 with an estimate. Just as with the pooled variance, s_p², we use a weighted average as the estimate in the variance:
\[ \hat p_0 = \frac{n_1\hat p_1 + n_2\hat p_2}{n_1 + n_2} = \frac{X_1 + X_2}{n_1 + n_2} \]
The test statistic used for testing H0: p1 = p2 is:
\[ Z = \frac{\hat p_1 - \hat p_2 - 0}{\sqrt{\dfrac{\hat p_0(1-\hat p_0)}{n_1} + \dfrac{\hat p_0(1-\hat p_0)}{n_2}}} \]
Note: The test would still be valid if we had simply used the separate estimates, p̂1 and p̂2, instead of the common estimate based on H0.
Note: A common estimate isn’t used when confidence intervals are computed for the difference in the population proportions, p1 - p2. In this case we make no assumption about the relationship between p1 and p2, so we use the following as a 95% CI for p1 - p2:
\[ (\hat p_1 - \hat p_2) \pm 1.96\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}} \]
Hypothesis Testing - 2-Sample Proportions
Returning to the example…

Group              Fatalities   Sample Size
1 (seat belts +)   3            123
2 (seat belts -)   13           290

So we estimate the separate risks as
\[ \hat p_1 = 3/123 = 0.024, \qquad \hat p_2 = 13/290 = 0.045 \]
Thus it appears that the risk is nearly twice as high in the seat belt - group. We can test H0: p1 = p2, but we first need a common estimate (under the null):
\[ \hat p_0 = \frac{3 + 13}{123 + 290} = \frac{16}{413} = 0.039 \]
We use the statistic
\[ Z = \frac{0.024 - 0.045}{\sqrt{0.039(1 - 0.039)\left(\dfrac{1}{123} + \dfrac{1}{290}\right)}} = -1.01 \]
Since |Z| < 1.96, we fail to reject H0 and conclude that the observed difference is not statistically significant at the 0.05 level.
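A sketch reproducing the seat-belt calculation in standard-library Python (variable names are illustrative). Without rounding the intermediate proportions, Z comes out near -0.98 rather than -1.01; the conclusion |Z| < 1.96 is unchanged. The two-sided p-value and the unpooled 95% CI for p1 - p2 (from the earlier note) are added for comparison.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 3, 123   # seat belt worn: deaths, children
x2, n2 = 13, 290  # no seat belt: deaths, children

p1_hat, p2_hat = x1 / n1, x2 / n2
p0_hat = (x1 + x2) / (n1 + n2)  # pooled estimate under H0: p1 = p2

# Test statistic uses the pooled estimate in the standard error
se_pooled = sqrt(p0_hat * (1 - p0_hat) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se_pooled
p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided

# 95% CI for p1 - p2 uses the separate (unpooled) estimates
diff = p1_hat - p2_hat
se_unpooled = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
ci = (diff - 1.96 * se_unpooled, diff + 1.96 * se_unpooled)

print(f"Z = {z:.2f}, two-sided p = {p_value:.3f}")
print(f"95% CI for p1 - p2: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The interval contains 0, which agrees with the failure to reject H0.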
Hypothesis Testing - 2-Sample Proportions
Note, under the null we would have expected:
n1 × p̂0 = (16/413) × 123 = 4.77 deaths in the seat belt + group.
n2 × p̂0 = (16/413) × 290 = 11.23 deaths in the seat belt - group.
The fact that one of these is “small” (less than 5) causes some concern about the normal approximation to the binomial. One alternative in this case is Fisher’s exact test, which does not rely on the normal approximation.
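A minimal sketch of the expected counts and of Fisher's exact test, written with `math.comb` rather than a statistics library. Conditioning on both margins (16 deaths; 123 and 290 children), the number of deaths in the seat belt + group is hypergeometric. The two-sided p-value here sums the probabilities of all tables no more likely than the observed one; that is one common convention, assumed here, and the function names are hypothetical.

```python
from math import comb

x1, n1 = 3, 123   # deaths / children, seat belt +
x2, n2 = 13, 290  # deaths / children, seat belt -
m, N = x1 + x2, n1 + n2

# Expected deaths under H0, using the pooled estimate p0_hat = m / N
expected1, expected2 = n1 * m / N, n2 * m / N


def hypergeom_pmf(k, n1, n2, m):
    """P(k of the m deaths fall in group 1) with all margins held fixed."""
    return comb(n1, k) * comb(n2, m - k) / comb(n1 + n2, m)


def fisher_exact_two_sided(x1, n1, x2, n2):
    m = x1 + x2
    p_obs = hypergeom_pmf(x1, n1, n2, m)
    total = 0.0
    for k in range(max(0, m - n2), min(n1, m) + 1):
        p_k = hypergeom_pmf(k, n1, n2, m)
        # Count every table no more likely than the observed one
        # (a small relative tolerance guards against floating-point ties).
        if p_k <= p_obs * (1 + 1e-7):
            total += p_k
    return min(total, 1.0)


print(f"expected deaths: {expected1:.2f} (belt +), {expected2:.2f} (belt -)")
print(f"Fisher exact two-sided p = {fisher_exact_two_sided(x1, n1, x2, n2):.3f}")
```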
Power and Sample Size
• Power = Pr(reject H0 | HA true)
• Power depends on µ0, µA, σ², and n.
• Sample size calculations ensure that the study is capable of detecting departures from the null hypothesis.
• Power and sample size calculations require a model for the data under both the null and the alternative.
Power and Sample Size
We have mainly focused on the distribution of the test statistic under the null hypothesis. Shouldn’t we also consider the distribution under the alternative hypothesis?
Yes!
• The distribution of the test statistic under the alternative hypothesis tells us the power of the test.
• Power indicates the ability of the test procedure to reliably detect departures from the null hypothesis.
• Power (1 - β) and significance (α) are important considerations in the planning of a study.
• Both are key inputs to sample size calculations.
Power
Power refers to the probability of rejecting the null hypothesis when it truly is false:
1 - β = P[reject H0 | HA true]
So when we consider power we compute probabilities assuming now that the alternative is the “truth”.
Consider the 1-sample testing situation with the hypotheses:
H0 : µ = µ0
HA : µ ≠ µ0
This scenario is enough to illustrate all of the important concepts. Details change when we consider variants:
1. One-sided alternatives
2. 2-sample problems
3. Sample proportions
We will look at the 1-sample case in detail and give results for the other situations.
Power
Power refers to the probability of detecting a difference from H0. Clearly, the greater the difference between the null mean, µ0, and the alternative mean, µA, the more likely the sample mean will be “significantly far from H0”. The idea is that when the distance between the mean under the null and the mean under the alternative, |µ0 - µA|, is “large”, the power is also large. However, as usual, we measure “large” in units of the SEM (standard error of the mean), σ/√n.
Define: Δ = |µ0 - µA|
Then the quantity we want to be “large” to give good power is
\[ \frac{\Delta}{\sigma/\sqrt{n}} = \frac{\sqrt{n}\,\Delta}{\sigma} \]
Based on this, we expect power to increase as:
1. Sample size increases.
2. Distance between µ0 and µA (Δ) increases.
3. Variance gets smaller.
Power
Q: How can we compute the power?
We assume that σ is known. Even if we don’t know it, we’ll need an estimate of it. We also know the sample size, n.
1. Choose α
2. Identify µ0
3. Determine 1-sided / 2-sided
4. Identify µA
(1) - (3) determine what the rejection region will be. For example, for a two-sided test of
H0: µ = µ0
REJECT H0 if
\[ \bar X < \mu_0 + Q_Z(\alpha/2)\,\frac{\sigma}{\sqrt n} \quad\text{or}\quad \bar X > \mu_0 + Q_Z(1-\alpha/2)\,\frac{\sigma}{\sqrt n} \]
Power example: µ0 = 13.0, µA = 12.8, σ = 0.7, α = 0.05
[Figure panels: n = 25 and n = 100]
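A Monte Carlo sketch of this example, assuming normally distributed data with σ known, so that X̄ ~ N(µA, σ²/n) under the alternative and can be drawn directly. It applies the two-sided rejection region for X̄ and counts how often H0 is rejected. `simulated_power`, the seed, and the replication count are arbitrary illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist

mu0, muA, sigma, alpha = 13.0, 12.8, 0.7, 0.05
q = NormalDist().inv_cdf(1 - alpha / 2)  # Q_Z(1 - alpha/2)


def simulated_power(n, reps=20_000, seed=511):
    """Fraction of simulated studies (true mean muA) that reject H0: mu = mu0."""
    rng = random.Random(seed)
    se = sigma / sqrt(n)
    lower, upper = mu0 - q * se, mu0 + q * se  # two-sided rejection region for X-bar
    rejections = 0
    for _ in range(reps):
        xbar = rng.gauss(muA, se)  # X-bar ~ N(muA, sigma^2 / n) under H_A
        if xbar < lower or xbar > upper:
            rejections += 1
    return rejections / reps


for n in (25, 100):
    print(f"n = {n:3d}: estimated power = {simulated_power(n):.3f}")
```

The estimates should land near 0.30 for n = 25 and near 0.82 for n = 100, up to simulation noise.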
Power
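Let’s assume that µA < µ0. Then only the lower rejection region has non-negligible probability. Step (4) determines the distribution under the alternative, so that power can be computed:
\[
\begin{aligned}
1-\beta &= P(\text{REJECT } H_0 \mid H_A \text{ true}) \\
&\approx P\!\left(\bar X < \mu_0 + Q_Z(\alpha/2)\,\frac{\sigma}{\sqrt n} \;\middle|\; \mu = \mu_A\right) \\
&= P\!\left(\frac{\bar X - \mu_A}{\sigma/\sqrt n} < Q_Z(\alpha/2) + \frac{\mu_0 - \mu_A}{\sigma/\sqrt n} \;\middle|\; \mu = \mu_A\right) \\
&= P\!\left(Z < Q_Z(\alpha/2) + \frac{\mu_0 - \mu_A}{\sigma/\sqrt n}\right)
\end{aligned}
\]
So the key quantity that determines the power is indeed
\[ \frac{\mu_0 - \mu_A}{\sigma/\sqrt n} \]
(Note: if µA > µ0, then power depends on µA - µ0.)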
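The closed form can be coded directly. `power_approx` follows the formula derived above (lower region only); `power_exact` is an assumed extension that adds back the far rejection region, which matters only when |µ0 - µA| is close to zero. Both function names are hypothetical; the numbers are the µ0 = 13.0, µA = 12.8, σ = 0.7 example, giving about 0.30 for n = 25 and about 0.82 for n = 100.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()


def power_approx(mu0, muA, sigma, n, alpha=0.05):
    """1 - beta ~= P(Z < Q_Z(alpha/2) + |mu0 - muA| / (sigma / sqrt(n)))."""
    shift = abs(mu0 - muA) / (sigma / sqrt(n))
    return Z.cdf(Z.inv_cdf(alpha / 2) + shift)


def power_exact(mu0, muA, sigma, n, alpha=0.05):
    """Same two-sided test, but also counts the far rejection region."""
    shift = abs(mu0 - muA) / (sigma / sqrt(n))
    q = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(-q + shift) + Z.cdf(-q - shift)


for n in (25, 100):
    print(n, round(power_approx(13.0, 12.8, 0.7, n), 3),
          round(power_exact(13.0, 12.8, 0.7, n), 3))
```

When µA = µ0, `power_exact` returns α, as it should: the "power" at the null is just the significance level.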
Power
Let’s come up for air. We have shown that the POWER can be written as
\[ 1-\beta \approx P\!\left(Z < Q_Z(\alpha/2) + \frac{\Delta}{\sigma/\sqrt n}\right) \]
This tells us the probability of rejecting H0 when the alternative is true. This is important! Why spend $$$ on a study that hopes to show a treatment effect if the probability of rejecting H0 is small? In fact, to obtain $$$ from NIH one must show that the study is capable (has sufficient power) of detecting a meaningful difference.
One useful display is the Power Curve, which shows the power for different values of µA.
Power Curve
Suppose we are testing blood pressure medication and know that average systolic blood pressure among hypertensives is 150 mm Hg with a standard deviation of 10. For various treatment effects, measured as decreases, d, how likely are we to reject the null hypothesis (2-sided at α = 0.05) that
H0 : d = 0?
It depends on the sample size and the alternative - Power Curve!
[Figure: Power Curve]
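A sketch that tabulates a power curve for this example with the two-sided normal-theory power formula (σ = 10 from the slide, σ treated as known). The slide does not give the sample sizes behind its figure, so the n values below are hypothetical, chosen only to show how the curve shifts with n; `power` is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()
sigma, alpha = 10.0, 0.05  # SD of systolic BP from the slide; 2-sided alpha
q = Z.inv_cdf(1 - alpha / 2)


def power(d, n):
    """Two-sided power to detect a true mean decrease d (mm Hg) with n patients."""
    shift = d / (sigma / sqrt(n))
    return Z.cdf(-q + shift) + Z.cdf(-q - shift)


sample_sizes = (10, 25, 50)  # hypothetical n values, for illustration only
print("d   " + "".join(f"n={n:<8}" for n in sample_sizes))
for d in range(0, 11, 2):
    print(f"{d:<4}" + "".join(f"{power(d, n):<10.3f}" for n in sample_sizes))
```

Each column is one power curve: it starts at α = 0.05 when d = 0 and rises toward 1 as d grows, more steeply for larger n.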
Sample Size
Q: How many patients should we treat?
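1. Specify the significance level (α)
2. Specify the null mean (µ0)
3. Specify the power (1 - β)
4. Specify the alternative mean (µA)
5. Specify the variance (σ²)
The real work is coming up with (1) - (5). Together these give the following two requirements:
Significance:
\[ P\!\left(\bar X < \mu_0 + Q_Z(\alpha/2)\,\frac{\sigma}{\sqrt n} \;\middle|\; \mu_0 \text{ true}\right) = \alpha/2 \]
Power:
\[ P\!\left(\bar X < \mu_0 + Q_Z(\alpha/2)\,\frac{\sigma}{\sqrt n} \;\middle|\; \mu_A \text{ true}\right) = 1-\beta \]
Choose the sample size n that satisfies the 2 conditions:
\[ n = \frac{\sigma^2\left(Q_Z(1-\alpha/2) + Q_Z(1-\beta)\right)^2}{(\mu_0-\mu_A)^2} \]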
Sample Size
Suppose that we consider a decrease in blood pressure of 5 mm Hg to be scientifically important. Then, given the standard deviation of 10, how many patients are required to obtain 80% power using a 2-sided α = 0.05 test?
\[ Q_Z(1-\beta) = Q_Z(0.80) = 0.84, \qquad Q_Z(1-\alpha/2) = Q_Z(0.975) = 1.96 \]
So we need
\[ n = \frac{10^2\,(0.84 + 1.96)^2}{5^2} = 31.36 \]
• Recruit n = 32 patients into our study.
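A sketch of the sample-size formula as a small function (standard-library Python; `n_one_sample_mean` is a hypothetical name). With unrounded quantiles it gives n ≈ 31.40 rather than 31.36, which still rounds up to 32 patients. The `two_sided=False` branch swaps in Q_Z(1 - α), as in the 1-sided formula.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()


def n_one_sample_mean(sigma, delta, alpha=0.05, power=0.80, two_sided=True):
    """n = sigma^2 (Q_Z(1 - alpha/2) + Q_Z(1 - beta))^2 / delta^2, plus its ceiling."""
    q_alpha = Z.inv_cdf(1 - alpha / 2) if two_sided else Z.inv_cdf(1 - alpha)
    q_beta = Z.inv_cdf(power)  # Q_Z(1 - beta)
    n = sigma**2 * (q_alpha + q_beta) ** 2 / delta**2
    return n, ceil(n)  # always round up so the target power is met


raw, n = n_one_sample_mean(sigma=10, delta=5)
print(f"n = {raw:.2f} -> recruit {n}")
```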
Factors that Influence Sample Size
The required sample size increases as:
1. σ² increases
2. the significance level is made smaller (α decreases)
3. power increases (1 - β increases)
4. the distance |µ0 - µA| decreases
Sample Size
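1-sample Mean, 1-sided Test
\[ n = \frac{\sigma^2\left(Q_Z(1-\alpha) + Q_Z(1-\beta)\right)^2}{(\mu_0-\mu_A)^2} \]
1-sample Proportion, 1-sided Test
\[ n = \frac{p_0(1-p_0)\left[Q_Z(1-\alpha) + Q_Z(1-\beta)\sqrt{\dfrac{p_A(1-p_A)}{p_0(1-p_0)}}\right]^2}{(p_0-p_A)^2} \]
1-sample Proportion, 2-sided Test
\[ n = \frac{p_0(1-p_0)\left[Q_Z(1-\alpha/2) + Q_Z(1-\beta)\sqrt{\dfrac{p_A(1-p_A)}{p_0(1-p_0)}}\right]^2}{(p_0-p_A)^2} \]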
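A sketch of the two proportion formulas above (`n_one_sample_prop` is a hypothetical name). The inputs p0 = 0.5 and pA = 0.6 are hypothetical, chosen only to show the 1-sided versus 2-sided difference at α = 0.05 and 80% power.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()


def n_one_sample_prop(p0, pA, alpha=0.05, power=0.80, two_sided=True):
    """Sample size for testing H0: p = p0 against a specific alternative pA."""
    q_alpha = Z.inv_cdf(1 - alpha / 2) if two_sided else Z.inv_cdf(1 - alpha)
    q_beta = Z.inv_cdf(power)  # Q_Z(1 - beta)
    v0, vA = p0 * (1 - p0), pA * (1 - pA)  # variances under H0 and under H_A
    n = v0 * (q_alpha + q_beta * sqrt(vA / v0)) ** 2 / (pA - p0) ** 2
    return ceil(n)


print(n_one_sample_prop(0.5, 0.6))                   # 2-sided
print(n_one_sample_prop(0.5, 0.6, two_sided=False))  # 1-sided
```

The 2-sided design needs more subjects (194 versus 153 here), matching item 2 of the factors listed earlier.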
Sample Size
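2-sample Mean, 2-sided Test, Equal sizes: n1 = n2 = n
\[ n = \frac{(\sigma_1^2 + \sigma_2^2)\left(Q_Z(1-\alpha/2) + Q_Z(1-\beta)\right)^2}{(\mu_0-\mu_A)^2} \]
2-sample Proportion, 2-sided Test, Equal sizes: n1 = n2 = n (rough approximation, for p0 ≈ pA)
\[ n = \frac{\left[p_0(1-p_0) + p_A(1-p_A)\right]\left(Q_Z(1-\alpha/2) + Q_Z(1-\beta)\right)^2}{(p_0-p_A)^2} \]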
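A sketch of both equal-size, 2-sided formulas (function names hypothetical). The inputs are illustrative only: the means case reuses the blood-pressure numbers (σ1 = σ2 = 10, a 5 mm Hg difference) as if two groups were compared, and the proportions case uses the seat-belt estimates 0.024 and 0.045 as hypothetical design values.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()


def _q_sum(alpha, power):
    """Q_Z(1 - alpha/2) + Q_Z(1 - beta) for a 2-sided test."""
    return Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)


def n_two_sample_mean(sigma1, sigma2, mu0, muA, alpha=0.05, power=0.80):
    """Per-group n = (sigma1^2 + sigma2^2)(Q_Z(1-alpha/2) + Q_Z(1-beta))^2 / (mu0 - muA)^2."""
    return ceil((sigma1**2 + sigma2**2) * _q_sum(alpha, power) ** 2 / (mu0 - muA) ** 2)


def n_two_sample_prop(p0, pA, alpha=0.05, power=0.80):
    """Rough per-group n = [p0(1-p0) + pA(1-pA)](Q_Z(1-alpha/2) + Q_Z(1-beta))^2 / (p0 - pA)^2."""
    var_sum = p0 * (1 - p0) + pA * (1 - pA)
    return ceil(var_sum * _q_sum(alpha, power) ** 2 / (p0 - pA) ** 2)


print(n_two_sample_mean(10, 10, 150, 145))  # hypothetical: SD 10, 5 mm Hg difference
print(n_two_sample_prop(0.024, 0.045))      # hypothetical design values
```

With these inputs the means case needs 63 per group, about twice the one-sample 32, and detecting a 0.024 versus 0.045 difference in rare outcomes needs 1182 per group.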
Summary
• Power is an important component in study design.
• Sample size calculations ensure that the study is capable of detecting departures from the null hypothesis.
• Power and sample size calculations require more than a test: a model for the data under both the null and the alternative is required.
• STRUTs (α = .05; power = .80; 2-tailed)
*One sample: n = 8/D²
*Two sample: n = 16/D² (per group)
where D = Δ/σ (difference in std. dev. units)
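Where the rules of thumb come from: with α = 0.05 (2-tailed) and power 0.80, the factor (Q_Z(0.975) + Q_Z(0.80))² ≈ 7.85 appears in the one-sample formula, and twice that appears in the equal-size two-sample formula when σ1 = σ2 = σ. This sketch compares the rounded rules with the formula values for a few illustrative D.

```python
from statistics import NormalDist

Z = NormalDist()
# (Q_Z(0.975) + Q_Z(0.80))^2: the constant behind both rules of thumb
k = (Z.inv_cdf(0.975) + Z.inv_cdf(0.80)) ** 2

for D in (0.25, 0.5, 1.0):
    print(f"D = {D}: one sample {8 / D**2:.0f} vs {k / D**2:.1f}; "
          f"two sample {16 / D**2:.0f} vs {2 * k / D**2:.1f}")
```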