Fall 2002 Biostat 511

Statistical Inference - Proportions

One Sample
• Confidence intervals
• Hypothesis tests

Two Sample
• Confidence intervals
• Hypothesis tests


Confidence Intervals - Binomial Proportion

Recall that if n is “large” (np > 10 and n(1-p) > 10) we can approximate the binomial distribution by a normal:

\[ X \sim N\big(np,\; np(1-p)\big) \]

or, equivalently,

\[ \hat{p} = \frac{X}{n} \sim N\!\left(p,\; \frac{p(1-p)}{n}\right) \]

This suggests the following confidence interval for p:

\[ \hat{p} \pm Q_Z^{(1-\alpha/2)} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
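The interval above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `proportion_ci` and the counts (40 successes in 100 trials) are hypothetical, chosen so that np and n(1-p) are both well above 10.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, level=0.95):
    """Normal-approximation confidence interval for a binomial proportion."""
    p_hat = x / n
    # Q_Z^(1 - alpha/2): standard normal quantile for a two-sided interval
    q = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = q * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical data (not from the slides): 40 successes in 100 trials.
low, high = proportion_ci(40, 100)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

The interval is only as good as the normal approximation, so the "large n" check stated earlier should be made before relying on it.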


Hypothesis Testing - 1-sample Tests: Binomial Proportion

When it is appropriate to use the normal approximation to the binomial, a test of the success probability, p, based on a binomial variable X, looks just like the Z-test. (See the homework for the case when np is small, so that the normal approximation is not appropriate.)

Example:

Suppose that there is an equal chance that a child is male or female. We find in a sample of 114 workers at a pesticide plant (with only one child) that 66 of the children are female. Is this evidence that these workers are more likely to have girls?

Define: p = probability that worker has daughter

H0 : p = 0.5

HA : p > 0.5


Hypothesis Testing - 1-sample Tests: Binomial Proportion

Data: X = 66, n = 114

X ~ N ( np, np(1-p) )

Under H0 : p = p0 = 0.5

\[ Z = \frac{X - np_0}{\sqrt{np_0(1-p_0)}} = \frac{66 - 114(0.5)}{\sqrt{114(0.5)(1-0.5)}} = \frac{9}{5.34} = 1.69 \]

For this one-sided alternative we reject H0 if

\[ Z > Q_Z^{(1-\alpha)} \]

The critical value for a one-sided α = 0.05 test is $Q_Z^{(1-0.05)}$ = 1.65. Since the test statistic, Z = 1.69, exceeds the critical value, we reject H0.

Repeat this using p̂ in place of X to convince yourself it is equivalent!


Confidence interval? (use p̂ instead of p0 in the standard error)

\[ \hat{p} \pm Q_Z^{(1-\alpha/2)} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
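As a check on the arithmetic in this example, here is a minimal Python sketch of the one-sided Z-test and the accompanying confidence interval for X = 66, n = 114, p0 = 0.5. The variable names are illustrative only; the p-value line is an addition that the slides do not report.

```python
from math import sqrt
from statistics import NormalDist

x, n, p0 = 66, 114, 0.5
std_normal = NormalDist()

# Test statistic on the count scale: standard error uses p0 (the null value)
z = (x - n * p0) / sqrt(n * p0 * (1 - p0))

# Same statistic on the proportion scale, to confirm the two forms agree
p_hat = x / n
z_from_p_hat = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

z_crit = std_normal.inv_cdf(0.95)   # one-sided alpha = 0.05 critical value
p_value = 1 - std_normal.cdf(z)     # one-sided p-value (not given in the slides)

# 95% confidence interval: standard error uses p_hat instead of p0
q = std_normal.inv_cdf(0.975)
half_width = q * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - half_width, p_hat + half_width)

print(f"Z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {z > z_crit}")
print(f"p_hat = {p_hat:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Note that the one-sided test rejects at the 0.05 level while the two-sided 95% interval still contains 0.5; the two procedures answer slightly different questions.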


Hypothesis Testing - 2 Sample Proportions: Motivation

In a study of the morbidity and mortality among pediatric victims of motor vehicle accidents, information was gathered on whether children were wearing seat belts at the time of the accident (Osberg and DiScala, AJPH 1992). Of the 123 children who were wearing a seat belt, 3 died, while of the 290 children found not wearing a seat belt, 13 died.

Q: Does this study show that wearing a seat belt is beneficial?

Let’s construct a hypothesis test to assess the two fatality probabilities.


Hypothesis Testing - 2 Sample Proportions

What are the parameters and the statistics for this question?

p1 = mortality for children when seat belt is worn
p2 = mortality for children when seat belt is not worn
X1 = # of recorded deaths among those who wear seat belts
n1 = # of children surveyed from those that wear seat belts
X2 = # of recorded deaths among those who do not wear seat belts
n2 = # of children surveyed from those that do not wear seat belts

                         Group 1          Group 2
Population proportion    p1               p2
Sample proportion        p̂1 = X1/n1       p̂2 = X2/n2
Sample size              n1               n2



The hypothesis that the two populations are the same is addressed by the hypotheses:

H0 : p1 = p2

HA : p1 ≠ p2

A statistic useful for this comparison is the difference in the observed, or sample, proportions (we’ll see some others later):

\[ \hat{p}_1 - \hat{p}_2 \]

Q: What is the distribution of this statistic?

A: Approximately normal.

\[ \hat{p}_1 - \hat{p}_2 \sim N\!\left(p_1 - p_2,\; \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}\right) \]



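We obtain a standard normal (approximately) if we use:

\[ Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}} \]

Then under the null, H0 : p1 = p2 = p0, we obtain

\[ Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\dfrac{p_0(1-p_0)}{n_1} + \dfrac{p_0(1-p_0)}{n_2}}} \]

However, we still don’t know the common proportion, p0, in the denominator…

We need to replace p0 with an estimate. Just as with the pooled variance, $s_p^2$, we use a weighted average as the estimate in the variance:

\[ \hat{p}_0 = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{X_1 + X_2}{n_1 + n_2} \]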


The test statistic used for testing H0: p1 = p2 is:

\[ Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\dfrac{\hat{p}_0(1-\hat{p}_0)}{n_1} + \dfrac{\hat{p}_0(1-\hat{p}_0)}{n_2}}} \]

Note: The test is still valid if we had simply used the separate estimates, p̂1 and p̂2, instead of the common estimate based on H0.

Note: A common estimate isn’t used when confidence intervals are computed for the difference in the population proportions, p1 - p2. In this case, we don’t have any assumption regarding the relationship between p1 and p2, so we use the following as a 95% CI for p1 - p2:

\[ (\hat{p}_1 - \hat{p}_2) \pm 1.96 \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \]


Returning to the example…

Group                 Fatalities    Sample Size
1 (seat belts +)           3            123
2 (seat belts -)          13            290

So we estimate the separate risks as

\[ \hat{p}_1 = 3/123 = 0.024 \]

\[ \hat{p}_2 = 13/290 = 0.045 \]

Thus it appears that the risk is nearly twice as high for the seat belt - group. We can test H0: p1 = p2 but we first need a common estimate (under the null):

\[ \hat{p}_0 = \frac{3 + 13}{123 + 290} = \frac{16}{413} = 0.039 \]

We use the statistic

\[ Z = \frac{0.024 - 0.045}{\sqrt{0.039(1-0.039)\left(\dfrac{1}{123} + \dfrac{1}{290}\right)}} = -1.01 \]

Since |Z| < 1.96 we fail to reject H0 and conclude that the observed difference is not statistically significant at the 0.05 level.
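The seat belt calculation can be reproduced with a short Python sketch. The variable names are illustrative. One thing worth knowing: carried at full precision the statistic comes out near -0.98 rather than -1.01, because the slide rounds the proportions to three decimals before dividing; the conclusion is the same either way.

```python
from math import sqrt
from statistics import NormalDist

# Seat belt data from the example: deaths and group sizes
x1, n1 = 3, 123     # seat belts +
x2, n2 = 13, 290    # seat belts -

p1_hat, p2_hat = x1 / n1, x2 / n2
diff = p1_hat - p2_hat

# Pooled estimate of the common proportion under H0: p1 = p2
p0_hat = (x1 + x2) / (n1 + n2)
se_pooled = sqrt(p0_hat * (1 - p0_hat) * (1 / n1 + 1 / n2))
z = diff / se_pooled
p_value = 2 * NormalDist().cdf(-abs(z))   # two-sided p-value

# 95% CI for p1 - p2 uses the separate (unpooled) estimates
se_unpooled = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
ci = (diff - 1.96 * se_unpooled, diff + 1.96 * se_unpooled)

print(f"p1_hat = {p1_hat:.3f}, p2_hat = {p2_hat:.3f}, p0_hat = {p0_hat:.3f}")
print(f"Z = {z:.2f}, two-sided p-value = {p_value:.2f}")
print(f"95% CI for p1 - p2: ({ci[0]:.3f}, {ci[1]:.3f})")
```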



Note, under the null we would have expected:

n1 × p̂0 = 123 × (16/413) ≈ 4.77 deaths in the seat belt + group.

n2 × p̂0 = 290 × (16/413) ≈ 11.23 deaths in the seat belt - group.

The fact that one of these is “small” (less than 5) causes some concern about the normal approximation to the binomial. One alternative in this case is known as Fisher’s Exact Test - which does not make the normality assumption.
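To make the alternative concrete, here is a rough sketch of a two-sided Fisher's exact test for the seat belt table, built directly from the hypergeometric distribution using only the standard library. The helper names are hypothetical, and the two-sided rule used (sum the probabilities of all tables no more likely than the observed one) is one common convention, assumed here rather than taken from the slides. In practice a library routine such as `scipy.stats.fisher_exact` would normally be used.

```python
from math import comb


def hypergeom_pmf(k, n1, n2, events):
    """P(k of the events fall in group 1) with all table margins held fixed."""
    return comb(events, k) * comb(n1 + n2 - events, n1 - k) / comb(n1 + n2, n1)


def fisher_exact_two_sided(x1, n1, x2, n2):
    """Two-sided p-value: total probability of tables no more likely than observed."""
    events = x1 + x2
    k_values = range(max(0, events - n2), min(events, n1) + 1)
    probs = [hypergeom_pmf(k, n1, n2, events) for k in k_values]
    p_observed = hypergeom_pmf(x1, n1, n2, events)
    # Small relative tolerance guards against floating-point ties
    return sum(p for p in probs if p <= p_observed * (1 + 1e-9))


# Expected deaths under H0, using the unrounded pooled estimate 16/413
expected_1 = 123 * 16 / 413
expected_2 = 290 * 16 / 413

p_exact = fisher_exact_two_sided(3, 123, 13, 290)
print(f"expected deaths: {expected_1:.2f} and {expected_2:.2f}")
print(f"Fisher exact two-sided p-value: {p_exact:.3f}")
```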


Power and Sample Size

• Power = Pr(reject H0 | HA true)
• Power depends on μ0, μA, σ², and n.
• Sample size calculations ensure that the study is capable of detecting departures from the null hypothesis.
• Power and sample size calculations require a model for the data under both the null and the alternative.


We have mainly focused on the distribution of the test statistic under the null hypothesis. Shouldn’t we also consider the distribution under the alternative hypothesis?

Yes!

• The distribution of the test statistic under the alternative hypothesis tells us the power of the test.

• Power indicates the ability of the test procedure to reliably detect departures from the null hypothesis.

• Power (1 - β) and significance (α) are important considerations in the planning of a study.

• Sample size calculations.



Power

Power refers to the probability of rejecting the null hypothesis when it truly is false:

1 - β = P [ reject H0 | HA true ]

So when we consider power we compute probabilities assuming now that the alternative is the “truth”.

Consider the 1-sample testing situation with the hypotheses:

H0 : μ = μ0

HA : μ ≠ μ0

This scenario is enough to illustrate all of the important concepts. Details change when we consider variants:

1. One-sided alternatives

2. 2-sample problems

3. Sample proportions

We will look at the 1-sample case in detail and give results for the other situations.


Power

Power refers to the likelihood of detecting a difference from H0. Clearly the greater the difference between the null mean, μ0, and the alternative mean, μA, the more likely the sample mean will be “significantly far from H0”. The idea here is that when the distance between the mean under the null and the mean under the alternative, |μ0 - μA|, is “large” then the power is also large. However, as usual, we measure “large” in terms of SEM units (standard error of the mean), $\sqrt{\sigma^2/n}$.

Define: Δ = | μ0 - μA |

Then what we want “large” to give good power is

\[ \frac{\Delta}{\sigma/\sqrt{n}} = \frac{\sqrt{n}\,\Delta}{\sigma} \]

Based on this, we expect power to increase as:
1. Sample size increases.
2. Distance between μ0 and μA (Δ) increases.
3. Variance gets smaller.


Power

Q: How can we compute the power?

We assume that σ is known. Even if we don’t know it, we’ll need an estimate of it. Also, we know the sample size, n.

1. Choose α

2. Identify μ0

3. Determine 1-sided / 2-sided

4. Identify μA

(1) - (3) determine what the rejection region will be. For example, a two-sided test of

H0: μ = μ0

REJECT H0:

\[ \bar{X} < \mu_0 + Q_Z^{(\alpha/2)} \frac{\sigma}{\sqrt{n}} \quad \text{or} \quad \bar{X} > \mu_0 + Q_Z^{(1-\alpha/2)} \frac{\sigma}{\sqrt{n}} \]


Power example: µ0 = 13.0, µA = 12.8, σ = 0.7, α = 0.05

[Figure panels: n = 25 and n = 100]


Power

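Let’s assume that μA < μ0. Then, only the lower rejection region has non-negligible probability. (4) determines the distribution under the alternative so that power can be computed:

\[
\begin{aligned}
1 - \beta &= P[\,\text{REJECT } H_0 \mid H_A \text{ true}\,] \\
&= P\!\left[\bar{X} < \mu_0 + Q_Z^{(\alpha/2)} \frac{\sigma}{\sqrt{n}} \,\Big|\, \mu = \mu_A\right] \\
&= P\!\left[\frac{\bar{X} - \mu_A}{\sigma/\sqrt{n}} < Q_Z^{(\alpha/2)} + \frac{\mu_0 - \mu_A}{\sigma/\sqrt{n}} \,\Big|\, \mu = \mu_A\right] \\
&= P\!\left[Z < Q_Z^{(\alpha/2)} + \frac{\mu_0 - \mu_A}{\sigma/\sqrt{n}}\right]
\end{aligned}
\]

So here we find the key quantity that determines the power is indeed

\[ \frac{\sqrt{n}\,(\mu_0 - \mu_A)}{\sigma} \]

(Note: if μA > μ0 then power depends on μA - μ0)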
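A minimal Python sketch of this power calculation follows, applied to the power example values (µ0 = 13.0, µA = 12.8, with 0.7 and 0.05 taken to be σ and α, which is the presumed meaning of the two unlabeled numbers on that slide). The function name `power_two_sided` is hypothetical. The sketch keeps the upper rejection region as well, to show that its contribution is indeed negligible when μA < μ0.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sided(mu0, mu_a, sigma, n, alpha=0.05):
    """Power of the two-sided one-sample Z-test, returned as (lower, upper) parts."""
    std_normal = NormalDist()
    q_low = std_normal.inv_cdf(alpha / 2)        # Q_Z^(alpha/2), a negative number
    shift = (mu0 - mu_a) / (sigma / sqrt(n))     # distance in SEM units
    lower = std_normal.cdf(q_low + shift)        # P[X-bar below the lower cutoff | mu_A]
    upper = 1 - std_normal.cdf(-q_low + shift)   # P[X-bar above the upper cutoff | mu_A]
    return lower, upper


for n in (25, 100):
    lower, upper = power_two_sided(13.0, 12.8, 0.7, n)
    print(f"n = {n:3d}: power = {lower + upper:.3f} (upper region contributes {upper:.5f})")
```

Under these assumptions the power is roughly 0.30 at n = 25 and roughly 0.82 at n = 100, which illustrates how strongly power depends on the sample size.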


Power

Let’s come up for air. We have shown that the POWER can be written as

\[ 1 - \beta = P\!\left[Z < Q_Z^{(\alpha/2)} + \frac{\mu_0 - \mu_A}{\sigma/\sqrt{n}}\right] \]

This tells us the probability of rejecting H0 when the alternative is true. This is important!! Why spend $$$ on a study, that hopes to show a treatment effect, if the probability of rejecting H0 is small? In fact, in the acquisition of $$$ from NIH one must show that the study is capable (sufficient power) of detecting a meaningful difference.

One useful display is the Power Curve. This shows the power for different μA.


Suppose we are testing blood pressure medication and know that average systolic blood pressure among hypertensives is 150 with a standard deviation of 10. For various treatment effects, measured as decreases, d, how likely are we to reject the null hypothesis (2-sided at α = 0.05) that

H0 : d = 0

Depends on sample size and the alternative - Power Curve!

Power Curve
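Since the plotted curve is not reproduced here, the following sketch tabulates a power curve for the blood pressure setting (standard deviation 10, two-sided α = 0.05). The sample sizes 10, 20 and 40 and the grid of decreases are assumptions chosen for illustration; the original figure may have used different values.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sided(d, sigma, n, alpha=0.05):
    """Power of a two-sided Z-test to detect a true mean decrease of size d."""
    std_normal = NormalDist()
    q = std_normal.inv_cdf(1 - alpha / 2)
    shift = d / (sigma / sqrt(n))
    # Both rejection regions are included; one of them dominates when d > 0
    return std_normal.cdf(-q + shift) + std_normal.cdf(-q - shift)


sigma = 10                       # standard deviation from the example
decreases = [0, 2, 4, 6, 8, 10]  # assumed grid of treatment effects (mm Hg)
sample_sizes = [10, 20, 40]      # assumed sample sizes, not from the slides

curve = {n: [power_two_sided(d, sigma, n) for d in decreases] for n in sample_sizes}

print("   d: " + "  ".join(f"{d:5d}" for d in decreases))
for n, powers in curve.items():
    print(f"n={n:3d}: " + "  ".join(f"{p:5.2f}" for p in powers))
```

At d = 0 every curve sits at the significance level, 0.05, and each curve rises toward 1 as the decrease grows; larger n makes the rise steeper.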


Sample Size

Q: How many patients should we treat?

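1. Specify significance level (α)

2. Specify null mean (μ0)

3. Specify power (1 - β)

4. Specify the alternative mean (μA)

5. Specify the variance (σ²)

The real work is coming up with (1) - (5). Together these give the following two requirements:

Significance:

\[ P\!\left[\frac{|\bar{X} - \mu_0|}{\sigma/\sqrt{n}} > Q_Z^{(1-\alpha/2)} \,\Big|\, H_0 \text{ true}\right] = \alpha \]

Power:

\[ P\!\left[\frac{|\bar{X} - \mu_0|}{\sigma/\sqrt{n}} > Q_Z^{(1-\alpha/2)} \,\Big|\, H_A \text{ true}\right] = 1 - \beta \]

Choose the sample size n that satisfies the 2 conditions:

\[ n = \frac{\sigma^2 \left(Q_Z^{(1-\alpha/2)} + Q_Z^{(1-\beta)}\right)^2}{(\mu_0 - \mu_A)^2} \]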


Sample Size

Suppose that we consider a decrease in blood pressure of 5mm Hg to be scientifically important. Then given the standard deviation, 10, how many patients are required to obtain 80% power using a 2-sided α = 0.05 test?

\[ Q_Z^{(0.80)} = 0.84, \qquad Q_Z^{(0.975)} = 1.96 \]

So we need

\[ n = \frac{\sigma^2 \left(Q_Z^{(1-\alpha/2)} + Q_Z^{(1-\beta)}\right)^2}{(\mu_0 - \mu_A)^2} = \frac{10^2\,(1.96 + 0.84)^2}{5^2} = 31.36 \]

• Recruit n = 32 patients into our study.
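The same calculation in Python, as a small sketch with a hypothetical function name. It uses unrounded normal quantiles, so the exact value differs slightly from 31.36, but rounding up gives the same n = 32.

```python
from math import ceil
from statistics import NormalDist


def n_one_sample_mean(delta, sigma, alpha=0.05, power=0.80):
    """Sample size for a two-sided one-sample Z-test to detect a mean shift delta."""
    std_normal = NormalDist()
    q_alpha = std_normal.inv_cdf(1 - alpha / 2)   # Q_Z^(1 - alpha/2)
    q_beta = std_normal.inv_cdf(power)            # Q_Z^(1 - beta)
    return sigma ** 2 * (q_alpha + q_beta) ** 2 / delta ** 2


n_exact = n_one_sample_mean(delta=5, sigma=10)
n_required = ceil(n_exact)   # always round up to a whole patient
print(f"n = {n_exact:.2f}, so recruit {n_required} patients")
```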




Factors that Influence Sample Size

The required sample size increases as:

1. σ² increases

2. significance level is made smaller (α decreases)

3. power increases (1 - β increases)

4. distance | μ0 - μA | decreases


Sample Size

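1-sample Mean, 1-sided Test

\[ n = \frac{\sigma^2 \left(Q_Z^{(1-\alpha)} + Q_Z^{(1-\beta)}\right)^2}{(\mu_0 - \mu_A)^2} \]

1-sample Proportion, 1-sided Test

\[ n = \frac{p_0(1-p_0) \left[Q_Z^{(1-\alpha)} + Q_Z^{(1-\beta)} \sqrt{\dfrac{p_A(1-p_A)}{p_0(1-p_0)}}\,\right]^2}{(p_0 - p_A)^2} \]

1-sample Proportion, 2-sided Test

\[ n = \frac{p_0(1-p_0) \left[Q_Z^{(1-\alpha/2)} + Q_Z^{(1-\beta)} \sqrt{\dfrac{p_A(1-p_A)}{p_0(1-p_0)}}\,\right]^2}{(p_0 - p_A)^2} \]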
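A hedged sketch of the one-sample proportion formulas in Python follows. The function name and the planning values p0 = 0.5 and pA = 0.6 are hypothetical, chosen only to exercise the formula; α = 0.05 and 80% power match the conventions used elsewhere in these notes.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_one_sample_proportion(p0, p_a, alpha=0.05, power=0.80, two_sided=True):
    """Sample size for a one-sample test of a proportion (normal approximation)."""
    std_normal = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    q_alpha = std_normal.inv_cdf(1 - tail)
    q_beta = std_normal.inv_cdf(power)
    var0 = p0 * (1 - p0)      # variance factor under the null
    var_a = p_a * (1 - p_a)   # variance factor under the alternative
    return var0 * (q_alpha + q_beta * sqrt(var_a / var0)) ** 2 / (p0 - p_a) ** 2


# Hypothetical planning values (not from the slides): p0 = 0.5, pA = 0.6
n_two_sided = ceil(n_one_sample_proportion(0.5, 0.6))
n_one_sided = ceil(n_one_sample_proportion(0.5, 0.6, two_sided=False))
print(f"2-sided: n = {n_two_sided}, 1-sided: n = {n_one_sided}")
```

The one-sided test needs fewer subjects because its critical value is smaller, which is the only difference between the two formulas.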


Sample Size

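2-sample Mean, 2-sided Test, Equal sizes: n1 = n2 = n

\[ n = \frac{(\sigma_1^2 + \sigma_2^2) \left(Q_Z^{(1-\alpha/2)} + Q_Z^{(1-\beta)}\right)^2}{(\mu_0 - \mu_A)^2} \]

2-sample Proportion, 2-sided Test, Equal sizes: n1 = n2 = n (rough approximation - for p0 ≈ pA)

\[ n = \frac{\left[p_0(1-p_0) + p_A(1-p_A)\right] \left(Q_Z^{(1-\alpha/2)} + Q_Z^{(1-\beta)}\right)^2}{(p_0 - p_A)^2} \]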
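The two-sample formulas can be sketched the same way. Function names are hypothetical; `delta` stands for the difference in means to be detected. The inputs are illustrative assumptions: the blood pressure values (standard deviation 10 in each group, difference 5) for the means, and the observed seat belt fatality rates, 0.024 and 0.045, used as planning values for the proportions.

```python
from math import ceil
from statistics import NormalDist


def quantile_sum_squared(alpha=0.05, power=0.80):
    """(Q_Z^(1-alpha/2) + Q_Z^(1-beta))^2 for a two-sided test."""
    std_normal = NormalDist()
    return (std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)) ** 2


def n_two_sample_mean(delta, sigma1, sigma2, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two means with equal group sizes."""
    return (sigma1 ** 2 + sigma2 ** 2) * quantile_sum_squared(alpha, power) / delta ** 2


def n_two_sample_proportion(p_first, p_second, alpha=0.05, power=0.80):
    """Per-group sample size for two proportions (rough approximation)."""
    variance_sum = p_first * (1 - p_first) + p_second * (1 - p_second)
    return variance_sum * quantile_sum_squared(alpha, power) / (p_first - p_second) ** 2


n_means = ceil(n_two_sample_mean(delta=5, sigma1=10, sigma2=10))
n_props = ceil(n_two_sample_proportion(0.024, 0.045))
print(f"means: {n_means} per group; proportions: {n_props} per group")
```

Under these assumed planning values, detecting the seat belt difference with 80% power would take over a thousand children per group, far more than the 123 and 290 observed.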


Summary

• Power is an important component in study design.

• Sample size calculations ensure that the study is capable of detecting departures from the null hypothesis.

• Power and sample size calculations require more than a test - a model for the data under both the null and the alternative is required.

• STRUTs (α = .05; power = .80; 2-tailed)

*One sample: n = 8/D²

*Two sample: n = 16/D²

where D = Δ/σ (difference in std. dev. units)
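These rules of thumb follow from the sample size formulas: with α = .05 (2-tailed) and power = .80, the squared sum of the two normal quantiles is about 7.85, which rounds to 8 for one sample and doubles to about 16 per group for two samples with equal variances. A short sketch (names are illustrative, and D = 0.5 is an arbitrary example value) compares the shortcuts with the unrounded constants:

```python
from statistics import NormalDist

std_normal = NormalDist()
# (Q_Z^(0.975) + Q_Z^(0.80))^2, the constant behind both rules of thumb
one_sample_const = (std_normal.inv_cdf(0.975) + std_normal.inv_cdf(0.80)) ** 2
two_sample_const = 2 * one_sample_const   # equal variances, per-group size

D = 0.5   # arbitrary example: a difference of half a standard deviation
print(f"one sample: exact {one_sample_const / D**2:.1f} vs rule {8 / D**2:.0f}")
print(f"two sample: exact {two_sample_const / D**2:.1f} vs rule {16 / D**2:.0f}")
```

The shortcuts round upward, so they err slightly on the side of a larger study.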
