Inference for a Single Population Proportion ( p )


Page 1: Inference for a Single Population Proportion ( p )

Inference for a Single Population Proportion (p)

Page 2: Inference for a Single Population Proportion ( p )

Sampling Distribution of the Sample Proportion

Sample Proportion

$$\hat{p} = \frac{x}{n}$$

$\hat{p}$ = estimate of the population proportion $p$ = proportion of the sample having a specified attribute,

where n = sample size and x = the observed number of “successes” in the sample.

$\hat{p}$ is random and varies from sample to sample.

Page 3: Inference for a Single Population Proportion ( p )

Sampling Distribution of the Sample Proportion

The histograms below show the estimated sampling distribution of the sample proportion based upon 1000 samples drawn for the given sample size (n).

For larger samples, the sampling distribution of the sample proportion is approximately normal.
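The simulation behind such histograms can be sketched as follows. This is a minimal illustration, not the software used for the slides: the population proportion p = .30, the sample size n = 100, and the function name `simulate_p_hats` are assumptions chosen for the example.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def simulate_p_hats(p, n, reps=1000, seed=0):
    """Draw `reps` samples of size n and return the sample proportion of each."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.30, 100  # assumed values, for illustration only
p_hats = simulate_p_hats(p, n)

# The simulated mean should be close to p, and the simulated standard
# deviation close to sqrt(p(1-p)/n), the theoretical standard error.
print("mean of p-hats:", mean(p_hats))
print("sd of p-hats:  ", pstdev(p_hats))
print("theoretical SE:", sqrt(p * (1 - p) / n))
```

Rerunning with a small n (say 10) and a large n (say 500) shows the simulated proportions becoming more tightly and symmetrically clustered around p as n grows.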

Page 4: Inference for a Single Population Proportion ( p )

Sampling Distribution of the Sample Proportion

Sampling Distribution of p̂

1. Provided n is “sufficiently large”, the sampling distribution is approximately normal; generally this requires

$$np \ge 10^{*} \quad \text{and} \quad n(1-p) \ge 10^{*}$$

* 5 is also used.

2. The mean of the sampling distribution is p = true population proportion.

3. The standard deviation of the sampling distribution, or the standard error of the sample proportion, is given by:

$$SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$$

Notation:

$$\hat{p} \sim N\!\left(p,\ \sqrt{\frac{p(1-p)}{n}}\right)$$
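As a small sketch of the two quantities on this slide, the helpers below compute the standard error and check the large-sample rule of thumb. The function names and the `threshold` parameter are hypothetical, introduced only for illustration.

```python
from math import sqrt

def standard_error(p, n):
    """Standard error of the sample proportion: sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)

def large_sample_ok(p, n, threshold=10):
    """Rule of thumb: both np and n(1-p) should reach the threshold (10, or 5)."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(standard_error(0.20, 40))                 # about 0.0632
print(large_sample_ok(0.20, 40))                # False: np = 8 is below 10
print(large_sample_ok(0.20, 40, threshold=5))   # True under the looser rule
```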

Page 5: Inference for a Single Population Proportion ( p )

Implications for Inference (n “large”)

CI for Population Proportion (p)

$$\hat{p} \pm (z\text{-value})\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

z-values from standard normal: 90% = 1.645, 95% = 1.960, 99% = 2.576

Test Statistic (Ho: p = po)

$$z = \frac{\hat{p} - p_o}{\sqrt{\dfrac{p_o(1-p_o)}{n}}} \sim \text{standard normal}$$

Effect Size

$$d = \frac{|\hat{p} - p_o|}{p_o(1-p_o)}$$

Page 6: Inference for a Single Population Proportion ( p )

Hypothesis Testing for a Population Proportion (p)

Null Hypothesis Ho: p = po

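Alternative Hypotheses and p-value

HA: p > po (upper-tailed): P-value = P(Z ≥ z), the area under the standard normal curve to the right of z

HA: p < po (lower-tailed): P-value = P(Z ≤ z), the area under the standard normal curve to the left of z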

Page 7: Inference for a Single Population Proportion ( p )

Hypothesis Testing for a Population Proportion (p)

Null Hypothesis Ho: p = po

Alternative Hypotheses and p-value

HA: p ≠ po (two-tailed): P-value = 2 P(Z ≥ |z|), the combined area under the standard normal curve to the left of -|z| and to the right of |z|

For two-tailed tests in general it is preferable to simply construct a confidence interval for the parameter of interest and see if the hypothesized value under the null hypothesis is contained in the CI. If it is, we fail to reject Ho; if it is not, we reject Ho.
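The three tail-area rules can be sketched in a few lines using the standard normal distribution from Python's standard library. The function name `z_test_p_value` and its `alternative` labels are assumptions made for this illustration.

```python
from statistics import NormalDist

def z_test_p_value(z, alternative="upper"):
    """P-value for a z statistic under the upper-, lower-, or two-tailed alternative."""
    std_normal = NormalDist()  # mean 0, sd 1
    if alternative == "upper":
        return 1 - std_normal.cdf(z)             # area to the right of z
    if alternative == "lower":
        return std_normal.cdf(z)                 # area to the left of z
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))  # both tails beyond |z|
    raise ValueError("alternative must be 'upper', 'lower', or 'two-sided'")

for alt in ("upper", "lower", "two-sided"):
    print(alt, z_test_p_value(1.96, alt))
```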

Page 8: Inference for a Single Population Proportion ( p )

Example: Treatment of Kidney Cancer

• Historically, one in five kidney cancer patients survive 5 years past diagnosis, i.e. 20%.

• An oncologist using an experimental therapy treats n = 40 kidney cancer patients and 16 of them survive at least 5 years.

• Is there evidence that patients receiving the experimental therapy have a higher 5-year survival rate?

Page 9: Inference for a Single Population Proportion ( p )

Step 1: Formulate Hypotheses

p = the proportion of kidney cancer patients receiving the experimental therapy that survive at least 5 years.

Ho: p = .20 (5-yr. survival rate is not better)

HA: p > .20 (5-yr. survival rate is better)

Step 2: Determine test criteria

Choose α (may want to consider smaller?)

Use large sample test? Definitely questionable, as we have…

np = (40)(.20) = 8 > 5 and n(1-p) = (40)(.80) = 32 > 5

Page 10: Inference for a Single Population Proportion ( p )

Step 3: Collect data and compute test statistic

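$$\hat{p} = \frac{16}{40} = .40 \quad \text{or a 40\% 5-yr. survival rate}$$

$$z = \frac{\hat{p} - p_o}{\sqrt{\dfrac{p_o(1-p_o)}{n}}} = \frac{.40 - .20}{\sqrt{\dfrac{.20(1-.20)}{40}}} = 3.16$$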

The observed 5-yr. survival rate for kidney cancer patients undergoing the experimental therapy is 3.16 standard errors above the historical rate!

Page 11: Inference for a Single Population Proportion ( p )

Step 4: Compute p-value

z = 3.16

P-value = .0008

Step 5: Make Decision and Interpret

We have very strong evidence to suggest the 5-year survival rate for kidney cancer patients undergoing the experimental therapy is greater than the current 5-yr. survival rate of 20% (P-value = .0008).

It is highly unlikely we would obtain a 40% 5-yr. survival rate in our sample, if in fact the 5-yr. survival rate for the population of patients treated with the experimental therapy was truly 20%.
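Steps 3 and 4 for the kidney cancer example can be checked with a short calculation. This is a sketch using only the numbers given in the example (x = 16, n = 40, po = .20); the variable names are chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

x, n, p0 = 16, 40, 0.20           # survivors, sample size, historical rate

p_hat = x / n                     # observed 5-yr. survival rate
se0 = sqrt(p0 * (1 - p0) / n)     # standard error computed under Ho
z = (p_hat - p0) / se0            # large-sample test statistic
p_value = 1 - NormalDist().cdf(z) # upper-tailed: area to the right of z

print(f"p-hat = {p_hat:.2f}, z = {z:.2f}, P-value = {p_value:.4f}")
```

The printed z rounds to 3.16 and the P-value to roughly .0008, in line with the values on the slides.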

Page 12: Inference for a Single Population Proportion ( p )

Step 6: Quantify significant results

Confidence Interval

$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .40 \pm 1.96\sqrt{\frac{(.40)(.60)}{40}} = .40 \pm (1.96)(.077) = .40 \pm .15$$

(.25, .55) or (25%, 55%)
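The large-sample interval for the kidney cancer data can be reproduced with the sketch below. The function name `large_sample_ci` is hypothetical, and the z-value is taken from the standard normal distribution rather than hard-coded as 1.96.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_ci(x, n, conf=0.95):
    """Large-sample CI for p: p-hat +/- z * sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = large_sample_ci(16, 40)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # (0.25, 0.55)
```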

Effect Size

$$d = \frac{|.40 - .20|}{(.20)(.80)} = 1.25$$

A very large effect, which is not surprising given that the success rate using the experimental therapy is twice as large.

Page 13: Inference for a Single Population Proportion ( p )

Power Calculation

Baseline proportion is the proportion under the null hypothesis (.20 or 20% here)

Difference to detect is the absolute difference between p under alternative and p under null (i.e., .40 - .20 = .20)

Power is calculated once sample size and difference information is entered (here, Power = .935).

For power use software

Page 14: Inference for a Single Population Proportion ( p )

Power Curve for n = 40 and po = .20

For a difference of .20, Power = .935, as seen on the previous slide.
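For readers without the software, one common way to approximate power is with the normal approximation, sketched below. Several things here are assumptions not stated on the slides: an upper-tailed test, α = .05, and the normal approximation itself. The software's method is not described, so this sketch should not be expected to reproduce the .935 shown above exactly.

```python
from math import sqrt
from statistics import NormalDist

def approx_power_upper_tailed(p0, p1, n, alpha=0.05):
    """Normal-approximation power of the upper-tailed test of Ho: p = p0 when p = p1."""
    std_normal = NormalDist()
    # Smallest sample proportion that leads to rejecting Ho at level alpha.
    critical = p0 + std_normal.inv_cdf(1 - alpha) * sqrt(p0 * (1 - p0) / n)
    # Probability that p-hat exceeds that cutoff when the true proportion is p1.
    se1 = sqrt(p1 * (1 - p1) / n)
    return 1 - std_normal.cdf((critical - p1) / se1)

# Baseline .20, difference to detect .20, n = 40 (alpha = .05 is assumed).
print(approx_power_upper_tailed(0.20, 0.40, 40))
```

Evaluating the function over a grid of alternatives p1 traces out an approximate power curve for n = 40 and po = .20.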

Page 15: Inference for a Single Population Proportion ( p )

Sample Size and CI’s for p

• Suppose we wish to estimate p using a 95% CI and have a margin of error of 3%. What sample size do we need to use?

• Recall the CI for p is given by:

$$\hat{p} \pm \underbrace{(z\text{-value})\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}}_{\text{MARGIN OF ERROR (E)}}$$

Page 16: Inference for a Single Population Proportion ( p )

Sample Size and CI’s for p

• Here for a 95% CI we want E = .03 or 3%:

$$E = 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .03$$

• After some wonderful algebraic manipulation:

$$n = \frac{1.96^2\,\hat{p}(1-\hat{p})}{E^2}$$

Oh, oh! We don’t know p-hat !!

1. “Guesstimate”

2. Use p-hat from pilot or prior study.

3. Largest n we would ever need comes when p-hat = .50.

Page 17: Inference for a Single Population Proportion ( p )

Sample Size and CI’s for p

1. Informed approach

$$n = \frac{1.96^2\,\hat{p}(1-\hat{p})}{E^2}, \quad \hat{p} \text{ from prior knowledge}$$

2. Conservative approach (i.e. worst case scenario)

$$n = \frac{1.96^2}{4E^2}, \quad \text{uses } \hat{p} = .50$$

Standard normal values: 90% = 1.645, 95% = 1.960, 99% = 2.576

Page 18: Inference for a Single Population Proportion ( p )

Sample Size and CI’s for p

• Original Question: Suppose we wish to estimate p using a 95% CI and have a margin of error of 3%. What sample size do we need to use?

• Assume that we estimate the 5 yr. survival rate for a new kidney cancer therapy, and we know historically that this survival rate is around 20%.

• Using informed approach

$$n = \frac{1.96^2\,\hat{p}(1-\hat{p})}{E^2} = \frac{1.96^2(.20)(.80)}{(.03)^2} = 682.95 \;\Rightarrow\; n = 683 \text{ subjects}$$

Page 19: Inference for a Single Population Proportion ( p )

Sample Size and CI’s for p

• Original Question: Suppose we wish to estimate p using a 95% CI and have a margin of error of 3%. What sample size do we need to use?

• Assume that we estimate the 5 yr. survival rate for a new kidney cancer therapy, and we know historically that this survival rate is around 20%.

• Using conservative approach

$$n = \frac{1.96^2}{4E^2} = \frac{1.96^2}{4(.03)^2} = 1067.1 \;\Rightarrow\; n = 1068 \text{ subjects}$$

This is why media polls usually report a sampling error of ± 3% and state that the poll was based on a sample of about n = 1000 individuals.
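Both sample-size approaches reduce to one formula, sketched below. The function name `sample_size_for_margin` and its default arguments are assumptions for illustration; leaving p-hat at .50 gives the conservative answer, and the result is always rounded up.

```python
from math import ceil

def sample_size_for_margin(E, p_hat=0.50, z=1.96):
    """Smallest n with z * sqrt(p_hat(1 - p_hat)/n) <= E; p_hat = .50 is the worst case."""
    return ceil(z**2 * p_hat * (1 - p_hat) / E**2)

print(sample_size_for_margin(0.03, p_hat=0.20))  # informed approach: 683
print(sample_size_for_margin(0.03))              # conservative approach: 1068
```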

Page 20: Inference for a Single Population Proportion ( p )

Small Sample Inference for p: Binomial Exact Test

• When the sample size is “small” the sampling distribution of the sample proportion cannot be well approximated by a normal distribution.

• However, regardless of sample size the EXACT sampling distribution of the number of “successes” in n independent trials ALWAYS has a Binomial Distribution.

• Thus if we knew more about the binomial distribution we could use it to find p-values when conducting a hypothesis test and also when constructing confidence intervals for the pop. proportion p.

Page 21: Inference for a Single Population Proportion ( p )

Binomial Probability Distribution

A binomial random variable X is defined to be the number of “successes” in n independent trials where P(“success”) = p is constant.

Notation: X ~ BIN(n,p)

In the definition above notice the following conditions need to be satisfied for a binomial experiment:

1. There is a fixed number n of trials carried out.

2. The outcome of a given trial is either a “success” or a “failure”.

3. The probability of success (p) remains constant from trial to trial.

4. The trials are independent, i.e. the outcome of a trial is not affected by the outcome of any other trial.

Page 22: Inference for a Single Population Proportion ( p )

Binomial Distribution

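• If X ~ BIN(n, p), then

$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}, \quad x = 0, 1, \ldots, n$$

• where

$\binom{n}{x}$ = “n choose x” = the number of ways to obtain x “successes” in n trials.

$p = P(\text{“success”})$

$n! = n(n-1)(n-2)\cdots 1$, also $0! = 1$ and $1! = 1$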
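The binomial probability formula on this slide translates directly into code. The sketch below uses `math.comb` from the Python standard library for “n choose x”; the function name `binom_pmf` is chosen for illustration.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ BIN(n, p): (n choose x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# The probabilities over x = 0, 1, ..., n add up to 1.
probs = [binom_pmf(x, 3, 0.5) for x in range(4)]
print(probs)       # [0.125, 0.375, 0.375, 0.125]
print(sum(probs))  # 1.0
```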

Page 23: Inference for a Single Population Proportion ( p )

Binomial Distribution

• If X ~ BIN(n, p), then

$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}, \quad x = 0, 1, \ldots, n$$

• E.g. when n = 3 and p = .50 there are 8 possible equally likely outcomes (e.g. flipping a coin)

SSS   SSF   SFS   FSS   SFF   FSF   FFS   FFF

X=3   X=2   X=2   X=2   X=1   X=1   X=1   X=0

P(X=3)=1/8, P(X=2)=3/8, P(X=1)=3/8, P(X=0)=1/8

• Now let’s use the binomial probability formula instead…

Page 24: Inference for a Single Population Proportion ( p )

Binomial Distribution

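• If X ~ BIN(n, p), then

$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}, \quad x = 0, 1, \ldots, n$$

• E.g. when n = 3, p = .50 find P(X = 2)

$$\binom{3}{2} = \frac{3!}{2!\,(3-2)!} = \frac{3!}{2!\,1!} = \frac{3 \cdot 2 \cdot 1}{(2 \cdot 1)\,1} = 3 \text{ ways: SSF, SFS, FSS}$$

$$P(X = 2) = \binom{3}{2}(.5)^2(.5)^{3-2} = 3(.5)^2(.5)^1 = .375 \text{ or } \tfrac{3}{8}$$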
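The counting argument for n = 3 can be checked by brute force. The sketch below lists all 8 outcomes, counts the successes in each, and compares the count for X = 2 with “3 choose 2”; the variable names are chosen for illustration.

```python
from collections import Counter
from itertools import product
from math import comb

# All sequences of S (success) and F (failure) over 3 trials.
outcomes = ["".join(seq) for seq in product("SF", repeat=3)]
counts = Counter(outcome.count("S") for outcome in outcomes)

print(outcomes)                      # 8 equally likely outcomes when p = .50
print(counts[2], comb(3, 2))         # 3 ways either way
print(counts[2] / len(outcomes))     # P(X = 2) = 3/8 = 0.375
```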

Page 25: Inference for a Single Population Proportion ( p )

Example: Treatment of Kidney Cancer

• In our example we had n = 40 patients, and if we assume the experimental therapy is no better than current treatments then the probability of 5-year survival is p = .20.

• Thus the number of patients in our study surviving at least 5 years has a binomial distribution, i.e. X ~ BIN(40,.20).

Page 26: Inference for a Single Population Proportion ( p )

Example: Treatment of Kidney Cancer

• X ~ BIN(40,.20), find the probability that exactly 16 patients survive at least 5 years.

$$P(X = 16) = \binom{40}{16}(.20)^{16}(.80)^{24} = .001945$$

• This requires some calculator gymnastics and some scratchwork! Also, keep in mind for a p-value we need to find the probability of having 16 or more patients surviving at least 5 yrs.

• Remember the p-value is defined as the probability, assuming Ho is true, of obtaining evidence as extreme or more extreme than what was observed.

Page 27: Inference for a Single Population Proportion ( p )

Example: Treatment of Kidney Cancer

• So we actually need to find:

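p-value = P(X = 16) + P(X = 17) + … + P(X = 40)

$$P(X = 16) = \binom{40}{16}(.20)^{16}(.80)^{24} = .001945$$

$$+\; P(X = 17) = \binom{40}{17}(.20)^{17}(.80)^{23} = .000686$$

$$+\; \cdots \;+\; P(X = 40) = \binom{40}{40}(.20)^{40}(.80)^{0} \approx 0$$

EXACT p-value = .002936 YIPES!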

Page 28: Inference for a Single Population Proportion ( p )

Example: Treatment of Kidney Cancer

• X ~ BIN(40,.20), find the probability that 16 or more patients survive at least 5 years.

• USE COMPUTER!

• Binomial Exact Test p-value calculator in JMP

Enter:
n = sample size
x = observed # of “successes”
po = proportion under Ho

p-values are computed automatically for all three possible alternatives

Page 29: Inference for a Single Population Proportion ( p )

Example: Treatment of Kidney Cancer

• X ~ BIN(40,.20), find the probability that 16 or more patients survive at least 5 years.

• USE COMPUTER!

• Binomial Exact Test p-value calculator in JMP

Exact p-value = .0029362

Contrasting this EXACT p-value (p = .0029) to the one calculated earlier using the normal approximation (p = .0008) we see a fairly substantial difference! MORAL: USE EXACT WHEN n is SMALL !!!
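The exact upper-tailed p-value does not require JMP specifically; any tool that can sum binomial probabilities will do. The sketch below is one such calculation in Python, with hypothetical function names, and is not the JMP calculator itself.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ BIN(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def exact_upper_p_value(x, n, p0):
    """P(X >= x) when X ~ BIN(n, p0): the exact upper-tailed p-value."""
    return sum(binom_pmf(k, n, p0) for k in range(x, n + 1))

print(round(binom_pmf(16, 40, 0.20), 6))            # P(X = 16)
print(round(exact_upper_p_value(16, 40, 0.20), 6))  # P(X >= 16), the exact p-value
```

The two printed values agree with the .001945 and .002936 reported on the slides.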

Page 30: Inference for a Single Population Proportion ( p )

Exact CI for p using the binomial distribution

• Find LCL and UCL for p by finding probabilities that meet the following requirements:

$$P(X \ge x \mid p = LCL) = \alpha/2 \quad \text{and} \quad P(X \le x \mid p = UCL) = \alpha/2$$

e.g. α/2 = .025 for 95% confidence

• Use computer to find these probabilities.

Page 31: Inference for a Single Population Proportion ( p )

Exact CI for p using the binomial distribution

• Find a 95% CI for p for the kidney cancer study

For the lower confidence limit we find LCL = .248 or .249

Page 32: Inference for a Single Population Proportion ( p )

Exact CI for p using the binomial distribution

• Find a 95% CI for p for the kidney cancer study

For the upper confidence limit we find UCL = .566 or .567

Therefore based on an EXACT 95% confidence interval we estimate that the success rate of the experimental therapy is between 24.8% and 56.7%.
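The search for LCL and UCL can be sketched with a simple bisection on p. This is an illustrative implementation under the assumption that each tail probability is set to .025 for 95% confidence; the function names are hypothetical, and statistical software typically uses a faster closed-form route to the same limits.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(x, n, p):
    """P(X >= x) for X ~ BIN(n, p); increases as p increases."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))

def lower_tail(x, n, p):
    """P(X <= x) for X ~ BIN(n, p); decreases as p increases."""
    return sum(binom_pmf(k, n, p) for k in range(0, x + 1))

def solve_for_p(tail, target, tol=1e-8):
    """Bisection for the p in [0, 1] where the monotone function tail(p) equals target."""
    lo, hi = 0.0, 1.0
    sign_at_lo = tail(lo) - target > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (tail(mid) - target > 0) == sign_at_lo:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies to the left of mid
    return (lo + hi) / 2

def exact_ci(x, n, conf=0.95):
    half_alpha = (1 - conf) / 2
    lcl = 0.0 if x == 0 else solve_for_p(lambda p: upper_tail(x, n, p), half_alpha)
    ucl = 1.0 if x == n else solve_for_p(lambda p: lower_tail(x, n, p), half_alpha)
    return lcl, ucl

lcl, ucl = exact_ci(16, 40)
print(f"Exact 95% CI: ({lcl:.3f}, {ucl:.3f})")
```

For the kidney cancer data the limits should land close to the .248 and .567 reported above.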

Page 33: Inference for a Single Population Proportion ( p )

Summary of Inference for a Single Population Proportion (p)

• When n is “large” use large sample methods based on the sampling distribution being approximately normal. (Easy)

• When n is “small” use exact methods based on the binomial distribution, which requires specific software or tables. (Hard)

• Exact methods can always be used!

• In general, precise estimates of a population proportion require a large sample size, e.g. media polls, which typically use n = 1,000.