Week 2 a Confidence intervals b Hypothesis testing for proportions · 2011-10-31 · 150 Ernst Wit /...
Week 2
a. Confidence intervals
b. Hypothesis testing for proportions
Ernst Wit / Wim Krijnen
Johann Bernoulli Institute
http://www.math.rug.nl/~ernst
Ernst Wit / Wim Krijnen Week 2
Estimation
Remember:
Statistical inference is the business of calculating statistics on variables in a sample in order to say something about parameters in a population.
One statistic we like to calculate is an estimator.
Example. In a country, two candidates, Pal and Oba, are involved in a presidential election. The country has 10,000 inhabitants, of which 5,600 are in favour of Oba and 4,400 in favour of Pal.
- Parameter: $p = 0.56$ (true fraction of those in favour of Oba).
- Sample: a news organization samples 100 people for their votes.
- Statistic: $\hat{p}$ = (number in favour of Oba) / 100 (the estimator).
How good is the estimator?
- Let's assume there are 1,000 news organizations and each performs its own independent poll.
- None of them is certain to get it right, but their answers will vary around the truth $p = 0.56$:
7.1 Confidence interval
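> country <- c(rep(1, 5600), rep(0, 4400))   # 1 = in favour of Oba, 0 = in favour of Pal
> phat <- numeric(1000)                       # one estimate per news organization
> for (i in 1:1000) {
+   phat[i] <- mean(sample(country, size = 100, replace = TRUE))  # one poll of 100 people
+ }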
Quantiles of the 1,000 values of $\hat{p}$:
> quantile(phat,c(0.1,0.9))
10% 90%
0.50 0.62
> quantile(phat,c(0.025,0.975))
2.5% 97.5%
0.46 0.66
The results above suggest
$P(0.50 \le \hat{p} \le 0.62) \approx 0.80$,
$P(0.46 \le \hat{p} \le 0.66) \approx 0.95$.
Since the true $p = 0.56$,
$P(0.50 - p \le \hat{p} - p \le 0.62 - p) \approx 0.80$
$P(-0.06 \le \hat{p} - p \le 0.06) \approx 0.80$
The distance between $\hat{p}$ and $p$ is less than 0.06 with 80% certainty. Equivalently,
$P(\hat{p} - 0.06 \le p \le \hat{p} + 0.06) \approx 0.80$
“True $p$ lies within the interval $(\hat{p} - 0.06, \hat{p} + 0.06)$ with 80% confidence”: this is an 80% confidence interval.
Deriving some theory...
What can we say about the approximate distribution of $\hat{p}$?
1. It is approximately normal.
2. It is approximately unbiased: $E(\hat{p}) = p$
> mean(phat)
[1] 0.56047
3. It has a tractable standard deviation: $SD(\hat{p}) = \sqrt{p(1-p)/n}$
> sqrt(0.56*0.44/100)
[1] 0.04963869
> sd(phat)
[1] 0.04952708
7.2 Confidence interval for population proportion, p
We have seen that (approximately)
$$\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right),$$
and so (approximately)
$$\frac{\hat{p} - p}{SE(\hat{p})} \sim N(0, 1),$$
where
$$SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}.$$
From the normal distribution we know, for example, that
$$P\left(-1.96 \le \frac{\hat{p} - p}{SE(\hat{p})} \le 1.96\right) \approx 0.95$$
- The interval $(\hat{p} - 1.96\,SE(\hat{p}),\ \hat{p} + 1.96\,SE(\hat{p}))$ contains $p$ with approximate probability 0.95.
- This is a 95% confidence interval.
In general
$\alpha/2 = P(Z \le -z^*)$, $1 - \alpha/2 = P(Z \le z^*)$
zstar <- -qnorm(alpha/2)   # equivalently: zstar <- qnorm(1-alpha/2)
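[Figure: standard normal density against the z-axis (from -4 to 4). The two tails beyond $-z_{\alpha/2}$ and $z_{\alpha/2}$ are rejection regions of area $\alpha/2$ each; the region between them is the acceptance region of area $1 - \alpha$.]
So a $(1-\alpha)100\%$ confidence interval for $p$ is given by
$(\hat{p} - z^* \, SE(\hat{p}),\ \hat{p} + z^* \, SE(\hat{p}))$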
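The coverage claim behind this interval can be checked by simulation. The sketch below is an illustration and not part of the original slides: the helper name `wald_ci`, the seed and the number of repetitions are assumptions made for the example. It reuses the election setting with p = 0.56 and polls of n = 100.

```r
# Hypothetical helper: (1 - alpha)100% interval phat +/- zstar * SE(phat)
wald_ci <- function(x, n, alpha = 0.05) {
  phat  <- x / n
  se    <- sqrt(phat * (1 - phat) / n)
  zstar <- qnorm(1 - alpha / 2)
  c(lower = phat - zstar * se, upper = phat + zstar * se)
}

set.seed(1)                                  # arbitrary seed, for reproducibility
p <- 0.56; n <- 100; reps <- 2000
x <- rbinom(reps, size = n, prob = p)        # number in favour of Oba in each poll
ci <- t(sapply(x, wald_ci, n = n))           # one interval per simulated poll
coverage <- mean(ci[, "lower"] <= p & p <= ci[, "upper"])
coverage                                     # fraction of intervals that contain p
```

If the normal approximation is adequate, `coverage` should come out near the nominal 0.95.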
What does this mean in practice?
“How does this example really help me, because in practice I have only one sample?”
Well, let’s go back to the Election example. We saw in one sample:
$\hat{p} = 0.58$,
and so we can calculate
$SE(\hat{p}) = \sqrt{0.58 \times 0.42 / 100} = 0.04935585$
[NOTE: not so different from 0.04952708 from 1,000 samples]
So we can claim that a 95% CI is given by:
$(0.58 - 0.04935 \times 1.96,\ 0.58 + 0.04935 \times 1.96) = (0.483, 0.677)$
Example 7.2
466 out of 1,013 voters rate the president's performance as “good”. Construct a 95% confidence interval for the true proportion.
> n <- 1013; alpha <- 0.05
> phat <- 466/n
> SE <- sqrt(phat*(1-phat)/n)
> zstar <- qnorm(1-alpha/2)
> round(phat + c(-1,1)*zstar*SE,4)
[1] 0.4293 0.4907
This means:
The “probability” that the interval covers the true fraction is 0.95.
We say:
“We are 95% confident that the true fraction lies between 0.4293 and 0.4907.”
Let R do the work: prop.test
> prop.test(466,1013,conf.level=0.95)
1-sample proportions test with continuity correction
data: 466 out of 1013, null probability 0.5
X-squared = 6.3179, df = 1, p-value = 0.01195
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.4290475 0.4912989
sample estimates:
p
0.4600197
Note: The result differs slightly from solving Equation 7.1 directly.
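One reason for the difference is that `prop.test` does not compute $\hat{p} \pm z^* SE(\hat{p})$. According to its documentation it inverts the score test (the Wilson interval), by default with a continuity correction. The sketch below is an illustration rather than part of the slides, with variable names chosen for the example: it computes the uncorrected Wilson interval by hand and compares it with `prop.test(..., correct = FALSE)`.

```r
x <- 466; n <- 1013
phat <- x / n
z <- qnorm(0.975)

# Wilson score interval without continuity correction
centre <- (phat + z^2 / (2 * n)) / (1 + z^2 / n)
half   <- z * sqrt(phat * (1 - phat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
wilson <- c(centre - half, centre + half)

# The interval reported by prop.test when the continuity correction is switched off
r_ci <- as.numeric(prop.test(x, n, correct = FALSE)$conf.int)

rbind(wilson = wilson, prop.test = r_ci)
```

With the default `correct = TRUE` the interval is slightly wider, which is the version printed above.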
7.5 Confidence intervals for differences of proportions
Example 7.8: In the Week 1 poll, 1,000 people are interviewed and 560 agree; in the Week 2 poll, 1,200 are interviewed and 570 agree.
Aim: to see if opinion in the population has changed.
Statistic:
Proportional change $= \frac{570}{1200} - \frac{560}{1000} = -0.085$.
Is this really a change?
We use (again) the facts that:
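- If $n$ is large, then $\hat{p}_2 - \hat{p}_1$ is approximately normal.
- $E(\hat{p}_2 - \hat{p}_1) = p_2 - p_1$
- $SE(\hat{p}_2 - \hat{p}_1) \approx \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
So, approximately,
$$\frac{\hat{p}_2 - \hat{p}_1 - (p_2 - p_1)}{SE(\hat{p}_2 - \hat{p}_1)} \sim N(0, 1)$$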
Confidence intervals for differences of proportions (2)
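This means that a $(1-\alpha)100\%$ CI for the proportion difference is:
$$\left(\hat{p}_2 - \hat{p}_1 - z_{\alpha/2} \times SE(\hat{p}_2 - \hat{p}_1),\ \hat{p}_2 - \hat{p}_1 + z_{\alpha/2} \times SE(\hat{p}_2 - \hat{p}_1)\right).$$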
In example 7.8:
> prop.test(x=c(560,570),n=c(1000,1200),conf.level=0.95)
data: c(560, 570) out of c(1000, 1200)
X-squared = 15.437, df = 1, p-value = 8.53e-05
95 percent confidence interval: 0.04231207 0.12768793
sample estimates: prop 1 prop 2 0.560 0.475
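Conclusion: 0 is not in the CI, so there is a non-zero difference with 95% confidence. Note that R reports the interval for prop 1 − prop 2 (Week 1 minus Week 2); for $p_2 - p_1$ the signs flip, giving (−0.128, −0.042).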
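The interval for the difference can also be computed from first principles. The sketch below is illustrative only (the variable names are assumptions): it evaluates the formula for prop 1 − prop 2 and then adds the continuity correction that `prop.test` applies by default in the two-sample case, which is why the printed interval is slightly wider than the plain formula gives.

```r
x <- c(560, 570); n <- c(1000, 1200)
phat <- x / n
d  <- phat[1] - phat[2]                      # prop 1 - prop 2, the order used by prop.test
se <- sqrt(sum(phat * (1 - phat) / n))       # unpooled standard error of the difference
z  <- qnorm(0.975)

ci_plain <- d + c(-1, 1) * z * se            # interval from the formula, no correction
cc <- 0.5 * sum(1 / n)                       # continuity correction added to the half-width
ci_corrected <- d + c(-1, 1) * (z * se + cc)

round(rbind(ci_plain, ci_corrected), 4)
```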
7.5.2 Difference in means
Example 7.9: Weight loss under condition placebo (Group 1) or condition drug ephedra (Group 2).
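x <- c(0,0,0,2,4,5,13,14,14,14,15,17,17)    # placebo (Group 1)
y <- c(0,6,7,8,11,13,16,16,16,17,18)        # ephedra (Group 2)
dif <- numeric(1000)                         # bootstrap differences in means
for (i in 1:1000) {
  x.new <- sample(x, size = length(x), replace = TRUE)  # resample within each group
  y.new <- sample(y, size = length(y), replace = TRUE)
  dif[i] <- mean(x.new) - mean(y.new)
}
quantile(dif, c(.025, .975))                 # 95% percentile interval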
2.5% 97.5%
-7.610140 2.168007
hist(dif)
[Figure: histogram of dif (frequency against dif, ranging from about −10 to 5).]
General (normal) theory for comparing two means
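Sample 1: $X_1, X_2, \dots, X_{n_x} \sim N(\mu_1, \sigma_1)$ gives mean $\bar{X}$ and variance $s_x^2$.
Sample 2: $Y_1, Y_2, \dots, Y_{n_y} \sim N(\mu_2, \sigma_2)$ gives mean $\bar{Y}$ and variance $s_y^2$.
Problem: Construct a CI for $\mu_1 - \mu_2$ based on $\bar{X} - \bar{Y}$.
$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{SE(\bar{X} - \bar{Y})} \sim t\text{-distribution with } df \text{ degrees of freedom}$$
$$df = \begin{cases} n_x + n_y - 2 & \text{if } \sigma_1 = \sigma_2 \\ \left(\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}\right)^2 \cdot \left(\frac{(s_x^2/n_x)^2}{n_x - 1} + \frac{(s_y^2/n_y)^2}{n_y - 1}\right)^{-1} & \text{if } \sigma_1 \neq \sigma_2 \end{cases}$$
$$SE(\bar{X} - \bar{Y}) \approx \begin{cases} \sqrt{s_p^2 (1/n_x + 1/n_y)} & \text{if } \sigma_1 = \sigma_2 \\ \sqrt{s_x^2/n_x + s_y^2/n_y} & \text{if } \sigma_1 \neq \sigma_2 \end{cases}$$
where $s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}$ is the pooled variance.
$(1-\alpha) \cdot 100\%$ C.I. $= (\bar{X} - \bar{Y}) \pm t^* \cdot SE(\bar{X} - \bar{Y})$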
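To make the unequal-variance case concrete, the formulas above can be evaluated directly on the weight-loss data of Example 7.9. This is a sketch for illustration; the variable names (`vx`, `vy`, `se_welch`, `df_welch`) are assumptions and not taken from the slides.

```r
x <- c(0,0,0,2,4,5,13,14,14,14,15,17,17)    # placebo (Group 1)
y <- c(0,6,7,8,11,13,16,16,16,17,18)        # ephedra (Group 2)
nx <- length(x); ny <- length(y)
vx <- var(x) / nx; vy <- var(y) / ny         # s_x^2/n_x and s_y^2/n_y

se_welch <- sqrt(vx + vy)                    # SE when the variances may differ
df_welch <- (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))

tstar <- qt(0.975, df_welch)                 # t* for a 95% interval
ci <- (mean(x) - mean(y)) + c(-1, 1) * tstar * se_welch
list(df = df_welch, se = se_welch, ci = ci)
```

These are the quantities that `t.test(x, y, var.equal = FALSE)` reports as `df` and as the confidence interval.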
Example 7.9: Weight loss under placebo vs ephedra
> t.test(x,y,var.equal=TRUE,conf.level=0.95)
t = -1.0542, df = 22, p-value = 0.3032 # 13+11-2=22
95 percent confidence interval:
-8.279119 2.698699
> t.test(x,y,var.equal=FALSE,conf.level=0.95)
t = -1.0722, df = 21.99, p-value = 0.2953 # df = 21.99, close to 22
95 percent confidence interval:
-8.187298 2.606878 # output adapted to fit screen
Conclusion: 0 is in the CI, so “no difference in average weight loss between the two groups” is among the values consistent with the data at the 95% level.
7.5.3 Matched samples
Data: Thickness of shoe sole materials A and B on one foot for each of ten boys.
Aim: a difference in average wear between materials A and B?
> t.test(shoes$A,shoes$B,var.equal=TRUE)  # the argument is var.equal; equal.var would be silently ignored
...
95 percent CI:
-2.745046 1.925046
sample estimates:
mean of x mean of y
10.63 11.04
[Figure: scatter plot of shoes$B against shoes$A for the ten boys; both axes run from 8 to 14.]
Matched pairs should be treated specially
> with(shoes, t.test(A,B,paired=TRUE,conf.level=.95))
Paired t-test
data: A and B
t = -3.3489, df = 9, p-value = 0.008539
95 percent confidence interval:
-0.6869539 -0.1330461
- Conclusion: 0 is not in the CI; non-zero difference in means with 95% confidence.
- Note: Different result from the unpaired confidence intervals!
- The paired CI eliminates variability among boys.
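Why pairing removes the variability among boys becomes clearer when the paired analysis is written as a one-sample analysis of the within-boy differences. The sketch below is an illustration that assumes the `shoes` data are the ones shipped with the MASS package (the slides do not say where they come from); the variable names are chosen for the example.

```r
# Assumption: shoes is the data set from the MASS package (a list with components A and B)
data(shoes, package = "MASS")
d <- shoes$A - shoes$B                       # one difference per boy

paired     <- t.test(shoes$A, shoes$B, paired = TRUE)
one_sample <- t.test(d, mu = 0)              # one-sample t-test on the differences

c(paired = unname(paired$statistic), one_sample = unname(one_sample$statistic))
rbind(paired = paired$conf.int, one_sample = one_sample$conf.int)
```

Both calls give the same statistic and interval, because the paired test only looks at the differences `d`.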
Conclusions
- Estimation is one of the foremost statistical inference techniques.
- Estimates are variable: if new data were collected, the estimate would be different.
- Confidence intervals capture the amount of variability of the estimate w.r.t. the true underlying parameter.
- Correct interpretation of a (1-α)100% CI for a parameter:
  In (1-α)100% of the cases in which a CI would be constructed in the above way, the true parameter value would be contained in it.
- Short-hand:
  We are (1-α)100% confident that the true parameter value lies in the CI.
Hypothesis testing
A Murder Mystery
DATA:
I 6:15 Ernst W. receives 20 min phone call
I 7:06 Ernst W. looks at his watch
I 7:08 Ernst W. bumps into his neighbour
I 7:13 Ernst W. arrives at the party
I 7:29 Ernst W. is discovered with the dead body of the host.
I Pathologist claims that host died between 7:00 and 7:05.
The police arrest W. on suspicion of murder and charge him with:
H0 : Ernst W. is guilty of the murder.
This is their working hypothesis, or the null hypothesis.
W. wants to convince the judge of the alternative hypothesis:
H1 : Ernst W. is not guilty of the murder.
All’s well that ends well
W.'s lawyer suggests the following summary of the data (test statistic):
T = the neighbour saw Ernst W. at home at 7:05.
Ernst W.’s lawyer now argues that:
- IF E.W. was guilty of the murder that took place between 7:00 and 7:05pm,
- THEN it would be impossible that the neighbour saw E.W. at home at 7:05pm.
- BUT the neighbour saw E.W. at home at 7:05.
- SO E.W. cannot be the murderer.
In other words, the “possibility value”, the p-value:
Pr(T happens if H0 is true) = very small.
Therefore the judge rejects the null-hypothesis.
What if W. had hired a cheaper lawyer?
Another test-statistic, for instance,
S = W. made phone call at 6:15pm that lasted 20 minutes.
Although true, it does not disprove the null-hypothesis:
- IF E.W. is the murderer,
- THEN he may have made a phone call at 6:15 lasting 20 minutes.
So the p-value,
Pr(S happens if H0 is true) ≠ small.
We cannot reject the null-hypothesis!
[Note: Large p-values do not prove the null-hypothesis.]
Hypothesis Testing: General Theory
1. Key question: Is there evidence in favour of new theory?
2. Null-hypothesis: H0: THE NEW THEORY IS FALSE
3. Alternative hypothesis: H1: THE NEW THEORY IS TRUE
4. Test-statistic: T = summary of evidence in data.
5. Significance level: typically α = 0.05.
6. P-value: P(a T at least as extreme as the one observed, if H0 is true).
7. Decision:
   - If p-value < significance level, reject H0.
   - If p-value > significance level, do not reject H0.
Statistical hypotheses
The Null-Hypothesis is always of the form:
H0 : parameter = some value.
The Alternative Hypothesis is often the negation of the Null-Hypothesis:
H1 : parameter ≠ some value.
This is called a 2-sided test.
1-sided tests also exist, i.e.,
H1 : parameter > value or parameter < value,
but you are not allowed to choose them after seeing the data.
NOTE: Statistical hypotheses are statements about the population, not about the sample.
The test-statistic, p-value and significance
Test statistic: summary of data informative about the parameter.
There is a main distinction:
I Normal Tests
I Non-Parametric Tests
Significance level: allowable probability of the mistake of rejecting H0 when it’s actually true.
P-value: probability, computed assuming H0 is true, of a test statistic at least as extreme as the one observed.
Asymmetric decision:
- IF p-value < significance level, then there is sufficient evidence to reject H0 and believe that H1 is true.
- IF p-value > significance level, then there is insufficient evidence to reject H0. Nothing has been proven either way.
8.1 Significance test for a population proportion
Example 8.3: The known poverty rate in 2000 is 11.3%. In a sample of 50,000 in 2001, 5,850 (11.7%) indicate poverty.
Question: Did the rate of poverty increase?
Hypotheses: Test no change of poverty against change of poverty
$H_0 : p = 0.113$
$H_1 : p \neq 0.113$
Test-statistic:
T = fraction of poor in sample in 2001.
Significance level: $\alpha = 0.05$.
P-value: (Think hard about this!)
p-value $= P(|T - 0.113| > |0.117 - 0.113|)$
Calculating the p-value
Besides being conceptually challenging, we also need to remember a few things from last lecture in order to calculate the p-value.
Remember: For large $n$ the sample proportion $\hat{p}$ satisfies, under $H_0$,
$$\frac{\hat{p} - p_0}{SE(\hat{p} \mid H_0)} = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} \sim N(0, 1).$$
So
$$\begin{aligned}
p\text{-value} &= P(|T - 0.113| > |0.117 - 0.113|) \\
&= P\left(\left|\frac{\hat{p} - 0.113}{\sqrt{0.113(1 - 0.113)/50000}}\right| > \left|\frac{0.117 - 0.113}{\sqrt{0.113(1 - 0.113)/50000}}\right|\right) \\
&= P(|Z| > 2.825) \\
&= 2P(Z > 2.825) \\
&= 0.0047
\end{aligned}$$
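The same calculation can be carried out step by step in R. This is a small illustrative sketch; the variable names are assumptions made for the example.

```r
p0 <- 0.113; n <- 50000
phat <- 5850 / n                             # observed fraction, 0.117
se0  <- sqrt(p0 * (1 - p0) / n)              # standard error under H0
z    <- (phat - p0) / se0                    # observed standardized statistic
p_value <- 2 * pnorm(-abs(z))                # two-sided p-value, 2 * P(Z > |z|)
c(z = z, p_value = p_value)
```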
Decision
By the way, we can also get the p-value from R:
> prop.test(x=5850,n=50000,p=.113)
1-sample proportions test with continuity correction
data: 5850 out of 50000, null probability 0.113
X-squared = 7.9417, df = 1, p-value = 0.004831
Decision: p-value < 0.05, so we reject H0.
Conclusion: We have significant evidence that poverty increased in the population from 2000 to 2001.
PS. In the book a one-sided test is performed. This is not recommended!
8.5 Two-sample tests of proportion
Example 8.8: Poverty rate (continued). What happened with the fraction of poor from 2001 to 2002?
Sample:
2001: 5850 out of 50000 indicate poverty
2002: 7260 out of 60000 indicate poverty
Hypotheses: $H_0 : p_1 = p_2$ against $H_1 : p_1 \neq p_2$, or equivalently:
$H_0 : p_2 - p_1 = 0$
$H_1 : p_2 - p_1 \neq 0$
Test-statistic: $T = \hat{p}_2 - \hat{p}_1$.
We can again use that, approximately for large $n_1$ and $n_2$:
$$\frac{\hat{p}_2 - \hat{p}_1 - (p_2 - p_1)}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}} \sim N(0, 1)$$
Did poverty change from 2001 to 2002?
We can calculate the p-value by first principles, or ...
> prop.test(x=c(5850,7260),n=c(50000,60000))
data: c(5850, 7260) out of c(50000, 60000)
X-squared = 4.1187, df = 1, p-value = 0.04241
alternative hypothesis: two.sided
95 percent confidence interval:
-0.0078584975 -0.0001415025
sample estimates:
prop 1 prop 2
0.117 0.121
Decision: p-value < 0.05, so reject H0.
Conclusion: The fraction of poor in the population also increased from 2001 to 2002.
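A first-principles version of this test can be sketched as follows. It is an illustration, not the slides' own calculation: it assumes the pooled standard error under H0 (a common proportion for both years), which is the version whose squared z-value equals the X-squared statistic of `prop.test` without continuity correction. The variable names are chosen for the example.

```r
x <- c(5850, 7260); n <- c(50000, 60000)
phat   <- x / n
pooled <- sum(x) / sum(n)                    # common proportion under H0: p1 = p2
se0    <- sqrt(pooled * (1 - pooled) * sum(1 / n))
z      <- (phat[2] - phat[1]) / se0          # standardized difference p2hat - p1hat
p_value <- 2 * pnorm(-abs(z))                # two-sided p-value
c(z = z, z_squared = z^2, p_value = p_value)
```

The X-squared value printed above is a little smaller than `z^2` because `prop.test` applies the continuity correction by default.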
One Sample T-Test
1. Key question of Interest:
Is the population mean equal to some pre-specified number, say $\mu_0$?
2. Hypotheses: Null-Hypothesis and Alternative Hypothesis:
$H_0$ : population mean $= \mu_0$
$H_1$ : population mean $\neq \mu_0$.
3. Data is a random sample from a population.
4. Test-Statistic: T = sample mean
5. P-value: $P(|T - \mu_0| > |\bar{x} - \mu_0|)$, where $\bar{x}$ is the observed sample mean.
IQ OF DRUG OFFENDERS
Question: Is there evidence that the average IQ level of the soft drug offender population is different from 100?
Data: IQ scores on a sample of 15 soft drug offenders.
[Figure: histogram “IQ of 15 soft drugs offenders” (frequency against x, with the IQ axis running from 80 to 150).]
Hypotheses:
H0 : pop. mean IQ soft drug offenders = 100
H1 : pop. mean IQ soft drug offenders ≠ 100
Test-statistic, p-value and conclusion
Test-statistic: T = average IQ of 15 drug offenders.
P-value: We observe $\bar{x} = 114.6$ and SD = 16.6:
$$\begin{aligned}
p\text{-value} &= P(|T - 100| > |114.6 - 100|) \\
&= P\left(\frac{|T - 100|}{16.6/\sqrt{15}} > \frac{|114.6 - 100|}{16.6/\sqrt{15}}\right) \\
&= P(|t_{14}| > 3.41) \\
&= 0.0042
\end{aligned}$$
... or with R:
> t.test(x,mu=100)
t = 3.41, df = 14, p-value = 0.004228
alternative hypothesis: true mean is not equal to 100
Decision: Since p-value = 0.0042 < 0.05, we reject the null-hypothesis H0.
Conclusion: Soft drug users are on average more intelligent than the general population.
Note: (a) conclusion in terms of population (b) one-sided conclusion.
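Since the raw IQ scores are not listed here, the calculation can be reproduced from the summary statistics alone. The sketch below is illustrative; it uses only the rounded values 114.6 and 16.6 given above, so the result agrees with the R output up to rounding. The variable names are assumptions.

```r
xbar <- 114.6; s <- 16.6; n <- 15; mu0 <- 100
t_obs <- (xbar - mu0) / (s / sqrt(n))        # observed t statistic
p_value <- 2 * pt(-abs(t_obs), df = n - 1)   # two-sided p-value, t distribution with 14 df
c(t = t_obs, p_value = p_value)
```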
Matched samples: paired t-test
Data: Thickness of shoe sole materials A and B on one foot for each of ten boys.
Aim: a difference in average wear between materials A and B?
> t.test(shoes$A,shoes$B)
data: shoes$A and shoes$B
t = -0.3689, df = 17.987,
p-value = 0.7165
[Figure: scatter plot of shoes$B against shoes$A for the ten boys; both axes run from 8 to 14.]
Matched pairs should be treated specially
> with(shoes, t.test(A,B,paired=TRUE,conf.level=.95))
Paired t-test
data: A and B
t = -3.3489, df = 9, p-value = 0.008539
95 percent confidence interval:
-0.6869539 -0.1330461
- Decision: p-value < 0.05, so reject H0.
- Conclusion: there is a difference between the average wear of the two types of materials.
- The paired t-test eliminates variability among boys.
Non-parametric tests
Have another look at the picture:
[Figure: scatter plot of shoes$B against shoes$A with the line of equality; both axes run from 8 to 14.]
Fact: 8 out of 10 points lie above the line of equality...
Test-statistic: T = number of points above the line
Sign test
Hypotheses:
H0 : Median wear A = Median wear B
H1 : Median wear A ≠ Median wear B
Note: $T \mid H_0 \sim \text{Binomial}(10, 0.5)$.
$$\begin{aligned}
p\text{-value} &= P(|T - 5| \ge |8 - 5|) \\
&= P(T \ge 8) + P(T \le 2) \\
&= 0.1094
\end{aligned}$$
- Decision: p-value > 0.05, so do not reject H0.
- Conclusion: there is no evidence in T that there is a difference between the median wear of the two types of materials.
- The sign test is LESS powerful than the paired t-test, BUT it makes fewer assumptions.
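The binomial tail probabilities above are easy to verify in R. The sketch below is an illustration with variable names chosen for the example: it sums the Binomial(10, 0.5) probabilities directly and compares the result with the exact binomial test from `binom.test`.

```r
n <- 10; t_obs <- 8
# P(T >= 8) + P(T <= 2) under T ~ Binomial(10, 0.5)
p_manual <- sum(dbinom(c(0:2, 8:10), size = n, prob = 0.5))
p_binom  <- binom.test(t_obs, n, p = 0.5)$p.value   # the same two-sided test
c(manual = p_manual, binom.test = p_binom)
```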
Conclusions
Hypothesis testing checks to what extent the data can be explained by a status quo assumption, or whether it needs to be rejected because the data are too unusual otherwise.
- Terminology: hypotheses, test-statistic, p-value, significance level.
- Interpretation: be careful when interpreting p-values!
- Tests: most common tests are based on the normal distribution. Non-parametric tests avoid making normal assumptions, but are less powerful if these assumptions are actually true.