Confidence Intervals for Two Proportions and One Sample Mean
Sections 7.3 & 7.4
Cathy Poliak, [email protected]
Office hours: T Th 2:30 pm - 5:15 pm 620 PGH
Department of Mathematics, University of Houston
April 5, 2016
Cathy Poliak, Ph.D. [email protected] Office hours: T Th 2:30 pm - 5:15 pm 620 PGH (Department of Mathematics University of Houston )Sections 7.3 & 7.4 April 5, 2016 1 / 33
Outline
1 Beginning Questions
2 Comparing Two Proportions
3 Inference for Means
4 The T-distribution
5 Confidence Interval for Population Mean
6 Sample Size
Popper Set Up
Fill in all of the proper bubbles.
Use a #2 pencil.
This is popper number 17.
Items Used in Confidence Intervals
Point estimate
Confidence level
Critical value
Standard error
Margin of error = critical value × standard error
The confidence interval is
point estimate ± margin of error
Interpretation: We are C% confident that the population parameter is between point estimate − margin of error and point estimate + margin of error.
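The items above combine mechanically. A minimal R sketch, assuming a hypothetical helper named `conf_interval` (not part of the course materials) and made-up input values chosen only for illustration:

```r
# Hypothetical helper: builds a confidence interval from its three ingredients.
conf_interval <- function(point_estimate, critical_value, standard_error) {
  margin_of_error <- critical_value * standard_error
  c(lower = point_estimate - margin_of_error,
    upper = point_estimate + margin_of_error)
}

# Illustrative numbers only: estimate 50, critical value 2, standard error 1.5.
example_interval <- conf_interval(50, 2, 1.5)
print(example_interval)
```

Every interval in these sections follows this pattern; only the choice of point estimate, critical value, and standard error changes.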
Popper #17 Questions
For the Wisconsin State GOP primary election, a poll was conducted by the Marquette Law School (3/24 - 3/28). From a sample of 768 likely voters in the Republican primary, 40% said they would vote for Cruz, while 30% said they would vote for Trump, with a 5.8% margin of error.
1. Which statement is correct about the results of this poll?
a) Of all likely Republican primary voters, Cruz will win over Trump.
b) There is no statistical difference between who will vote for Cruz and who will vote for Trump.
c) 40% of all likely Republican primary voters will vote for Cruz.
d) 30% of all likely Republican primary voters will not vote at all.
Comparing Two Proportions
What is the difference between the proportion of m&ms that are blue in the plain m&ms compared to the peanut m&ms?
From a random sample of plain m&ms and peanut m&ms we get the following results.
Candy type   n     Number of blue   Sample proportion (p̂)
plain        81    28               p̂_plain = 28/81 = 0.3457
peanut       100   20               p̂_peanut = 20/100 = 0.2
We want to know the difference between the proportions of m&ms that are blue among all plain and all peanut m&ms. That is, estimate:
p_peanut − p_plain
Two-sample problems assumptions
The goal of inference is to compare the responses in two groups.
1. Each group is considered to be a simple random sample from two distinct populations.
2. The population sizes are both at least ten times the sizes of the samples.
3. The number of successes and failures in both samples must all be ≥ 10.
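The third assumption is easy to check by counting. A small R sketch using the m&m counts from these slides; the variable names are assumptions made here for illustration:

```r
# Counts from the m&m sample in these slides.
n <- c(plain = 81, peanut = 100)
blue <- c(plain = 28, peanut = 20)

successes <- blue       # blue candies
failures <- n - blue    # candies that are not blue

# Assumption 3: every success and failure count must be at least 10.
counts_ok <- all(successes >= 10, failures >= 10)
print(rbind(successes, failures))
print(counts_ok)
```

Assumptions 1 and 2 concern how the sample was drawn and cannot be checked from the counts alone.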
Confidence intervals for comparing two proportions
Choose an SRS of size n1 from a large population having proportion p1 of successes and an independent SRS of size n2 from another population having proportion p2 of successes.
1. Point estimate: $D = \hat{p}_1 - \hat{p}_2 = \frac{X_1}{n_1} - \frac{X_2}{n_2}$
2. Confidence level: C, a percentage given in the problem; if none is given, use 95%.
3. Critical value: z∗ is the value for which the standard Normal density curve has area C between −z∗ and z∗.
4. Confidence interval:
$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
5. Interpret
Determine a 95% confidence interval for the difference of the proportion of m&ms that are blue for all of plain and peanut m&ms.
From a random sample of plain m&ms and peanut m&ms we get the following results.
Candy type   n     Number of blue   Sample proportion (p̂)
plain        81    28               p̂_plain = 28/81 = 0.3457
peanut       100   20               p̂_peanut = 20/100 = 0.2
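The interval can be computed step by step from the two-proportion formula. A sketch in R; the variable names are assumptions made here, and the difference is taken as plain minus peanut (reversing the order only flips the signs of the endpoints):

```r
# m&m counts from the table above.
x <- c(plain = 28, peanut = 20)
n <- c(plain = 81, peanut = 100)
p_hat <- x / n
conf_level <- 0.95

# Point estimate: difference of sample proportions (plain minus peanut).
point_estimate <- unname(p_hat["plain"] - p_hat["peanut"])

# Critical value z* with area conf_level between -z* and z*.
z_star <- qnorm((1 + conf_level) / 2)

# Standard error: sqrt of the sum of p_hat * (1 - p_hat) / n over both samples.
standard_error <- sqrt(sum(p_hat * (1 - p_hat) / n))

margin_of_error <- z_star * standard_error
ci <- point_estimate + c(-1, 1) * margin_of_error
print(round(ci, 4))
```

The endpoints come out to about 0.0158 and 0.2756, so we are 95% confident that the proportion of blue m&ms is between 1.6 and 27.6 percentage points higher for plain than for peanut.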
R code
prop.test(x=c(x1,x2),n=c(n1,n2),conf.level = C, correct = FALSE)

prop.test(x=c(28,20),n=c(81,100),conf.level = 0.95,correct=FALSE)

        2-sample test for equality of proportions without continuity correction

data:  c(28, 20) out of c(81, 100)
X-squared = 4.8738, df = 1, p-value = 0.02727
alternative hypothesis: two.sided
95 percent confidence interval:
 0.01578192 0.27557610
sample estimates:
  prop 1   prop 2
0.345679 0.200000
TI-83(84)
STAT → TESTS → B:2-PropZInt
Popper #17 Questions
This is from The Practice of Statistics for Business and Economics, 3rd ed., by Moore, et al., p. 483. A Pew Internet Project Data Memo presented data comparing adult gamers with teen gamers with respect to the devices on which they play. The data are from two surveys. The adult survey had 1063 gamers, and the teen survey had 1064 gamers. The memo reports that 574 of the adult gamers played on game consoles (Xbox, PlayStation, Wii, etc.), and 947 of the teen gamers played on game consoles.
2. Find the estimate of the difference between the proportion of teen gamers who played on game consoles and the proportion of adults who played on these devices. That is, find p̂_teen − p̂_adult.
a) 373 b) 1 c) 0.35 d) 0
3. Find the 95% confidence interval for the difference of the proportions.
a) (313, 387) b) (0.315, 0.385) c) (0.54, 0.89) d) (574, 947)
Popper #17 Questions
A coffee machine dispenses coffee into paper cups. Here are the amounts measured in a random sample of 20 cups.
9.9, 9.7, 10.0, 10.1, 9.9, 9.6, 9.8, 9.8, 10.0, 9.5,
9.7, 10.1, 9.9, 9.6, 10.2, 9.8, 10.0, 9.9, 9.5, 9.9
4. Determine the mean amount from these 20 cups.
a) 10 b) 9.845 c) 9 d) 0
5. Determine the standard deviation of the amount from these 20 cups.
a) 9.845 b) 0.1986 c) 3.137 d) 0
6. Are the mean and standard deviation you calculated parameters or statistics?
a) parameter b) statistic
Assumptions for Estimating the Population Mean
1. The sample must be the result of a simple random sample (SRS).
2. The distribution of the population has to be Normal. Otherwise, by the Central Limit Theorem, if the sample size is larger than 30, then the sample means have an approximately Normal distribution.
Point estimate for µ
When estimating the population mean µ, the point estimate is the sample mean
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}.$$
Standard Error
When the standard deviation of a statistic is estimated from the data, the result is called the standard error of the statistic.
The standard error of the sample mean is
$$SE_{\bar{X}} = \frac{s}{\sqrt{n}}$$
where s is the computed sample standard deviation from the data.
From our example: $SE_{\bar{X}} = \frac{0.1986}{\sqrt{20}} = 0.0444$.
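The standard error for the coffee example can be reproduced from the 20 measurements given earlier. A short R sketch; the variable names are assumptions made here:

```r
# The 20 coffee amounts from the random sample in these slides.
coffee <- c(9.9, 9.7, 10.0, 10.1, 9.9, 9.6, 9.8, 9.8, 10.0, 9.5,
            9.7, 10.1, 9.9, 9.6, 10.2, 9.8, 10.0, 9.9, 9.5, 9.9)

n <- length(coffee)
s <- sd(coffee)                  # sample standard deviation (divides by n - 1)
standard_error <- s / sqrt(n)    # standard error of the sample mean

print(round(c(mean = mean(coffee), s = s, SE = standard_error), 4))
```

Note that `sd()` in R uses the n − 1 divisor, which is the sample standard deviation the formula calls for.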
The T-distribution
The problem is that the sample standard deviation s varies from sample to sample.
William Gosset (a quality control engineer for the Guinness Brewery) discovered this problem and figured out a new distribution that changes the critical value based on the sample size.
This new distribution is called Student's T distribution, because Guinness would not allow Gosset to publish his findings since he was their employee, so he published under the pseudonym "Student".
The shape of this distribution changes with different sample sizes, so it depends on a parameter called the degrees of freedom (df).
The degrees of freedom for the T-distribution of the sample mean is the sample size minus one (n − 1), because we are using the sample standard deviation
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}.$$
T distribution
Used for inference about the population mean when the population standard deviation σ is unknown.
The distribution of the population should be roughly bell-shaped.
Formula for t:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
Use t-table, or qt(probability,df) in R.
Degrees of freedom: df = n − 1.
Normal Distribution vs T distribution
The red graph is the Normal density curve and the blue graph is the T density curve with 4 degrees of freedom.
[Figure: the two density curves plotted on the same axes; horizontal axis x from −3 to 3, vertical axis density from 0.0 to 0.4.]
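The comparison plot can be regenerated with base R graphics. A minimal sketch; the grid of 301 points and the variable names are assumptions made here, while the colors and df = 4 follow the description on the slide:

```r
# Evaluate both densities on a common grid from -3 to 3.
x <- seq(-3, 3, length.out = 301)
normal_density <- dnorm(x)       # standard Normal density
t_density <- dt(x, df = 4)       # T density with 4 degrees of freedom

# Normal curve in red, T curve in blue, on the same axes.
plot(x, normal_density, type = "l", col = "red",
     xlab = "x", ylab = "density")
lines(x, t_density, col = "blue")
```

The T curve is lower at the center and higher in the tails, which is why t∗ critical values are larger than the corresponding z∗ values.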
Using T-table
The top margin is the area in the right tail.
The left margin is the degrees of freedom n − 1.
The values inside the table are the t values.
Critical value σ unknown
When σ is unknown we use the t-distribution.
With degrees of freedom, df = n − 1.
The critical value is t∗ where the area between −t∗ and +t∗ under the T-curve is the confidence level C = 1 − α.
t∗ is found in the T-table using the row for the degrees of freedom and the column for the confidence level at the bottom of the table.
In R use qt((1 + C)/2, df).
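As an illustration of the `qt` call, here is a sketch that wraps it in a hypothetical helper, `t_critical` (a name introduced here, not from the slides), and evaluates it for a sample of size 8, so df = 7:

```r
# Hypothetical helper: t* for confidence level conf_level and sample size n.
t_critical <- function(conf_level, n) {
  # (1 + C) / 2 is the area to the left of +t*, leaving (1 - C) / 2 in each tail.
  qt((1 + conf_level) / 2, df = n - 1)
}

t_star_90 <- t_critical(0.90, n = 8)   # df = 7
t_star_95 <- t_critical(0.95, n = 8)   # df = 7
print(round(c(t_star_90, t_star_95), 3))
```

These should agree with the df = 7 row of the T-table, about 1.895 for 90% and 2.365 for 95%.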
Critical value for µ with σ known
If σ is known, the critical value is z∗, where the area under the Normal curve between −z∗ and +z∗ is the confidence level C = 1 − α. This critical value is found at the bottom of the T-table.
The following table gives the common confidence levels with their z-scores.
C     80%    90%    95%    99%
z∗    1.28   1.645  1.96   2.576
In R use qnorm((1 + C)/2).
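The table of z∗ values can be reproduced with a single vectorized `qnorm` call. A brief sketch; the variable names are assumptions made here:

```r
# Common confidence levels from the table above.
conf_levels <- c(0.80, 0.90, 0.95, 0.99)

# z* leaves area C between -z* and +z*, so the area to the left of z* is (1 + C) / 2.
z_star <- qnorm((1 + conf_levels) / 2)

print(data.frame(C = conf_levels, z_star = round(z_star, 3)))
```

The results match the table up to rounding (for example, the 80% value is closer to 1.282 than 1.28).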
Margin of Error
The margin of error is
m = critical value × standard error
If σ is known then the margin of error for estimating the mean µ is
$$m = z^* \times \frac{\sigma}{\sqrt{n}}$$
If σ is unknown then the margin of error for estimating the mean µ is
$$m = t^* \times \frac{s}{\sqrt{n}}$$
with df = n − 1.
What is the mean population monthly cell phone bill?
A survey taken in 2010 polled 400 randomly chosen cell phone users. They answered the question: "What is your average monthly cell phone bill?"
The following are the characteristics of the sample:
The sample mean is x̄ = $71.
Assume the population standard deviation to be σ = $20.
The sample size is n = 400.
Determine a 96.5% confidence interval.
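Because σ is given, this is a z interval. One way to work it in R, as a sketch with variable names that are assumptions made here:

```r
# Values given in the cell phone example.
x_bar <- 71          # sample mean, dollars
sigma <- 20          # assumed population standard deviation, dollars
n <- 400             # sample size
conf_level <- 0.965

# sigma is known, so the critical value comes from the Normal distribution.
z_star <- qnorm((1 + conf_level) / 2)
margin_of_error <- z_star * sigma / sqrt(n)
ci <- x_bar + c(-1, 1) * margin_of_error

print(round(c(z_star = z_star, m = margin_of_error), 3))
print(round(ci, 2))
```

Here σ/√n = 20/20 = 1, so the margin of error equals z∗, about $2.11, and the interval runs from roughly $68.89 to $73.11.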
Snickers bars
Suppose your class is investigating the weights of Snickers 1-ounce fun-size candy bars to see if the customers are getting full value for their money. Assume the weights are Normally distributed. Several candy bars are randomly selected and weighed with sensitive balances borrowed from the physics lab. The weights are:
0.95 1.02 0.98 0.97 1.05 1.01 0.98 1.00
We want to determine a 90% confidence interval for the true mean weight of these candy bars.
Using R if Given the Data
R code: t.test(name of x, conf.level = C)

> snickers<-c(0.95,1.02,0.98,0.97,1.05,1.01,0.98,1.00)
> t.test(snickers,conf.level=0.9)

        One Sample t-test

data:  snickers
t = 88.996, df = 7, p-value = 5.957e-12
alternative hypothesis: true mean is not equal to 0
90 percent confidence interval:
 0.973818 1.016182
sample estimates:
mean of x
    0.995
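The same interval can be built by hand from x̄ ± t∗ × s/√n, which shows where the t.test endpoints come from. A sketch; the variable names other than `snickers` are assumptions made here:

```r
# Snickers weights from the example.
snickers <- c(0.95, 1.02, 0.98, 0.97, 1.05, 1.01, 0.98, 1.00)
conf_level <- 0.90

n <- length(snickers)
x_bar <- mean(snickers)
standard_error <- sd(snickers) / sqrt(n)

# sigma is unknown, so use t* with df = n - 1.
t_star <- qt((1 + conf_level) / 2, df = n - 1)

ci <- x_bar + c(-1, 1) * t_star * standard_error
print(round(ci, 6))
```

The endpoints agree with the t.test output, 0.973818 and 1.016182, so we are 90% confident that the true mean weight is between about 0.974 and 1.016 ounces.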
TI-83(84)
1. STAT, Edit, enter the data into L1.
2. STAT → TESTS
3. 7:ZInterval if we are given the population standard deviation σ, otherwise use 8:TInterval.
Popper #17 Questions
A coffee machine dispenses coffee into paper cups. From a simple random sample of 20 cups we found the mean to be x̄ = 9.845 oz. and the sample standard deviation to be s = 0.1986.
7. Determine a 99% confidence interval for the mean amount of coffee dispensed from this machine.
a) (9.7306, 9.9594)
b) (9.718, 9.972)
c) (9.762, 9.928)
d) (9.277, 10.413)
Choosing Sample Size
You can have both a high confidence level and a small margin of error by taking enough observations.
The confidence interval for a population mean will have a specified margin of error m when the sample size is
$$n = \left(\frac{z^* \times \sigma}{m}\right)^2$$
where the sample size is rounded up to the next whole number.
Starting Salary
We want to estimate annual starting salaries for college graduates. To determine this we need a sample.
Assume that a 95% confidence interval estimate of the population mean annual starting salary is desired.
Assume the standard deviation is σ = $7,500.
How large a sample should be taken if the desired margin of error is m = $500?
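The sample size formula n = (z∗ × σ / m)², rounded up, translates directly into R. A sketch with a hypothetical helper, `sample_size` (a name introduced here, not from the slides), applied to the starting salary numbers:

```r
# Hypothetical helper: smallest n giving margin of error at most margin_of_error.
sample_size <- function(conf_level, sigma, margin_of_error) {
  z_star <- qnorm((1 + conf_level) / 2)
  # Always round up: rounding down would make the margin of error too large.
  ceiling((z_star * sigma / margin_of_error)^2)
}

n_needed <- sample_size(conf_level = 0.95, sigma = 7500, margin_of_error = 500)
print(n_needed)
```

With the table value z∗ = 1.96 the unrounded result is (1.96 × 7500 / 500)² = 864.36, which also rounds up to 865 graduates.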
Popper #17 Questions
8. Given σ = $7,500 and a 95% confidence level, what should the sample size be if we want the margin of error to be m = $100?
a) 7125
b) 147
c) 21,609
d) 21,610