Confidence Intervals with Means



Page 1: Confidence Intervals with Means

Confidence Intervals with Means

Chapter 9

Page 2: Confidence Intervals with Means

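Formula:

Confidence interval: $\bar{x} \pm z^* \left( \frac{\sigma}{\sqrt{n}} \right)$

• statistic: $\bar{x}$
• Critical value: $z^*$
• Standard deviation of statistic: $\frac{\sigma}{\sqrt{n}}$
• Margin of error: $z^* \left( \frac{\sigma}{\sqrt{n}} \right)$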

Page 3: Confidence Intervals with Means

Student’s t-distribution

• Developed by William Gosset

• Continuous distribution

• Unimodal, symmetrical, bell-shaped density curve

• Above the horizontal axis

• Area under the curve equals 1

• Based on degrees of freedom: df = n - 1

Page 4: Confidence Intervals with Means

How do t-distributions compare to the standard normal distribution?

• Shorter & more spread out

• More area in the tails

• As n increases, t-distributions become more like a standard normal distribution

Page 5: Confidence Intervals with Means

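Formula:

Confidence interval: $\bar{x} \pm t^* \left( \frac{s}{\sqrt{n}} \right)$

• statistic: $\bar{x}$
• Critical value: $t^*$
• Standard deviation of statistic: $\frac{s}{\sqrt{n}}$
• Margin of error: $t^* \left( \frac{s}{\sqrt{n}} \right)$

Standard error – when you substitute s for σ.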

Page 6: Confidence Intervals with Means

How to find t*

Can also use invT on the calculator!

invT(p,df)

For 90% confidence, you need the upper t* value with 5% above it – so 95% is below.

Find these t*:

• 90% confidence when n = 5: t* = 2.132

• 95% confidence when n = 15: t* = 2.145
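The calculator's invT(p, df) can be imitated in plain Python. The sketch below is only an illustration, not how the calculator works internally: the helper names `t_pdf`, `t_cdf` and `inv_t` are made up for this example, the area under the t density is approximated with Simpson's rule, and the critical value is found by bisection.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """Area to the left of x: Simpson's rule on [0, |x|], then symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def inv_t(p, df):
    """Value with area p to its left (the role of invT(p, df)), by bisection."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# 90% confidence, n = 5: 5% is above t*, so 95% is below
t_star_90 = inv_t(0.95, 5 - 1)
# 95% confidence, n = 15: 2.5% is above t*, so 97.5% is below
t_star_95 = inv_t(0.975, 15 - 1)
print(round(t_star_90, 3), round(t_star_95, 3))
```

The two printed values should agree with the t* values found on this slide to three decimal places.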

Page 7: Confidence Intervals with Means

Steps for doing a confidence interval:

1) Assumptions

2) Calculate the interval

3) Write a statement about the interval in the context of the problem.

We are ________% confident that the true mean (context) is between ______ and ______.

Page 8: Confidence Intervals with Means

Assumptions for t-inference

• Have an SRS from population (or randomly assigned treatments)

• σ unknown

• Normal (or approx. normal) distribution
  – Given
  – Large sample size
  – Check graph of data

Use only one of these methods to check normality

Page 9: Confidence Intervals with Means

Ex. 2) A medical researcher measured the pulse rate of a random sample of 20 adults and found a mean pulse rate of 72.69 beats per minute with a standard deviation of 3.86 beats per minute. Assume pulse rate is normally distributed. Compute a 95% confidence interval for the true mean pulse rate of adults.

We are 95% confident that the true mean pulse rate of adults is between 70.883 & 74.497 beats per minute.
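As a worked check of this interval, here is the t-interval formula with the numbers plugged in. The critical value t* = 2.093 (95% confidence, df = 19) is taken from a t-table; the slide does not show which value was used, so treat it as an assumption.

```latex
\[
\bar{x} \pm t^* \left( \frac{s}{\sqrt{n}} \right)
= 72.69 \pm 2.093 \left( \frac{3.86}{\sqrt{20}} \right)
\approx 72.69 \pm 1.807
= (70.883,\ 74.497)
\]
```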

Page 10: Confidence Intervals with Means

Ex. 3) Consumer Reports tested 14 randomly selected brands of vanilla yogurt and found the following numbers of calories per serving:

160 200 220 230 120 180 140

130 170 190 80 120 100 170

Compute a 98% confidence interval for the average calorie content per serving of vanilla yogurt.

We are 98% confident that the true mean calorie content per serving of vanilla yogurt is between 126.16 calories & 189.56 calories.
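The interval above can be reproduced from the raw data. This is a minimal sketch using only Python's standard library; the critical value t* = 2.650 (98% confidence, df = 13) is a t-table value supplied by hand as an assumption, so the upper endpoint may differ from the calculator's in the last decimal place.

```python
import math
import statistics

calories = [160, 200, 220, 230, 120, 180, 140,
            130, 170, 190, 80, 120, 100, 170]

n = len(calories)
x_bar = statistics.mean(calories)   # sample mean
s = statistics.stdev(calories)      # sample standard deviation (divides by n - 1)
t_star = 2.650                      # ASSUMED table value: 98% confidence, df = 13

std_error = s / math.sqrt(n)        # standard error of the mean
margin = t_star * std_error         # margin of error
interval = (x_bar - margin, x_bar + margin)

print(f"x-bar = {x_bar:.2f}, s = {s:.2f}")
print(f"interval = ({interval[0]:.2f}, {interval[1]:.2f})")
```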

Page 11: Confidence Intervals with Means

Ex 3 continued) A diet guide claims that you will get 120 calories from a serving of vanilla yogurt. What does this evidence indicate?

Since 120 calories is not contained within the 98% confidence interval, the evidence suggests that the average calories per serving does not equal 120 calories.

Note: confidence intervals tell us if something is NOT EQUAL – never less or greater than!

Page 12: Confidence Intervals with Means

Robust

• An inference procedure is ROBUST if the confidence level or p-value doesn’t change much if the normality assumption is violated.

• t-procedures can be used with some skewness, as long as there are no outliers.

• Larger n can have more skewness.

Since there is more area in the tails in t-distributions, then, if a distribution has some skewness, the tail area is not greatly affected.

CI & p-values deal with area in the tails – is the area changed greatly when there is skewness?

Page 13: Confidence Intervals with Means

Find a sample size:

• If a certain margin of error is wanted, then to find the sample size necessary for that margin of error use:

$m = z^* \left( \frac{\sigma}{\sqrt{n}} \right)$

Always round up to the nearest person!

Page 14: Confidence Intervals with Means

Ex 4) The heights of SHS male students are normally distributed with σ = 2.5 inches. How large a sample is necessary to be accurate within ± 0.75 inches with a 95% confidence interval?

n = 43
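Solving the margin-of-error formula for n gives n = (z*·σ / m)², rounded up. The sketch below applies that to this example; the function name `sample_size` is made up for illustration, and z* comes from Python's `statistics.NormalDist` rather than a table.

```python
import math
from statistics import NormalDist

def sample_size(sigma, margin, confidence):
    """Smallest n whose z-interval margin of error is at most `margin`."""
    # z* leaves (1 - confidence) / 2 in each tail of the standard normal curve
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Solve m = z* * sigma / sqrt(n) for n, then always round up
    return math.ceil((z_star * sigma / margin) ** 2)

n_needed = sample_size(sigma=2.5, margin=0.75, confidence=0.95)
print(n_needed)
```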

Page 15: Confidence Intervals with Means

Some Cautions:

• The data MUST be a SRS from the population (or randomly assigned treatment)

• The formula is not correct for more complex sampling designs, i.e., stratified, etc.

• No way to correct for bias in data

Page 16: Confidence Intervals with Means

Cautions continued:

• Outliers can have a large effect on confidence interval

• Must know σ to do a z-interval – which is unrealistic in practice

Page 17: Confidence Intervals with Means

Hypothesis Tests

One Sample Means

Page 18: Confidence Intervals with Means

Steps for doing a hypothesis test

1) Assumptions

2) Write hypotheses & define parameter

3) Calculate the test statistic & p-value

4) Write a statement in the context of the problem.

H0: μ = 12 vs Ha: μ (<, >, or ≠) 12

“Since the p-value < (>) α, I reject (fail to reject) the H0. There is (is not) sufficient evidence to suggest that Ha (in context).”

Page 19: Confidence Intervals with Means

Assumptions for t-inference

• Have an SRS from population (or randomly assigned treatments)

• σ unknown

• Normal (or approx. normal) distribution
  – Given
  – Large sample size
  – Check graph of data

Use only one of these methods to check normality

Page 20: Confidence Intervals with Means

Formulas:

test statistic = $\frac{\text{statistic} - \text{parameter}}{\text{standard deviation of statistic}}$

σ unknown:

$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$

Page 21: Confidence Intervals with Means

Calculating p-values

• For z-test statistic –
  – Use normalcdf(lb,ub) – [using standard normal curve]

• For t-test statistic –
  – Use tcdf(lb, ub, df)

Page 22: Confidence Intervals with Means

Example 1: Bottles of a popular cola are supposed to contain 300 mL of cola. There is some variation from bottle to bottle. An inspector, who suspects that the bottler is under-filling, measures the contents of six randomly selected bottles. Is there sufficient evidence that the bottler is under-filling the bottles? Use α = .1

299.4 297.7 298.9 300.2 297 301

Page 23: Confidence Intervals with Means

Assumptions (SRS? Normal? How do you know? Do you know σ?)

• I have an SRS of bottles
• Since the boxplot is approximately symmetrical with no outliers, the sampling distribution is approximately normally distributed
• σ is unknown

Hypotheses (What are your hypothesis statements? Is there a key word?)

H0: μ = 300
Ha: μ < 300
where μ is the true mean amount of cola in bottles

Test statistic (Plug values into formula.)

$t = \frac{299.03 - 300}{1.503 / \sqrt{6}} = -1.576$

p-value = .0880, α = .1

Conclusion (Compare your p-value to α & make decision. Write conclusion in context in terms of Ha.)

Since p-value < α, I reject the null hypothesis. There is sufficient evidence to suggest that the true mean cola in the bottles is less than 300 mL.
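The test statistic and p-value for the cola example can be checked from the six measurements. This is a sketch under stated assumptions: the helper names `t_pdf` and `t_cdf` are made up for this example, and the lower-tail area (the role of tcdf(-∞, t, df) on the calculator) is approximated with Simpson's rule rather than taken from a statistics library.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """Area to the left of x: Simpson's rule on [0, |x|], then symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

volumes = [299.4, 297.7, 298.9, 300.2, 297, 301]   # mL, from the example
mu_0 = 300                                          # value claimed in H0
alpha = 0.1

n = len(volumes)
x_bar = statistics.mean(volumes)
s = statistics.stdev(volumes)
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
p_value = t_cdf(t_stat, n - 1)    # lower tail only, because Ha: mu < 300

print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```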

Page 24: Confidence Intervals with Means

Example 3: The Wall Street Journal (January 27, 1994) reported that based on sales in a chain of Midwestern grocery stores, President’s Choice Chocolate Chip Cookies were selling at a mean rate of $1323 per week. Suppose a random sample of 30 weeks in 1995 in the same stores showed that the cookies were selling at the average rate of $1208 with standard deviation of $275. Does this indicate that the sales of the cookies are lower than the earlier figure?

Page 25: Confidence Intervals with Means

Assume:

• Have an SRS of weeks

• Distribution of sales is approximately normal due to large sample size

• σ unknown

H0: μ = 1323
Ha: μ < 1323
where μ is the true mean cookie sales per week

$t = \frac{1208 - 1323}{275 / \sqrt{30}} = -2.29$

p-value = .0147

Since p-value < α of 0.05, I reject the null hypothesis. There is sufficient evidence to suggest that the sales of cookies are lower than the earlier figure.

What is the potential error in context?

What is a consequence of that error?

Page 26: Confidence Intervals with Means

Example 9: President’s Choice Chocolate Chip Cookies were selling at a mean rate of $1323 per week. Suppose a random sample of 30 weeks in 1995 in the same stores showed that the cookies were selling at the average rate of $1208 with standard deviation of $275. Compute a 90% confidence interval for the mean weekly sales rate.

CI = ($1122.70, $1293.30)

Based on this interval, is the mean weekly sales rate statistically less than the reported $1323?
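For reference, the interval in this example follows from the t-interval formula. The critical value t* = 1.699 (90% confidence, df = 29) is taken from a t-table and is an assumption here, since the slide shows only the final interval.

```latex
\[
\bar{x} \pm t^* \left( \frac{s}{\sqrt{n}} \right)
= 1208 \pm 1.699 \left( \frac{275}{\sqrt{30}} \right)
\approx 1208 \pm 85.30
= (1122.70,\ 1293.30)
\]
```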

Page 27: Confidence Intervals with Means

Matched Pairs Test

A special type of t-inference

Page 28: Confidence Intervals with Means

Matched Pairs – two forms

• Pair individuals by certain characteristics

• Randomly select treatment for individual A

• Individual B is assigned to other treatment

• Assignment of B is dependent on assignment of A

• Individual persons or items receive both treatments

• Order of treatments is randomly assigned, or before & after measurements are taken

• The two measures are dependent on the individual

Page 29: Confidence Intervals with Means

Is this an example of matched pairs?

1) A college wants to see if there’s a difference in time it took last year’s class to find a job after graduation and the time it took the class from five years ago to find work after graduation. Researchers take a random sample from both classes and measure the number of days between graduation and first day of employment.

No, there is no pairing of individuals; you have two independent samples.

Page 30: Confidence Intervals with Means

Is this an example of matched pairs?

2) In a taste test, a researcher asks people in a random sample to taste a certain brand of spring water and rate it. Another random sample of people is asked to taste a different brand of water and rate it. The researcher wants to compare these samples.

No, there is no pairing of individuals; you have two independent samples. If the same people tasted both brands in random order, then it would be an example of matched pairs.

Page 31: Confidence Intervals with Means

Is this an example of matched pairs?

3) A pharmaceutical company wants to test its new weight-loss drug. Before giving the drug to a random sample, company researchers take a weight measurement on each person. After a month of using the drug, each person’s weight is measured again.

Yes, you have two measurements that are dependent on each individual.

Page 32: Confidence Intervals with Means

A whale-watching company noticed that many customers wanted to know whether it was better to book an excursion in the morning or the afternoon. To test this question, the company collected the following data on 15 randomly selected days over the past month. (Note: days were not consecutive.)

Day          1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Morning      8  9  7  9 10 13 10  8  2  5  7  7  6  8  7
Afternoon    8 10  9  8  9 11  8 10  4  7  8  9  6  6  9

First, you must find the differences for each day.

Since you have two values for each day, they are dependent on the day – making this data matched pairs.

You may subtract either way – just be careful when writing Ha.

Page 33: Confidence Intervals with Means

Day          1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Morning      8  9  7  9 10 13 10  8  2  5  7  7  6  8  7
Afternoon    8 10  9  8  9 11  8 10  4  7  8  9  6  6  9
Differences  0 -1 -2  1  1  2  2 -2 -2 -2 -1 -2  0  2 -2

Assumptions:

• Have an SRS of days for whale-watching

• σ unknown

• Since the normal probability plot is approximately linear, the distribution of differences is approximately normal.

I subtracted: Morning – afternoon. You could subtract the other way!

You need to state assumptions using the differences!

Notice the granularity in this plot; it still displays a nice linear relationship!

Page 34: Confidence Intervals with Means

Differences 0 -1 -2 1 1 2 2 -2 -2 -2 -1 -2 0 2 -2

Is there sufficient evidence that more whales are sighted in the afternoon?

Be careful writing your Ha! Think about how you subtracted: M-A. If afternoon is more, should the differences be + or -? Don’t look at numbers!!!!

H0: μD = 0

Ha: μD < 0

Where μD is the true mean difference in whale sightings from morning minus afternoon

Notice we used μD for differences & it equals 0 since the null should be that there is NO difference.

If you subtract afternoon – morning; then Ha: μD > 0

Page 35: Confidence Intervals with Means

Finishing the hypothesis test:

$t = \frac{\bar{x} - \mu_D}{s / \sqrt{n}} = \frac{-.4 - 0}{1.639 / \sqrt{15}} = -.945$

p-value = .1803, df = 14, α = .05

Since p-value > α, I fail to reject H0. There is insufficient evidence to suggest that more whales are sighted in the afternoon than in the morning.

Notice that if you subtracted A-M, then your test statistic t = +.945, but p-value would be the same.

In your calculator, perform a t-test using the differences (L3)

Differences 0 -1 -2 1 1 2 2 -2 -2 -2 -1 -2 0 2 -2

How could I increase the power of this test?
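The matched-pairs calculation reduces to a one-sample t statistic on the differences. The sketch below recomputes it from the whale-sighting table; the function name `paired_t` is made up for illustration, and only the test statistic is computed (the p-value would still come from tcdf with df = 14).

```python
import math
import statistics

morning   = [8, 9, 7, 9, 10, 13, 10, 8, 2, 5, 7, 7, 6, 8, 7]
afternoon = [8, 10, 9, 8, 9, 11, 8, 10, 4, 7, 8, 9, 6, 6, 9]

def paired_t(first, second):
    """One-sample t statistic on the paired differences first - second (H0: mean difference = 0)."""
    diffs = [a - b for a, b in zip(first, second)]
    d_bar = statistics.mean(diffs)      # mean of the differences
    s_d = statistics.stdev(diffs)       # standard deviation of the differences
    t_stat = d_bar / (s_d / math.sqrt(len(diffs)))
    return diffs, d_bar, s_d, t_stat

diffs, d_bar, s_d, t_m_minus_a = paired_t(morning, afternoon)
_, _, _, t_a_minus_m = paired_t(afternoon, morning)

print(diffs)
print(f"mean = {d_bar:.1f}, s = {s_d:.3f}, t (M-A) = {t_m_minus_a:.3f}")
print(f"t (A-M) = {t_a_minus_m:.3f}")   # same size, opposite sign
```

Reversing the subtraction only flips the sign of t, which is why Ha must be written to match the order of subtraction.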

Page 36: Confidence Intervals with Means

Two-Sample Inference

Procedures with Means

Page 37: Confidence Intervals with Means

Remember:

$\mu_{x-y} = \mu_x - \mu_y$

$\sigma_{x-y} = \sqrt{\sigma_x^2 + \sigma_y^2}$

We will be interested in the difference of means, so we will use this to find standard error.

Page 38: Confidence Intervals with Means

Suppose we have a population of adult men with a mean height of 71 inches and standard deviation of 2.6 inches. We also have a population of adult women with a mean height of 65 inches and standard deviation of 2.3 inches. Assume heights are normally distributed.

Describe the distribution of the difference in heights between males and females (male-female).

Normal distribution with $\mu_{x-y}$ = 6 inches & $\sigma_{x-y}$ = 3.471 inches

Page 39: Confidence Intervals with Means

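[Figure: normal curves for female heights (centered at 65) and male heights (centered at 71), and for the difference = male - female (centered at 6, σ = 3.471)]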

Page 40: Confidence Intervals with Means

a) What is the probability that the height of a randomly selected man is at most 5 inches taller than the height of a randomly selected woman?

P((xM-xF) < 5) = normalcdf(-∞,5,6,3.471) = .3866

b) What is the 70th percentile for the difference (male-female) in heights of a randomly selected man & woman?

(xM-xF) = invNorm(.7,6,3.471) = 7.82
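The same two answers can be checked with Python's standard library. In this sketch `NormalDist.cdf` plays the role of normalcdf and `NormalDist.inv_cdf` the role of invNorm; the variable names are made up for illustration, and the two heights are assumed independent, as the σ formula for a difference requires.

```python
import math
from statistics import NormalDist

# Difference (male - female) of two independent normal heights
mu_diff = 71 - 65
sigma_diff = math.sqrt(2.6 ** 2 + 2.3 ** 2)   # variances add, then take the square root

difference = NormalDist(mu_diff, sigma_diff)

p_at_most_5 = difference.cdf(5)        # a) P(male - female < 5)
pct_70 = difference.inv_cdf(0.70)      # b) 70th percentile of the difference

print(f"sigma = {sigma_diff:.3f}")
print(f"P(diff < 5) = {p_at_most_5:.4f}")
print(f"70th percentile = {pct_70:.2f}")
```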

Page 41: Confidence Intervals with Means

Two-Sample Procedures with means

• The goal of these inference procedures is to compare the responses to two treatments or to compare the characteristics of two populations.

• We have INDEPENDENT samples from each treatment or population

When we compare, what are we interested in?

Page 42: Confidence Intervals with Means

Assumptions:

• Have two SRS’s from the populations or two randomly assigned treatment groups

• Samples are independent

• Both distributions are approximately normal
  – Have large sample sizes
  – Graph BOTH sets of data

• σ’s unknown

Page 43: Confidence Intervals with Means

Formulas

Since in real life we will NOT know both σ’s, we will do t-procedures.

Page 44: Confidence Intervals with Means

Degrees of Freedom

Option 1: use the smaller of the two values n1 – 1 and n2 – 1

This will produce conservative results – higher p-values & lower confidence.

Option 2: approximation used by technology

$df = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{1}{n_1 - 1} \left( \frac{s_1^2}{n_1} \right)^2 + \frac{1}{n_2 - 1} \left( \frac{s_2^2}{n_2} \right)^2}$

Calculator does this automatically!

Page 45: Confidence Intervals with Means

Confidence intervals:

CI = statistic ± (critical value)(SD of statistic)

$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

The square-root term is called the standard error.

Page 46: Confidence Intervals with Means

Pooled procedures:

• Used for two populations with the same variance

• When you pool, you average the two sample variances to estimate the common population variance.

• DO NOT use on AP Exam!!!!!

We do NOT know the variances of the population, so ALWAYS tell the calculator NO for pooling!

Page 47: Confidence Intervals with Means

Two competing headache remedies claim to give fast-acting relief. An experiment was performed to compare the mean lengths of time required for bodily absorption of brand A and brand B. Assume the absorption time is normally distributed. Twelve people were randomly selected and given an oral dosage of brand A. Another 12 were randomly selected and given an equal dosage of brand B. The length of time in minutes for the drugs to reach a specified level in the blood was recorded. The results follow:

          mean   SD    n
Brand A   20.1   8.7   12
Brand B   18.9   7.5   12

Describe the shape & standard error for sampling distribution of the differences in the mean speed of absorption. (answer on next screen)

Page 48: Confidence Intervals with Means

Describe the sampling distribution of the differences in the mean speed of absorption.

Normal distribution with S.E. = 3.316

Find a 95% confidence interval for the difference in mean lengths of time required for bodily absorption of each brand. (answer on next screen)

Page 49: Confidence Intervals with Means

Assumptions:

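State assumptions!

• Have 2 independent randomly assigned treatments
• Given the absorption rate is normally distributed
• σ’s unknown

Formula & calculations

$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (20.1 - 18.9) \pm 2.080 \sqrt{\frac{8.7^2}{12} + \frac{7.5^2}{12}} = (-5.685, 8.085)$

df = 21.53

From calculator df = 21.53, use t* for df = 21 & 95% confidence level. Think “Price is Right”! Closest without going over.

The interval (–5.685, 8.085) matches the calculator's result with df = 21.53; the table value t* = 2.080 for df = 21 gives a slightly wider interval, about (–5.697, 8.097).

Conclusion in context

We are 95% confident that the true difference in mean lengths of time required for bodily absorption of each brand is between –5.685 minutes and 8.085 minutes.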
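The standard error, both degrees-of-freedom options, and the interval for the headache-remedy example can be recomputed from the summary statistics. This is a sketch under stated assumptions: the function name `welch` is made up for illustration, and t* = 2.080 (95% confidence, df = 21) is a t-table value entered by hand, so the interval comes out slightly wider than the calculator's, which uses df = 21.53.

```python
import math

def welch(s1, n1, s2, n2):
    """Standard error of x1 - x2 and the technology (Option 2) degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    std_error = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return std_error, df

mean_a, sd_a, n_a = 20.1, 8.7, 12   # Brand A
mean_b, sd_b, n_b = 18.9, 7.5, 12   # Brand B

std_error, df_welch = welch(sd_a, n_a, sd_b, n_b)
df_conservative = min(n_a, n_b) - 1      # Option 1: smaller of n1 - 1 and n2 - 1

t_star = 2.080                           # ASSUMED table value: 95% confidence, df = 21
margin = t_star * std_error
diff = mean_a - mean_b
interval = (diff - margin, diff + margin)

print(f"S.E. = {std_error:.3f}, df = {df_welch:.2f} (conservative df = {df_conservative})")
print(f"interval = ({interval[0]:.3f}, {interval[1]:.3f})")
```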

Page 50: Confidence Intervals with Means

Note: confidence interval statements

• Matched pairs – refer to “mean difference”

• Two-Sample – refer to “difference of means”

Page 51: Confidence Intervals with Means

Hypothesis Statements:

H0: μ1 - μ2 = 0        H0: μ1 = μ2
Ha: μ1 - μ2 < 0        Ha: μ1 < μ2
Ha: μ1 - μ2 > 0        Ha: μ1 > μ2
Ha: μ1 - μ2 ≠ 0        Ha: μ1 ≠ μ2

Be sure to define BOTH μ1 and μ2!

Page 52: Confidence Intervals with Means

Hypothesis Test:

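Test statistic = $\frac{\text{statistic} - \text{parameter}}{\text{SD of statistic}}$

$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

Since we usually assume H0 is true, then $\mu_1 - \mu_2$ equals 0 – so we can usually leave it out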

Page 53: Confidence Intervals with Means

The length of time in minutes for the drugs to reach a specified level in the blood was recorded. The results follow:

          mean   SD    n
Brand A   20.1   8.7   12
Brand B   18.9   7.5   12

Is there sufficient evidence that these drugs differ in the speed at which they enter the blood stream?

Page 54: Confidence Intervals with Means

State assumptions!

• Have 2 independent randomly assigned treatments
• Given the absorption rate is normally distributed
• σ’s unknown

Hypotheses & define variables!

H0: μA = μB
Ha: μA ≠ μB
Where μA is the true mean absorption time for Brand A & μB is the true mean absorption time for Brand B

Formula & calculations

$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{20.1 - 18.9}{\sqrt{\frac{8.7^2}{12} + \frac{7.5^2}{12}}} = .362$

p-value = .7210, df = 21.53, α = .05

Conclusion in context

Since p-value > α, I fail to reject H0. There is not sufficient evidence to suggest that these drugs differ in the speed at which they enter the blood stream.

Page 55: Confidence Intervals with Means

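Suppose that the sample mean of Brand B is 16.5, then is Brand B faster?

$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{20.1 - 16.5}{\sqrt{\frac{8.7^2}{12} + \frac{7.5^2}{12}}} = 1.086$

p-value = .2896, df = 21.53, α = .05

No, I would still fail to reject the null hypothesis.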
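Both versions of the headache-remedy test can be compared side by side. Note the swapped-in technique: instead of computing p-values as the slides do, this sketch compares |t| with a table critical value, t* = 2.080 (two-sided α = .05, df = 21), which is an assumption entered by hand; the decision is the same either way. The function name `two_sample_t` is made up for illustration.

```python
import math

def two_sample_t(mean1, s1, n1, mean2, s2, n2):
    """Unpooled two-sample t statistic, taking mu1 - mu2 = 0 under H0."""
    std_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (mean1 - mean2) / std_error

T_STAR = 2.080   # ASSUMED table value: two-sided alpha = .05, df = 21

t_original = two_sample_t(20.1, 8.7, 12, 18.9, 7.5, 12)   # Brand B mean 18.9
t_modified = two_sample_t(20.1, 8.7, 12, 16.5, 7.5, 12)   # Brand B mean 16.5

for label, t in [("Brand B mean 18.9", t_original), ("Brand B mean 16.5", t_modified)]:
    decision = "reject H0" if abs(t) > T_STAR else "fail to reject H0"
    print(f"{label}: t = {t:.3f}, {decision}")
```

Even with the larger gap between sample means, the test statistic stays well below the critical value because the standard error (about 3.3 minutes) is large relative to the difference.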

Page 56: Confidence Intervals with Means

Robustness:

• Two-sample procedures are more robust than one-sample procedures

• BEST to have equal sample sizes! (but not necessary)