t Distribution for Means


Review of t Distribution for Means

Transcript of t Distribution for Means

Page 1: t Distribution for Means

AP Statistics Page 1 of 21 Review: t Distribution for Means

_____________ Copyright © 2011 Apex Learning Inc. (See Terms of Use at www.apexvs.com/TermsOfUse)

Key Terms and Concepts

Before taking the quiz, you need to be able to explain the meanings (and recognize symbols in cases where there's an associated symbol) of each of these terms or concepts. You should also know when and how to use them in statistics problems. These terms and concepts are defined in Key Terms.

• critical t values
• critical value table
• degrees of freedom
• degrees of freedom for independent samples
• independent samples
• matched pairs
• one-sample procedures
• pooled variance
• robust
• sample size for given margin of error
• standard error
• t distribution
• t distribution critical value table
• t intervals
• t significance test
• t significance tests on a graphing calculator
• tcdf
• two-sample procedures
• two-sample t statistic

Page 2: t Distribution for Means


_____________ © Copyright 2000 Apex Learning Inc. All rights reserved. This material is intended for the exclusive use of

Objectives, Example Problems, and Study Tips

Confidence Intervals and Hypothesis Testing for a Single Mean

Objective 1
Define t distributions in plain words. Describe their shape and list at least two differences between the t distributions and the normal distribution.

Example
Define t distributions in plain words. Describe their shape and list at least two differences between the t distributions and the normal distribution.

Tips
• Researchers use t distributions when they don't know the population standard deviation σ. In these cases, they estimate σ with the sample standard deviation s. The measure of spread in a t distribution for a single mean is called the standard error, and is written as $s/\sqrt{n}$.

• To find the degrees of freedom for a t distribution for a single mean, subtract one from the sample size (df = sample size − 1).
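One difference between the t distributions and the normal distribution can be seen numerically: because t distributions have heavier tails, their critical values are larger than the matching z critical value, and the gap shrinks as the degrees of freedom grow. The sketch below is only an illustration. The t* values are the 95% table values quoted in the answers later in this review; the z* value comes from Python's standard library, and the variable names are made up for this example.

```python
from statistics import NormalDist

# z* for a 95% confidence level: the 97.5th percentile of the standard normal.
z_star = NormalDist().inv_cdf(0.975)

# 95% critical t values quoted in this review's answers (read from a t table),
# keyed by degrees of freedom.
t_star_95 = {4: 2.776, 7: 2.365, 19: 2.093}

for df, t_star in sorted(t_star_95.items()):
    # Each t* exceeds z*, and the excess gets smaller as df increases.
    extra = t_star / z_star - 1
    print(f"df = {df:2d}: t* = {t_star:.3f}, z* = {z_star:.3f}, "
          f"interval is {extra:.1%} wider than a z interval")
```

With only 4 degrees of freedom the t interval is noticeably wider than a z interval would be; by 19 degrees of freedom the two are much closer.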

Objective 2
Specify when a t distribution should be used instead of a z-distribution when doing inference for a population mean.

Example
When should you use a t distribution instead of a z-distribution when you're doing inference for a population mean?

Tips
• Use a t distribution whenever you don't know the population standard deviation. However, if your sample size is large (usually larger than 30), you can use a normal distribution as an approximation of a t distribution. This is because the Central Limit Theorem says that, for large sample sizes, the sampling distribution of the sample mean will approximate a normal distribution with mean $\mu$ and standard error $s/\sqrt{n}$. (Just how large is large enough depends on the shape of the parent population. If the parent population is extremely skewed or contains outliers, your samples will need to be larger if you want to be confident that the sampling distribution can be approximated by the normal curve.)

• Whenever you use a t distribution, you must show that the population is free from extreme skewness and outliers.

• Since some t tables don't provide critical t values for sample sizes above 31, you may want to use a z-procedure for large samples to approximate what a t procedure would give you. However, if you don't know the population standard deviation, and if you have a calculator that can generate a t interval or calculate a t statistic for large samples, it's usually safest to stick with a t procedure. This is because a z-procedure will give only a close approximation of the true probability you'd get by using a t distribution.

Page 3: t Distribution for Means


• If you do use a z-procedure in a case where the sample size is larger than 30, and where you don't know the population standard deviation, you must state that your sample size is large enough, per the Central Limit Theorem, to ensure your sampling distribution is a close approximation of a normal distribution. In other words, you must state that the large size of the sample means that s is a good estimator of σ.

Answer
When doing inference for a population mean, use a t distribution if you don't know the population standard deviation, if it seems reasonable that the sample could have been drawn from a normal population, and if the sample is free from extreme skewness and outliers. However, if the sample size is greater than 30 and you don't have a table or calculator that will give you the critical t values, you can use a z-procedure as a close approximation of what you'd get if you used a t distribution. This is because the Central Limit Theorem says that when the sample size is large enough (greater than 30, according to most standards), the sampling distribution will approximate a normal distribution with mean $\mu$ and standard error $s/\sqrt{n}$.

If the sample size is greater than 30 and you're using a calculator, you should use a t procedure because you'll get a more accurate answer, and because, on a calculator, a t procedure is just as easy as a z-procedure.

Objective 3
List the assumptions necessary to use:
• A normal (z) distribution of a sample mean to estimate a population mean
• A t distribution of a sample mean to estimate a population mean

Examples
1. What are the assumptions necessary to use:
• The normal (z) distribution of a sample mean to estimate a population mean
• A t distribution of a sample mean to estimate a population mean

2. Consider the following set of sample data: (34, 32, 34, 32, 48, 37, 31, 31, 29, 27). We're interested in using these data to test a null hypothesis about the population mean. Which of the following statements are true?

I. Assuming this represents a random sample from the population, the sample mean is an unbiased estimator of the population mean.
II. Because they're robust, t procedures are justified in this case.
III. We'd use z-procedures, since we're interested in the population mean.

A. I only
B. II only
C. III only
D. I and II only
E. I and III only

Page 4: t Distribution for Means


Tips
• You may wish to use one or more of the following when you're checking data for the conditions necessary for using a t procedure:
  • Modified boxplot
  • Histogram
  • Stemplot
  • Normal quantile plot

These plots are useful for testing whether the population from which the sample is taken is free from extreme skewness or outliers. Modified boxplots are especially helpful for checking for outliers, and they give you a rough idea of whether the distribution may be skewed. Histograms, quantile plots, and stemplots are better for checking for skewness because they give you a more detailed view of the distribution. (If you use your calculator to make a histogram, be sure to adjust your window to make sure you're getting a meaningful picture.) Note: When you're checking for outliers and skewness, you should also check to see that the mean and median are close to one another in relation to the spread of the distribution.

• If the data have no outliers, if they're not severely asymmetric, and if the mean and median aren't far apart, you can use a t procedure.

• The smaller the sample, the less skewness you can accept.

Answers
1. To use a normal distribution of a sample mean for a small sample (usually defined as n < 30), you must assume that the population distribution is normal, and you must know the population standard deviation, σ. To use a normal distribution of a sample mean for a large sample (usually defined as n > 30), you must assume that the population is free from extreme skewness and outliers. (Note: if you don't know the population standard deviation, your answer will be more accurate if you use a t distribution, as a normal distribution will only be a close approximation.)

To use a t distribution of a sample mean, you must assume that the population from which the sample is drawn is normal or nearly normal. At the very least, the population should be free from extreme skewness or outliers. If your sample does have an outlier, you might still be able to use a t procedure if:
A. The sample size is at least 30 (this will reduce the effect of the outlier on the sampling distribution).
B. The distribution isn't markedly skewed.

2. The correct answer is: A. I only. Note that the data set has an outlier (which you can see if you make a modified boxplot), which prevents II from being true. III is false because the population standard deviation isn't known, so a z-procedure isn't appropriate for a sample this small. If the sample size had been 30 or larger, and if the outlier were closer to the rest of the distribution, a t procedure would have been OK here.
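The outlier mentioned in answer 2 can be checked without drawing the boxplot. The sketch below assumes the usual modified-boxplot convention, which the review does not spell out: a point is flagged if it lies more than 1.5 × IQR beyond the quartiles, with the quartiles taken as medians of the lower and upper halves of the sorted data. The function names are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data
    (the middle value is left out of both halves when n is odd)."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[-half:])

def outliers_1_5_iqr(values):
    """Points more than 1.5 * IQR below Q1 or above Q3 (assumed convention)."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

# Sample data from example 2 of Objective 3.
sample = [34, 32, 34, 32, 48, 37, 31, 31, 29, 27]
print("Q1, Q3:", quartiles(sample))
print("outliers:", outliers_1_5_iqr(sample))
```

Under this convention the only flagged value is 48, which is the outlier that rules out statement II for a sample of just ten observations.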

Page 5: t Distribution for Means


Objective 4
Define the term robust and state how it applies to t procedures.

Example
Define the term robust and state how it applies to t procedures.

Answer
A procedure is robust if it's insensitive to violations of the assumptions needed for its use. The t distributions are robust to the assumption of normality; even if the population distribution isn't normal, you can use a t procedure if the population is free from extreme skewness and outliers.

Objective 5
Define the term degrees of freedom as it applies to using a t distribution of a single sample mean.

Example
Define the term degrees of freedom as it applies to using a t distribution of a single sample mean.

Answer
For the t distribution of a single sample mean, degrees of freedom is the sample size minus one.

Objective 6
Given a value for degrees of freedom, find the critical t value for an upper-tail probability or for a confidence interval.

Example
What's the critical t value (t*) for an upper-tail probability of .02 with 11 degrees of freedom? What's the critical t value (t*) for a 98% confidence level with 11 degrees of freedom?

Answer
For 11 degrees of freedom, the t* for upper-tail probability .02 is 2.328, and the t* for a 98% confidence level is 2.718. (Both of these values were found using a t table.) Note that t* for a 98% confidence level is the same as t* for an upper-tail probability of .01, because a 98% interval leaves 1% in each tail.

Objective 7
Construct t confidence intervals for a single mean.

Example
Using a sample with n = 8, $\bar{x}$ = 20.53, and s = 3.5, find a 95% and a 99% confidence interval for the mean of the population.

Page 6: t Distribution for Means


Tips
• Use the formula: (estimate) ± (critical t value)(standard error).
• When looking up the critical t value in a t distribution table, remember that the number of degrees of freedom for a single mean is n − 1.
• The standard error of a sample mean is $s/\sqrt{n}$.

Answer

95%: $20.53 \pm 2.365\left(\frac{3.5}{\sqrt{8}}\right) = (17.60, 23.46)$

99%: $20.53 \pm 3.499\left(\frac{3.5}{\sqrt{8}}\right) = (16.20, 24.86)$
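The arithmetic in this answer can be reproduced with a few lines of Python. This is a minimal sketch, not a replacement for showing your work: `t_interval` is a hypothetical helper name, and the critical values are simply the table values for df = 7 used in the answer above.

```python
from math import sqrt

def t_interval(xbar, s, n, t_star):
    """Return (low, high) for xbar ± t* · s/√n."""
    margin = t_star * s / sqrt(n)
    return xbar - margin, xbar + margin

# Summary statistics from the example; t* values for df = 7 come from a t table.
for level, t_star in [("95%", 2.365), ("99%", 3.499)]:
    low, high = t_interval(20.53, 3.5, 8, t_star)
    print(f"{level}: ({low:.2f}, {high:.2f})")
```

Raising the confidence level only changes t*, so the 99% interval is wider than the 95% interval around the same center.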

Objective 8
Use the t distributions to perform hypothesis tests for single means.

Examples
1. Twenty students were randomly selected from a population and given a 110-point test. Here are their scores: 71, 93, 91, 86, 75, 73, 86, 82, 76, 57, 84, 89, 67, 62, 72, 77, 68, 65, 75, and 84. Can you use a t procedure to estimate the mean score for the population and/or to conduct a hypothesis test on this sample? Why or why not?

2. Using the data from example 1, test the hypothesis $H_0: \mu = 80$ (the population mean is 80) against the alternative $H_a: \mu < 80$ (the population mean is less than 80). Use α = .05.

3. For the data from example 1, find a 95% confidence interval for the population mean. Interpret your interval.

Page 7: t Distribution for Means


Tips
• You may wish to use one or more of the following when you're checking data for the conditions necessary for using a t procedure:
  • Modified boxplot
  • Histogram
  • Stemplot
  • Normal quantile plot

These plots are useful for testing whether the population from which the sample is taken is free from extreme skewness or outliers. Modified boxplots are especially helpful for checking for outliers, and they give you a rough idea of whether the distribution may be skewed. Histograms, quantile plots, and stemplots are better for checking for skewness because they give you a more detailed view of the distribution. (If you use your calculator to make a histogram, be sure to adjust your window to make sure you're getting a meaningful picture.) Note: When you're checking for outliers and skewness, you should also check to see that the mean and median are close to one another in relation to the spread of the distribution.

• If the data have no outliers, if they're not severely asymmetric, and if the mean and median aren't far apart, you can use a t procedure.

• The smaller the sample, the less skewness you can accept.

Page 8: t Distribution for Means


Answers
1. A modified boxplot shows that the distribution is nearly symmetric, and that there are no outliers.

When there are no outliers, one plot is usually sufficient to justify that you can use a t distribution. Nevertheless, let's look at a quantile plot to see what it shows us:

The normal quantile plot shows a linear pattern without extreme skewness on the edges, so we're definitely justified in using a t procedure. (Note: Even if the quantile plot weren't linear, you could still use a t procedure since the data aren't severely skewed, and there are no outliers influencing your mean or standard deviation. That's because t procedures are robust.)

2. (We justified our use of a t test in this situation in example 1.)

$\bar{x} = 76.65$, $s = 10.04$, $n = 20$, df = 19

$t = \frac{76.65 - 80}{10.04/\sqrt{20}} = -1.49 \Rightarrow .05 < p < .10$

(If this is done on a graphing calculator, p = .076.) This probability, while not high, isn't small enough to reject the null hypothesis at α = .05, so the data don't provide convincing evidence that the true average score is less than 80. (The work shown above is a good model for what you should show the person who grades your quiz or test to convince them you know what you're doing.)

Page 9: t Distribution for Means


3. $76.65 \pm 2.093\left(\frac{10.04}{\sqrt{20}}\right) = (71.95, 81.35)$

We're 95% confident that the true population mean lies between 71.95 and 81.35. When we say "We're 95% confident," we mean that this procedure will capture the true population mean 95% of the time.
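The test in answer 2 and the interval in answer 3 can both be recomputed from the raw scores. The sketch below uses only the standard library. Because the library has no t distribution, the lower-tail probability is approximated by integrating the t density with Simpson's rule; this is a stand-in for a calculator's tcdf, not the calculator's actual method. The helper names `t_pdf` and `t_cdf` are made up for this example, and the 95% critical value 2.093 is the table value used in answer 3.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

scores = [71, 93, 91, 86, 75, 73, 86, 82, 76, 57,
          84, 89, 67, 62, 72, 77, 68, 65, 75, 84]

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry about 0."""
    h = abs(t) / steps
    total = t_pdf(0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

n = len(scores)
xbar, s = mean(scores), stdev(scores)
se = s / sqrt(n)

t_stat = (xbar - 80) / se        # H0: mu = 80
p_value = t_cdf(t_stat, n - 1)   # Ha: mu < 80, so use the lower tail

margin = 2.093 * se              # t* for 95% confidence, df = 19 (t table)
print(f"xbar = {xbar:.2f}, s = {s:.2f}, t = {t_stat:.2f}, p = {p_value:.3f}")
print(f"95% CI: ({xbar - margin:.2f}, {xbar + margin:.2f})")
```

The p-value falls between .05 and .10, consistent with the table bracket in answer 2, and the interval matches answer 3 after rounding.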

Confidence Intervals for the Difference Between Two Means

Objective 1
Distinguish between situations that should be analyzed using matched pairs single-sample procedures for means, and situations that should be analyzed using two-sample procedures for means.

Examples
Which of these situations should be analyzed using one-sample procedures, and which should be analyzed using two-sample procedures?

1. A researcher wants to know if people's left or right hands tend to be stronger. She randomly selects 15 people and tests each for left- and right-hand strength by having them squeeze a measuring device first with one hand, then the other. (The order is random.)

2. A researcher wants to know if a population of brown rats in one city has a greater mean tail length than a population in another city. She randomly selects rats from each city and measures the lengths of their tails.

3. A researcher wants to know if a new vitamin supplement will make the tails of brown rats grow longer. She takes 50 rats and divides them into 25 pairs matched on gender and age. Within each pair, she randomly selects one rat to receive the new vitamin. After six months she measures the length of each rat's tail.

4. A researcher wants to know if iron builds up in people's blood during cold months. She measures iron content for 50 people in June, then measures the same 50 people the following February.

5. A researcher wants to know if there's a significant difference in rainfall on July 4th between two cities, city A and city B. She finds climate data on the two cities, and looks up the amount of rainfall on July 4th in each city for the past 30 years.

Tip
Use a single-sample matched pairs procedure when the observations are in pairs (that is, when you either have two measurements taken on the same subject or when you have observations matched on at least one other variable, such as time).

Page 10: t Distribution for Means


Answers
1. Use a single-sample procedure. The hand-strength measurements are in pairs (each pair of observations is taken from the same subject) and each pair of observations can be reduced to one number representing the difference between right- and left-hand strength.

2. Use a two-sample procedure. The data aren't in pairs.

3. Use a single-sample procedure. The tail-length measurements are in pairs (each pair of observations comes from two rats matched on gender and age) and each pair of observations can be reduced to one number representing the difference between the treatment (vitamin) and control conditions.

4. Use a single-sample procedure. The observations are in pairs, with two observations per subject.

5. Use a single-sample procedure. The observations are in pairs (they're matched by date, July 4th of each year).

Objective 2
Construct t confidence intervals for the difference between two means using paired data.

Example
Consider the following data that compare the life span (in weeks) of two cat toys for five cats. Construct a 95% confidence interval for the population difference between the average life spans of Toy A and Toy B.

Cat   Toy A   Toy B
1     10.6    10.2
2      9.8     9.4
3     12.3    11.8
4      9.7     9.1
5      8.8     8.3

Tips
• This is a matched pairs situation because, in each case, both measurements are taken on the same cat.
• To use a single-sample t procedure on matched pairs data, you must justify that both separate data sets (all the numbers from each half of each pair) are free from outliers or extreme skewness. If the data do have outliers, you must at least justify that the outliers don't heavily influence the mean or standard deviation of your sample.

Page 11: t Distribution for Means


Answer
Looking at modified boxplots of the numbers for Toy A and Toy B, we see that the samples are free of extreme skewness or outliers, so we can use a t procedure.

Since this is a matched pairs situation, we need to look at the differences between the life spans. We'll add a column to the table for these differences:

Cat   Toy A   Toy B   D = A − B
1     10.6    10.2    .4
2      9.8     9.4    .4
3     12.3    11.8    .5
4      9.7     9.1    .6
5      8.8     8.3    .5

$\bar{x}_D = .48$, $s_D = .084$, df = 4 ⇒ t* = 2.776

$.48 \pm 2.776\left(\frac{.084}{\sqrt{5}}\right) = (.376, .584)$

Since 0 isn't in the interval, we have good evidence that there's a difference between the mean life spans of the two types of toys.

Objective 3
Use t distributions to conduct hypothesis tests for the difference between two means using paired data.

Example
Consider once again the cat-toy data from the previous example:

Cat   Toy A   Toy B
1     10.6    10.2
2      9.8     9.4
3     12.3    11.8
4      9.7     9.1
5      8.8     8.3

Perform a test of the hypotheses:
$H_0: \mu_D = 0$ (There's no difference in life span between the two cat toys.)
$H_a: \mu_D \neq 0$ (One cat toy lasts longer than the other.)

Tip
You don't need to specify a significance level to do a hypothesis test, but you can if you want to.

Page 12: t Distribution for Means


Answer
The boxplots show that there are no significant departures from normality in the sample distributions, so we can use t procedures.

$H_0: \mu_D = 0$ (There's no difference in life span between the two cat toys.)
$H_a: \mu_D \neq 0$ (One cat toy lasts longer than the other.)

$\bar{x}_D = .48$, $s_D = .084$, df = 4

$t = \frac{.48 - 0}{.084/\sqrt{5}} = 12.78$

The probability of getting a t value at least as extreme as 12.78 with 4 degrees of freedom is approximately zero, so the P-value is approximately 0. Since the P-value is so low, we can reject our null hypothesis in favor of the alternative hypothesis that the mean difference in life span between the two types of toy isn't 0. Apparently, one cat toy lasts longer than the other.
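The matched pairs calculations for the cat-toy data (the interval from Objective 2 and the test statistic from Objective 3) can be sketched in Python as follows. The only value not computed from the data is t* = 2.776, which is the table value for df = 4 quoted earlier. Note that the code keeps the unrounded standard deviation of the differences, so its t statistic comes out near 12.83; the 12.78 shown in the answer comes from rounding $s_D$ to .084 before dividing.

```python
from math import sqrt
from statistics import mean, stdev

toy_a = [10.6, 9.8, 12.3, 9.7, 8.8]
toy_b = [10.2, 9.4, 11.8, 9.1, 8.3]

# Matched pairs: reduce each cat's two measurements to a single difference,
# then treat the differences as one sample.
diffs = [a - b for a, b in zip(toy_a, toy_b)]

n = len(diffs)
d_bar, s_d = mean(diffs), stdev(diffs)
se = s_d / sqrt(n)

t_star = 2.776                                   # 95% confidence, df = 4 (t table)
interval = (d_bar - t_star * se, d_bar + t_star * se)
t_stat = (d_bar - 0) / se                        # H0: mu_D = 0

print(f"mean difference = {d_bar:.2f}, s_D = {s_d:.3f}")
print(f"95% CI: ({interval[0]:.3f}, {interval[1]:.3f})")
print(f"t = {t_stat:.2f} with df = {n - 1}")
```

Either way the statistic is far beyond any critical value for 4 degrees of freedom, so the conclusion is the same.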

Page 13: t Distribution for Means


Confidence Intervals and Hypothesis Tests for Two Independent Samples

Objective 1
Construct t confidence intervals for the difference between the means of two independent samples.

Examples
A groundskeeper wants to know if a certain variety of rosebush produces more blossoms if lawn fertilizer leaches into the rose beds. She randomly chooses a set of 18 rosebushes (of the same variety) from the nursery. She then plants half of these next to a lawn that gets fertilized on a regular basis and plants the other half next to an unfertilized lawn. At the height of the blooming season, she counts the number of blossoms on each bush. The results are as follows:

Lawn Fertilizer   No Lawn Fertilizer
32                35
37                31
35                29
28                25
41                34
44                40
35                27
31                32
34                31

1. Calculate a 99% confidence interval for the difference in number of blossoms using:
A. The formula for a two-sample t interval
B. Your calculator
Explain why there's a difference between the calculator results and the formula results.

2. Interpret your result from example 1. Do the rosebushes produce more blossoms if lawn fertilizer leaches into the soil?

3. If you could assume that the population standard deviations were equal, on how many degrees of freedom would you base your confidence interval? Calculate the 99% confidence interval using the pooled variance.

Page 14: t Distribution for Means


Tips
• To use t procedures on two independent samples, you must show that neither sample came from a population with extreme skewness or outliers.
• When you use a two-sample t procedure to estimate a difference between means, your results will be more precise if your sample sizes are similar.
• If the sample sizes add up to more than 30, they can be widely different and you'll still have a reasonably accurate result.
• The formula for a two-sample t interval for the difference between two population means is:

$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$.

• If you don't use a calculator to construct your confidence interval, degrees of freedom for your t* is the lesser of $(n_1 - 1)$ or $(n_2 - 1)$.
• If you use your calculator to calculate the t interval, degrees of freedom will be calculated for you.
• To find the confidence interval using a pooled variance, use your calculator; it's much easier.
• If the question doesn't specifically mention that you know the population standard deviations are the same, or if you aren't given any information that allows you to prove that they're the same, assume that you shouldn't pool the variances.

Answers
1. Modified boxplots of the two data sets show no outliers or extreme skewness, so it is OK to use t procedures.

A. Using the formula for a two-sample t interval, we get

$\bar{x}_1 = 35.22$, $s_1 = 4.94$, $n_1 = 9$;
$\bar{x}_2 = 31.56$, $s_2 = 4.48$, $n_2 = 9$; df = 9 − 1 = 8 ⇒ t* = 3.355;

$(35.22 - 31.56) \pm 3.355\sqrt{\frac{4.94^2}{9} + \frac{4.48^2}{9}} = 3.66 \pm 3.355(2.22) = (-3.7881, 11.11)$.

B. To use the TI-83/TI-84, press STAT, arrow over to TESTS, scroll down to 2-SampTInt, press ENTER, select STATS and enter the statistics calculated above: $\bar{x}_1 = 35.22$, $s_{x1} = 4.94$, $n_1 = 9$; $\bar{x}_2 = 31.56$, $s_{x2} = 4.48$, $n_2 = 9$. Enter .99 for C-Level. Select NO for Pooled? then scroll down to Calculate and press ENTER. The calculator gives you this interval: (−2.841, 10.161). The calculator gives a narrower interval because it uses a more precise estimate for degrees of freedom (about 15.85 instead of 8), which leads to a smaller t*.

Page 15: t Distribution for Means


2. Since 0 is in the interval, the sample difference we obtained would be a plausible outcome if there were no difference between the two treatments. We don't have good evidence that the rosebushes produce more blossoms when lawn fertilizer leaches into the soil.

3. If you could pool the variances, degrees of freedom would be $n_1 + n_2 - 2$, or 9 + 9 − 2 = 16.

You can calculate the confidence interval by calculating a pooled variance, then using the formula for the standard error with a pooled variance.

However, instead of using the unwieldy formula for the pooled variance, you can use 2-SampTInt on your calculator (see instructions in the answer for question 1) and select YES for Pooled?. The interval, using a pooled variance, will be (−2.833, 10.153).

Here are the formulas in case you want to see how you'd do this without a calculator. If the two population standard deviations are the same and you can pool the variances, the confidence interval would be given by

$(\bar{x}_1 - \bar{x}_2) \pm t^* s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$, with $n_1 + n_2 - 2$ degrees of freedom.

If you use your calculator on a quiz or exam to construct a confidence interval, you'll need to provide details that show you know what you're doing. You can do this by writing the formula

$(\bar{x}_1 - \bar{x}_2) \pm t^* s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$,

then providing the values that your calculator gave (here $n_1 = n_2 = 9$):

$3.66 \pm 2.921(4.71561237)\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$,

(−2.833, 10.153).
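To see how the unpooled (conservative) and pooled intervals relate, here is a minimal sketch that recomputes both from the summary statistics in answer 1. The two critical values are the 99% table values used above (3.355 for df = 8 and 2.921 for df = 16); the variable names are chosen for this example only. Because the code does not round the standard error to 2.22, its conservative interval differs from the hand calculation in the second decimal place.

```python
from math import sqrt

x1, s1, n1 = 35.22, 4.94, 9   # lawn fertilizer
x2, s2, n2 = 31.56, 4.48, 9   # no lawn fertilizer
diff = x1 - x2

# Unpooled standard error, used with the conservative df = min(n1, n2) - 1 = 8.
se_unpooled = sqrt(s1**2 / n1 + s2**2 / n2)
t_star_df8 = 3.355            # 99% confidence (t table)
unpooled = (diff - t_star_df8 * se_unpooled, diff + t_star_df8 * se_unpooled)

# Pooled standard deviation, used with df = n1 + n2 - 2 = 16.
s_p = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
se_pooled = s_p * sqrt(1 / n1 + 1 / n2)
t_star_df16 = 2.921           # 99% confidence (t table)
pooled = (diff - t_star_df16 * se_pooled, diff + t_star_df16 * se_pooled)

print(f"unpooled SE = {se_unpooled:.3f}, conservative 99% CI = "
      f"({unpooled[0]:.3f}, {unpooled[1]:.3f})")
print(f"s_p = {s_p:.5f}, pooled 99% CI = ({pooled[0]:.3f}, {pooled[1]:.3f})")
```

With equal sample sizes the two standard errors are identical, so the pooled interval is narrower only because df = 16 gives a smaller t* than df = 8.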

Page 16: t Distribution for Means


Objective 2
Perform hypothesis tests for the difference between the means of two independent samples.

Examples
1. For the data from the previous example, perform a test of the null hypothesis that the bushes don't produce more blossoms when lawn fertilizer leaches into their soil. For the alternative, state that rosebushes next to fertilized lawns will produce more blossoms.

Lawn Fertilizer   No Lawn Fertilizer
32                35
37                31
35                29
28                25
41                34
44                40
35                27
31                32
34                31

2. If you could assume that the population standard deviations were equal, how many degrees of freedom would you base your test statistic on? Calculate the test statistic using the pooled variance, and state whether your conclusion would change from what you concluded in example 1.

Tips
• Remember that there are four distinct things you must do to get full credit for a hypothesis test:
  1. State your null and alternative hypotheses in the context of the problem.
  2. State the test you'll use and check the assumptions needed to use the test.
  3. Calculate a test statistic and a P-value.
  4. State your conclusion.

• If the question doesn't specifically mention that you know the population standard deviations are the same, or if you aren't given any information that allows you to prove that they're the same, assume that you shouldn't pool the variances.

Answers
1. $H_0: \mu_1 = \mu_2$ (There's no difference in the number of blossoms.)
$H_a: \mu_1 > \mu_2$ (Rosebushes next to fertilized lawns will produce more blossoms.)

For this problem, we'll use a one-sided t test at the .01 level of significance.

Box plots of the two data sets show no outliers or skewness, so we can use t procedures.

$\bar{x}_1 = 35.22$, $s_1 = 4.94$, $\bar{x}_2 = 31.56$, $s_2 = 4.48$

$t = \frac{35.22 - 31.56}{\sqrt{\frac{4.94^2}{9} + \frac{4.48^2}{9}}} = \frac{3.66}{2.22} = 1.65$, df = 9 − 1 = 8 ⇒ .05 < p < .10

(Note: if you use a graphing calculator, you'll get p = .06, df = 15.85.)

Page 17: t Distribution for Means


This P-value isn't sufficiently low to reject the null hypothesis at the .01 level. We don't have good evidence that rosebushes next to fertilized lawns produce more blossoms.

2. If you could pool the variances, the degrees of freedom would be $n_1 + n_2 - 2$, or 9 + 9 − 2 = 16. You can calculate the test statistic using the pooled standard deviation. First, you'll need to calculate the pooled standard deviation using

$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$, then plug your value for $s_p$ into

$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$.

Instead of using the unwieldy formula for the pooled variance, you can use 2-SampTTest on your calculator: Press STAT, arrow over to TESTS, scroll down to 2-SampTTest, press ENTER, select STATS and enter the statistics calculated earlier:

$\bar{x}_1 = 35.22$, $s_{x1} = 4.94$, $n_1 = 9$; $\bar{x}_2 = 31.56$, $s_{x2} = 4.48$, $n_2 = 9$.

Since this is a one-sided test where the alternative is that the first mean is higher than the second, select µ1 > µ2. Select YES for Pooled? then scroll down to Calculate and press ENTER. The test statistic, using a pooled variance, is t = 1.646 (df = 16), with a P-value of .059. (The P-value is still too high for us to reject the null, so the conclusion doesn't change.)

If you use your calculator to calculate the test statistic on a quiz or exam, you'll need to demonstrate your understanding of the process by writing down the formula for the pooled t statistic and explaining the values you will plug into it:

The two-sample t statistic for a pooled variance is

$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, where $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$, with $n_1 + n_2 - 2$ degrees of freedom.
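The two test statistics for the rosebush data (unpooled in answer 1, pooled in answer 2) can be compared directly. The sketch below is an illustration using the summary statistics quoted above; the variable names are not from the review. It also shows why the two statistics agree here: when $n_1 = n_2$, the pooled and unpooled standard errors are algebraically equal, so only the degrees of freedom (and therefore the P-value) differ between the two methods.

```python
from math import sqrt

x1, s1, n1 = 35.22, 4.94, 9   # lawn fertilizer
x2, s2, n2 = 31.56, 4.48, 9   # no lawn fertilizer

# Unpooled two-sample t statistic.
t_unpooled = (x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)

# Pooled two-sample t statistic.
s_p = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t_pooled = (x1 - x2) / (s_p * sqrt(1 / n1 + 1 / n2))

print(f"unpooled t = {t_unpooled:.3f}")
print(f"pooled t   = {t_pooled:.3f} (df = {n1 + n2 - 2})")
```

With unequal sample sizes the two statistics would generally differ, which is one reason the pooling decision matters more in that case.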

Objective 3
Describe the three different ways degrees of freedom can be computed when you're using t distributions to estimate the difference between the means of two independent samples.

Examples
1. Identify the three different techniques for determining the number of degrees of freedom in a t procedure for the difference between the means of two independent samples. In what situations would you use each technique?

2. What condition must be met if you want to use pooled variances?

3. Suppose we want to compare two groups and have arrived at the following summary data:

Group   n   $\bar{x}$   s
1       9   23          4
2       6   18          5

Page 18: t Distribution for Means


Identify the number of degrees of freedom involved for each of the three methods described in your answer for example 1.

Answers
1.
• The conservative method (unpooled): df = $\min(n_1 - 1,\ n_2 - 1)$. Use this method when you aren't using a calculator or a computer to generate your test statistic or calculate your interval.

• The pooled method: df = $n_1 + n_2 - 2$. Use this method when you know that the population standard deviations are the same and that you can pool the variances of your samples. (This is almost never the case.)

• The software (or graphing calculator) method: For this method, you use 2-SampTInt or 2-SampTTest on your calculator (or an analogous function on a computer) to generate your test statistic or calculate your interval. This method gives you a value for degrees of freedom that's more precise than the conservative method, but a little less precise than the pooled method.

2. You can use the pooled method when you can reasonably assume that the two population standard deviations are the same.

3.
• Conservative method: df = 6 − 1 = 5. For this method use the lesser of $n_1 - 1$ and $n_2 - 1$.

• Pooled method: df = 9 + 6 − 2 = 13. For this method use $n_1 + n_2 - 2$.

• Software method: df = 9.137. In this method, the calculator or computer gives you the degrees of freedom when it calculates the confidence interval or test statistic.

Summary of Formulas
Although you'll be provided with a formula sheet you can use on the unit quiz, it's very important that you understand what these formulas mean, and that you understand when to use them. Some of the formulas here include extra notes explaining their use, what the different symbols mean, or how to calculate degrees of freedom, for example. You won't be given this information on the unit quiz.

In some cases, what you see here may differ from the formula sheet, so you'll need to understand the formulas well enough to be able to adapt them. For example, if you needed to look up the standard error of a sample mean on the formula sheet, you might only find the formula for the standard deviation of a sample mean: $\sigma/\sqrt{n}$. In this case, you'd need to recognize that you'd use the same formula, but in a slightly different form: $s/\sqrt{n}$.

If you use these formulas as you study and understand what they mean and where they come from, you should have no trouble using them on a quiz or exam. (By the time you've used the formulas enough and understand them well, you'll probably find you've memorized most of them anyway.)


t interval for a single population mean, or for the difference between population means in a matched pairs situation:

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$
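To make the pieces of this interval concrete, here is a small Python sketch. The sample values (n = 16, x̄ = 52, s = 8) are made up for illustration, the critical value t* = 2.131 is the table entry for 95% confidence with df = n − 1 = 15, and `one_sample_t_interval` is a hypothetical helper name.

```python
import math

def one_sample_t_interval(xbar, s, n, t_star):
    """Return (lower, upper) for xbar ± t* · s/√n.

    t_star must be looked up for df = n - 1 at the desired confidence level.
    """
    standard_error = s / math.sqrt(n)   # estimated standard deviation of xbar
    margin = t_star * standard_error    # margin of error
    return xbar - margin, xbar + margin

# Hypothetical sample: n = 16, mean 52, standard deviation 8; t* for df = 15.
low, high = one_sample_t_interval(xbar=52, s=8, n=16, t_star=2.131)
print(f"95% t interval: ({low:.3f}, {high:.3f})")
```

For matched pairs, the same function applies if you pass the mean and standard deviation of the paired differences, with n equal to the number of pairs.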

Note: You'll need to know that the general form for a confidence interval is (estimate) ± (critical value)(standard error or standard deviation of the estimate), then know what estimate or formula to plug in for each element.

One-sample t statistic (for calculating a P-value or doing a hypothesis test for a single population mean):

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}},$$

where $\mu_0$ is the hypothesized value for the mean.

Note: You'll need to know that a test statistic is usually constructed in this way:

$$\frac{\text{difference between two measures}}{\text{standard deviation or standard error}}.$$

You'll also need to know what to plug in for the numerator and denominator. The elements will be given to you on the formula sheet, but you'll have to know how they fit together.

Two-sample t interval for the difference between two population means:

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

If you calculate the interval by hand and you don't pool the variances (it's rarely a good idea to pool the variances), degrees of freedom is the lesser of $(n_1 - 1)$ and $(n_2 - 1)$.

Note: You'll need to know that the general form for a confidence interval is (estimate) ± (critical value)(standard error or standard deviation of the estimate), and you'll also need the estimates or formulas to plug in for each element. You'll also need to know from memory how to calculate degrees of freedom. If you use a calculator to find the interval, degrees of freedom will be calculated for you and the estimate will be more precise. If you do want to pool the variances, you can use a calculator.
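The Python sketch below applies the unpooled interval to the summary data from example 3 under Objective 3 (n1 = 9, x̄1 = 23, s1 = 4; n2 = 6, x̄2 = 18, s2 = 5) and compares the three degrees-of-freedom choices. Two things here are assumptions rather than material from this review: the software value is computed with the Welch-Satterthwaite approximation, which is the formula statistics software commonly uses for the unpooled procedure, and t* = 2.571 is the table value for 95% confidence with df = 5. The function names are illustrative.

```python
import math

def df_conservative(n1, n2):
    """Lesser of n1 - 1 and n2 - 1 (by-hand, unpooled)."""
    return min(n1 - 1, n2 - 1)

def df_pooled(n1, n2):
    """n1 + n2 - 2 (only when the population SDs can be assumed equal)."""
    return n1 + n2 - 2

def df_welch(s1, n1, s2, n2):
    """Welch-Satterthwaite approximation (assumed to be what software reports)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

def two_sample_t_interval(x1, s1, n1, x2, s2, n2, t_star):
    """Return (lower, upper) for (x1 - x2) ± t* · sqrt(s1²/n1 + s2²/n2)."""
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1 - x2
    return diff - t_star * standard_error, diff + t_star * standard_error

n1, x1, s1 = 9, 23, 4
n2, x2, s2 = 6, 18, 5

print("conservative df:", df_conservative(n1, n2))
print("pooled df:      ", df_pooled(n1, n2))
print("software df:    ", round(df_welch(s1, n1, s2, n2), 3))

# By-hand 95% interval with the conservative df = 5, so t* = 2.571 from the table.
low, high = two_sample_t_interval(x1, s1, n1, x2, s2, n2, t_star=2.571)
print(f"95% interval for mu1 - mu2: ({low:.3f}, {high:.3f})")
```

The three values should agree with the answers to example 3 (5, 13, and about 9.137). A calculator would pair the larger software df with a smaller t*, giving a somewhat narrower interval than the by-hand one.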


Confidence interval for pooled variance:

$$(\bar{x}_1 - \bar{x}_2) \pm t^* s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$

where

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

with $n_1 + n_2 - 2$ degrees of freedom.

Two-sample t statistic for a two-sample t test for the difference between two population means (for calculating a P-value and conducting a hypothesis test):

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
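A short Python sketch of the unpooled two-sample t statistic follows. It reuses the example 3 summary data (n1 = 9, x̄1 = 23, s1 = 4; n2 = 6, x̄2 = 18, s2 = 5) purely as an illustration; the review itself does not run a test on these groups. The one-sided alternative µ1 > µ2, the use of the conservative df, and the helper names are all assumptions of this sketch, and the P-value is approximated by numerical integration of the t density.

```python
import math

def two_sample_t(x1, s1, n1, x2, s2, n2):
    """Unpooled two-sample t statistic: (x1 - x2) / sqrt(s1²/n1 + s2²/n2)."""
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (x1 - x2) / standard_error

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_p(t, df, steps=2000):
    """Approximate P(T > t) with Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area

t = two_sample_t(23, 4, 9, 18, 5, 6)
df = min(9 - 1, 6 - 1)                 # conservative degrees of freedom
p = upper_tail_p(t, df)                # one-sided, alternative mu1 > mu2
print(f"t = {t:.3f}, df = {df}, one-sided P = {p:.3f}")
```

Note that the denominator is the same standard error used in the two-sample t interval, so the two calculations share most of their work.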

Note: You'll need to know that a test statistic is usually constructed this way:

$$\frac{\text{difference between two measures}}{\text{standard deviation or standard error}}.$$

You'll also need to know what to plug in for the numerator and denominator. The elements will be given to you on the formula sheet, but you'll have to know how they fit together. For example, in this case the denominator is the same standard error you saw for the two-sample t interval.

The two-sample t statistic for a pooled variance:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},$$

where

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

with $n_1 + n_2 - 2$ degrees of freedom.


About the Unit Quiz

What to Bring
• Scratch paper
• Calculator
• Approved formula sheet
• Approved tables

You can't have any reference materials other than those specifically mentioned above. You won't be able to ask for help during the quiz.

Hints and Tips for the Free-Response Portion
• Show your work. The test corrector won't assume you used proper setup and methods if you reach the correct answer. It's up to you to communicate the methods that you used. Answers alone, without appropriate justification, will receive no credit.

• Take your time reading the question. Since we want to see how well you can apply your knowledge to new and somewhat unfamiliar situations, take some time to think about the question. If you don't understand the question, you're unlikely to find the right answer. Read the entire question before beginning to answer.

• Most questions will be given in several parts. The answers from one section will often be used in subsequent sections. Missing points in an early section does not mean you'll lose points in subsequent sections. Again, read the entire question to see how the different sections connect to each other.

• The calculator. As in the AP Exam, this quiz will test you on how well you know statistics, not on how well you can use your calculator. Be sure you understand the concepts behind the calculator operations. Don't use "calculator-speak" in your answer; the instructor doesn't want to read a set of steps for the calculator! Use your calculator for doing the mechanics, but be sure to clearly communicate your process for solving the problem.

• Use Units. If units are given in the problem, make sure that you give them in your answer.

• Answer the Question. Finally, be very careful to answer the question asked. Before you move on, read over your answer to make sure you're providing exactly what the question asks for. Generally, an answer to a question you weren't asked will receive no credit.