BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

27
BPS - 3rd Ed . Chapter 16 1 Chapter 16 Inference about a Population Mean

Transcript of BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

Page 1: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 1

Chapter 16

Inference about a Population Mean

Page 2: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 2

Conditions for Inferenceabout a Mean

Data are a SRS of size n Population has a Normal distribution

with mean and standard deviation – Both and are unknown

Because is unknown, we cannot use z procedures to infer µ– This is a more realistic situation than in

prior chapters

Page 3: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 3

When we do not know population standard deviation , we use sample standard deviation s to calculate the standard deviation of x-bar, which is now equal to

Standard Error

sn

This statistic is called the standard error of the mean

Page 4: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 4

When we use s instead of our “one-sample z statistic” becomes a one-sample t statistic

The t test statistic does NOT follow a Normal distribution it follows a t distribution with n – 1 degrees of freedom

One-Sample t Statistic

ns

μxt

μxz 00

Page 5: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 5

The t Distributions The t density curve is similar to the standard

Normal curve (symmetrical, mean = 0, bell-shaped), but has broader tails

The t distribution is a family of distributions– Each family member is identified by its degree

of freedom

– Notation t(k) means “a t distribution with k degrees of freedom”

Page 6: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 6

The t Distributions

As k increases, the t(k) density curve approaches the Z curve more closely.

This is because s becomes a better estimate of as the sample size increases

Page 7: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 7

Using Table C

Table C gives t critical values having upper tail probability p along with corresponding confidence level C

The bottom row of the table applies to z* because a t* with infinite degrees of freedom is a standard Normal (Z) variable

Page 8: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 8

Using Table C

t* = 2.365 (for 7 df)

This t table below highlights the t* critical value with probability 0.025 to its right for the t(7) curve

Page 9: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 9

where t* is the critical value for confidence level C from the t(n – 1) density curve.

Take a SRS of size n from a population with unknown mean and unknown standard deviation .

A level C confidence interval for is given by:

One-Sample t Confidence Interval

nstx

Page 10: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 10

Illustrative exampleAmerican Adult Heights

inches 70.5 to 63.9

3.367.28

3.92.36567.2

n

stx

A study of 8 American adults from a SRS yields an average height of = 67.2 inches and a standard deviation of s = 3.9 inches. A 95% confidence interval for the average height of all American adults () is:

x

“We are 95% confident that the average height of all American adults is between 63.9 and 70.5 inches.”

Page 11: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 11

The P-value of t is determined from the t(n – 1) curve via the t table.

The t test is similar in form to the z test learned earlier. The test statistic is:

One-Sample t Test

ns

μxt 0

Page 12: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 12

P-value for Testing Means Ha: > 0

P-value is the probability of getting a value as large or larger than the observed test statistic (t) value.

Ha: < 0 P-value is the probability of getting a value as small or

smaller than the observed test statistic (t) value.

Ha: 0 P-value is two times the probability of getting a value as

large or larger than the absolute value of the observed test statistic (t) value.

Page 13: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 13

Page 14: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 14

Weight Gain

Illustrative example

To study weight gain, ten people between the age of 30 and 40 determine their change in weight over a 1-year interval. Here are their weight changes:

2.0 0.4 0.7 2.0 -0.4 2.2 -1.3 1.2 1.1 2.3

Are these data statistically significant evidence that people in this age range gain weight over a given year?

Page 15: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 15

1. Hypotheses: H0: = 0Ha: > 0

2. Test Statistic: (df = 101 = 9)

3. P-value:P-value = P(T > 2.70) = 0.0123 (using a computer)P-value is between 0.01 and 0.02 since t = 2.70 is between t* = 2.398 (p = 0.02) and t* = 2.821 (p = 0.01) (Table C)

4. Conclusion:Since the P-value is smaller than = 0.05, there is significant evidence that against the null hypothesis.

2.70

101.196

01.020

ns

μxt

Weight Gain Illustration

Page 16: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 16

Weight GainIllustrative example

Page 17: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 17

Matched Pairs t Procedures Matched pair samples allow us to

compare responses to two treatments in paired couples

Apply the one-sample t procedures to the observed differences within pairs

The parameter is the mean difference in the responses to the two treatments within matched pairs in the entire population

Page 18: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 18

Air Pollution

Case Study – Matched Pairs

Pollution index measurements were recorded for two areas of a city on each of 8 days

To analyze, subtract Area B levels from Area A levels. The 8 differences form a single sample.

Are the average pollution levels the same for the two areas of the city?

Day Area A Area B A – B

1 2.92 1.84 1.08

2 1.88 0.95 0.93

3 5.35 4.26 1.09

4 3.81 3.18 0.63

5 4.69 3.44 1.25

6 4.86 3.69 1.17

7 5.81 4.95 0.86

8 5.55 4.47 1.08

These 8 differences have = 1.0113 and s = 0.1960.

x

Page 19: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 19

1. Hypotheses: H0: = 0Ha: ≠ 0

2. Test Statistic: (df = 81 = 7)

3. P-value:P-value = 2P(T > 14.594) = 0.0000017 (using a computer)P-value is smaller than 2(0.0005) = 0.0010 since t = 14.594 is greater than t* = 5.408 (upper tail area = 0.0005) (Table C)

4. Conclusion:Since the P-value is smaller than = 0.001, there is strong evidence (“highly signifcant”) that the mean pollution levels are different for the two areas of the city.

14.594

80.1960

01.01130

ns

μxt

Case Study

Page 20: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 20

1.1752 to 0.8474

0.16391.01138

0.19602.3651.0113

n

stx

Find a 95% confidence interval to estimate the difference in pollution indexes (A – B) between the two areas of the city. (df = 81 = 7 for t*)

We are 95% confident that the pollution index in area A exceeds that of area B by an average of 0.8474 to 1.1752 index points.

Case StudyAir Pollution

Page 21: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 21

Conditions for t procedures

1. Valid data?

2. Unconfounded comparison? (lurking variables are balanced)

3. Simple random sample?

4. Normal distribution?

Weight Gain Illustration

Except in the case of small samples, conditions 1 –3 are of more importance than condition 4 (t procedures are

“robust” to violations in condition 4).

Page 22: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 22

Robustness of Normality Assumption The t confidence interval and test are exact

when the distribution of the population is Normal

However, no real data are exactly Normal A confidence interval or significance test is

robust when the confidence level or P-value does not change very much when the conditions for use of the procedure are violated

t statistics are robust when n is moderate to large

Page 23: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 23

t Procedure Robustness

Sample size less than 15: Use t procedures if the data appear about Normal (symmetric, single peak, no outliers). If the data are skewed or if outliers are present, do not use t.

Sample size at least 15: The t procedures can be used except in the presence of outliers or strong skewness in the data.

Large samples: The t procedures can be used even for clearly skewed distributions when the sample is large, roughly n ≥ 40.

Page 24: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 24

Air Pollution Illustration

Assessing Normality

Recall the air pollution illustration Is it reasonable to assume the

sampling distribution of xbar is Normal?

While we cannot judge Normality from just 8 observations, a stemplot of the data (right) shows no outliers, clusters, or extreme skewness.

Thus, the t test will be accurate.

0 6 8 9

1 1 1 1 2 2

Page 25: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 25

Can we use t?

The stemplot shows a moderately sized data set (n = 20) with a strong negative skew.

We cannot trust t procedures in this instance

Page 26: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 26

Can we use t?

This histogram shows the distribution of word lengths in Shakespeare’s plays. The sample is very large.

The data has a strong positive skew, but there are no outliers. We can use the t procedures since n ≥ 40

Page 27: BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.

BPS - 3rd Ed. Chapter 16 27

Can we use t?

This histogram shows the heights of college students. The distribution is close to Normal, so we can trust the

t procedures for any sample size.