6.2 Large Sample Significance Tests for a Mean


Page 1: 6.2 Large Sample Significance Tests for a Mean

6.2 Large Sample Significance Tests for a Mean

“The reason students have trouble understanding hypothesis testing may be that they are trying to think.” Deming

Page 2: 6.2 Large Sample Significance Tests for a Mean

• In a law case, there are 2 possibilities for the truth—innocent or guilty

• Evidence is gathered to decide whether to convict the defendant. The defendant is considered innocent unless “proven” to be guilty “beyond a reasonable doubt.” Just because a defendant is not found to be guilty doesn’t prove the defendant is innocent. If there is not much evidence one way or the other the defendant is not found to be guilty.

Page 3: 6.2 Large Sample Significance Tests for a Mean

For statistical hypothesis testing, we have 2 possibilities to choose from:

• H0 = null hypothesis (innocent). Held on to unless there is sufficient evidence to the contrary.

• Ha = alternative hypothesis (guilty). We reject H0 in favor of Ha if there is enough evidence favoring Ha.

Page 4: 6.2 Large Sample Significance Tests for a Mean

𝐻0, null hypothesis 𝐻𝑎, alternative hypothesis

In the end we “reject the null hypothesis” or “fail to reject the null hypothesis.”

In order to reject 𝐻0 we need sufficient evidence. The burden of proof is on disproving 𝐻0. “Innocent until proven guilty”

Page 5: 6.2 Large Sample Significance Tests for a Mean

Tests of Hypotheses

• Distribution(s) or population(s)

• Parameter(s), such as mean and variance

• Assertion or conjecture about the population(s): statistical hypotheses

1. About parameter(s): means or variances

2. About the type of population: normal, binomial, or …

Page 6: 6.2 Large Sample Significance Tests for a Mean

Example

• Is a coin balanced? This is the same as asking whether p = 0.5.

• Is the average lifetime of a light bulb equal to 1000 hours? The assertion is μ = 1000.

Page 7: 6.2 Large Sample Significance Tests for a Mean

Null Hypotheses and Alternatives

• We call the above two assertions null hypotheses. Notation: H0: p = 0.5 and H0: μ = 1000

• If we reject these null hypotheses, the conclusions we arrive at are called alternative hypotheses: Ha: p ≠ 0.5 and Ha: μ ≠ 1000

Page 8: 6.2 Large Sample Significance Tests for a Mean

Null Hypothesis vs Alternative

• H0: p = 0.5 vs Ha: p ≠ 0.5

• H0: μ = 1000 vs Ha: μ ≠ 1000

• It is possible to specify other alternatives:

• Ha: p > 0.5 or Ha: p < 0.5

• Ha: μ > 1000 or Ha: μ < 1000

Page 9: 6.2 Large Sample Significance Tests for a Mean

Example: Discharge limit = 2 ppm

Possibly 𝐻0: 𝜇 = 2 ppm (or maybe 𝜇 > 2 ppm), 𝐻𝑎: 𝜇 < 2 ppm ⇐ burden of proof

Possibly 𝐻0: 𝜇 = 2 ppm (or maybe 𝜇 < 2 ppm), 𝐻𝑎: 𝜇 > 2 ppm ⇐ burden of proof

Possibly 𝐻0: 𝜇 = 2 ppm, 𝐻𝑎: 𝜇 ≠ 2 ppm ⇐ burden of proof

The null hypothesis (in this book) always has =. Set 𝐻𝑎 first. Then 𝐻0 is =.

Page 10: 6.2 Large Sample Significance Tests for a Mean

Significance Testing /Hypothesis Testing

• A company claims its light bulbs last on average 1000 hours. We are going to test that claim.

• We might take the null and alternative hypotheses to be

H0: μ = 1000 vs Ha: μ ≠ 1000, or maybe

H0:μ=1000 vs Ha: μ<1000

Page 11: 6.2 Large Sample Significance Tests for a Mean

Mistakes or errors:

• Law case—convict an innocent defendant; or fail to convict a guilty defendant.

• The law system is set up so that the chance of convicting an innocent person is small. Innocent until “proven guilty” beyond a reasonable doubt.

Page 12: 6.2 Large Sample Significance Tests for a Mean

Two Types of Errors in statistical testing

• Type I error -- reject H0 when it is true (convict innocent person)

• Type II error -- accept H0 when it is not true (find guilty person innocent)

Page 13: 6.2 Large Sample Significance Tests for a Mean

Statistical hypotheses are set up to

• Control the type I error rate: α = P(type I error) = P(reject H0 when H0 is true) (a small number)

• Minimize the type II error rate: β = P(type II error) = P(accept H0 when H0 is false)

Page 14: 6.2 Large Sample Significance Tests for a Mean

Control Types of Errors

• In practice, α is set at some small value, usually 0.05.

• If you want to control α at some small value, you need to figure out how large a sample size (n) is required to have a small β also.

• 1 − β is called the power of the test.

• 1 − β = Power = P(reject H0 when H0 is false)
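As a rough illustration of how sample size drives β, here is a minimal Python sketch for a two-sided z test with known σ. The function name `power_two_sided` and the numbers used (μ0 = 1000, a true mean of 1030, σ = 100) are hypothetical choices for illustration, not values taken from the slides.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal reference distribution


def power_two_sided(mu0, mu_true, sigma, n, alpha=0.05):
    """P(reject H0: mu = mu0) for a two-sided z test when the true mean is mu_true."""
    z_crit = Z.inv_cdf(1 - alpha / 2)             # reject when |z| > z_crit
    shift = (mu_true - mu0) / (sigma / sqrt(n))   # mean of the z statistic under mu_true
    # The z statistic is N(shift, 1); add the probabilities of both rejection tails.
    return Z.cdf(-z_crit - shift) + 1 - Z.cdf(z_crit - shift)


# Hypothetical setting: alpha stays fixed at 0.05 while n grows.
for n in (10, 25, 50, 100):
    power = power_two_sided(1000, 1030, 100, n)
    print(f"n = {n:3d}  beta = {1 - power:.3f}  power = {power:.3f}")
```

With α held fixed, β shrinks as n grows, which is why the sample size has to be planned if both error rates are to be small.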

Page 15: 6.2 Large Sample Significance Tests for a Mean

Example

• X = breaking strength of a fish line, normally distributed with σ = 0.10.

• Claim: the mean is μ = 10.

• H0: μ = 10 vs HA: μ ≠ 10

A random sample of size n = 10 is taken, and the sample mean x̄ is calculated.

• Accept H0 if 9.95 ≤ x̄ ≤ 10.05.

• Type I error?

• Type II error when μ = 10.10?

Page 16: 6.2 Large Sample Significance Tests for a Mean

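Solution

• Type I error = P(reject H0 when μ = 10)

$$
\begin{aligned}
& P(\bar{x} < 9.95) + P(\bar{x} > 10.05) \\
&= P\left(Z < \frac{9.95 - 10}{0.10/\sqrt{10}}\right) + P\left(Z > \frac{10.05 - 10}{0.10/\sqrt{10}}\right) \\
&= P(Z < -1.58) + P(Z > 1.58) \\
&= 2(0.0571) \\
&= 0.1142
\end{aligned}
$$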

Page 17: 6.2 Large Sample Significance Tests for a Mean

Solution

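• Type II error = P(accept H0 when H0 is false), here with μ = 10.10

$$
\begin{aligned}
& P(9.95 \le \bar{x} \le 10.05) \\
&= P\left(\frac{9.95 - \mu}{0.10/\sqrt{10}} \le Z \le \frac{10.05 - \mu}{0.10/\sqrt{10}}\right) \\
&= P\left(\frac{9.95 - 10.10}{0.10/\sqrt{10}} \le Z \le \frac{10.05 - 10.10}{0.10/\sqrt{10}}\right) \\
&= P(-4.74 \le Z \le -1.58) \\
&= 0.0571
\end{aligned}
$$

• Power = 1 − 0.0571 = 0.9429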
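The fish-line error probabilities can be checked numerically. The sketch below uses only Python's standard library; the helper name `prob_accept` is made up for this illustration. It works with the unrounded z values, so the results should land close to, but not exactly on, the table-based 0.1142 and 0.0571.

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 0.10, 10
lo, hi = 9.95, 10.05          # accept H0 when lo <= x_bar <= hi
se = sigma / sqrt(n)          # standard deviation of the sample mean


def prob_accept(mu):
    """P(lo <= x_bar <= hi) when the true mean is mu."""
    x_bar = NormalDist(mu, se)
    return x_bar.cdf(hi) - x_bar.cdf(lo)


alpha = 1 - prob_accept(10.0)   # reject H0 although mu = 10
beta = prob_accept(10.10)       # accept H0 although mu = 10.10
power = 1 - beta
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, power = {power:.4f}")
```

Small differences from the slide values come from rounding z to 1.58 before using the normal table.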

Page 18: 6.2 Large Sample Significance Tests for a Mean

Tests concerning Means

• 5 steps to set up a statistical hypothesis test

Page 19: 6.2 Large Sample Significance Tests for a Mean

Steps (p. 350)

• Steps 1 and 2: State the null and alternative hypotheses.

• Step 3: State the test criteria. That is, give the formula for the test statistic (plugging in only the hypothesized value from the null hypothesis but not any sample information) and the reference distribution. Then state in general terms what observed values of the test statistic constitute evidence against the null hypothesis.

Page 20: 6.2 Large Sample Significance Tests for a Mean

• Step 4: Show the sample based calculations.

• Step 5: Report an observed level of significance, p-value, and (to the extent possible) state its implications in the context of the real engineering problem.

Page 21: 6.2 Large Sample Significance Tests for a Mean

Example: Bearings are intended to be 5 cm in diameter.

Steps 1 and 2: State the null and alternative hypothesis.

The null hypothesis in this book always has =.

𝐻0: 𝜇= 5,𝐻𝑎: 𝜇≠ 5 or maybe 𝐻0: 𝜇= 5,𝐻𝑎: 𝜇< 5

Step 3: “State the test criteria.” The test statistic is some statistic for which we can find probabilities such as z value or t value

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

And we can find from the table, for example, 𝑃(𝑍> 2.8) ≈ 0.0026

The reason is that we want to quantify how unlikely the data are if 𝐻0 is true. If it’s unlikely to have seen this kind of data if 𝐻0 is true, then we reject 𝐻0.

Page 22: 6.2 Large Sample Significance Tests for a Mean

“Give the formula for the test statistic.”

Suppose we are taking a sample of n=100 ball bearings.

Test statistic: $Z = \frac{\bar{X} - 5}{\sigma/\sqrt{100}}$

The reference distribution is standard normal.

“Then state in general terms what observed values of the test statistic constitute evidence against the null hypothesis.”

Since 𝐻𝑎: 𝜇≠ 5, 𝑍 values far from 0 in either direction are good evidence against 𝐻0: 𝜇= 5. If 𝐻𝑎: 𝜇> 5, 𝑍 values far from 0 in a positive direction are good evidence against 𝐻0: 𝜇= 5. If 𝐻𝑎: 𝜇< 5, 𝑍 values far from 0 in a negative direction are good evidence against 𝐻0: 𝜇= 5.

Page 23: 6.2 Large Sample Significance Tests for a Mean

Step 4: Show the sample based calculations.

Suppose our sample data turn out to be

$n = 100$, $\bar{x} = 5.006$, $\sigma \approx 0.05$, $\sigma/\sqrt{n} \approx 0.005$

Then $z = \frac{\bar{x} - 5}{0.005} = \frac{0.006}{0.005} = 1.2$

Page 24: 6.2 Large Sample Significance Tests for a Mean

Step 5: Report the p-value or level of significance.

p-value: probability of this much or more evidence against 𝐻0 if 𝐻0 is true.

p-value = 2 × P(Z > 1.2) = 2 × 0.1151 = 0.2302
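Steps 4 and 5 for the bearing example can be reproduced in a few lines. This is a sketch using the sample values from the slides (n = 100, sample mean 5.006, σ ≈ 0.05) and the two-sided alternative; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 5.0                            # hypothesized mean under H0
n, x_bar, sigma = 100, 5.006, 0.05   # sample values from the example

# Step 4: sample-based calculation of the test statistic
z = (x_bar - mu0) / (sigma / sqrt(n))

# Step 5: two-sided p-value, the probability of |Z| at least this large under H0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The computed p-value agrees with the table-based 0.2302 to about three decimal places.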

Page 25: 6.2 Large Sample Significance Tests for a Mean

• Interpret the Results

If the p-value is small:

• Data of this type are unlikely if H0 is true.

• The fact that we are looking at this data set right now indicates that H0 is likely not true.

• The null hypothesis looks bad ⇒ reject H0.

Page 26: 6.2 Large Sample Significance Tests for a Mean

• The p-value is the probability of a result at least as extreme (away from what the null hypothesis would have predicted) if in fact the null hypothesis is true.

• So if the data are extremely unlikely when the null hypothesis is true,
– The p-value is small and
– The null hypothesis looks bad.

Page 27: 6.2 Large Sample Significance Tests for a Mean

In the example, with 𝐻0: 𝜇 = 5 and 𝐻𝑎: 𝜇 ≠ 5, the p-value = 0.23.

• There is a 23% chance of getting data this far from expected if 𝐻0 is true (𝜇 = 5).

• The resulting data are not extremely unlikely if 𝜇 = 5,
o So the evidence against 𝜇 = 5 is not overwhelming.
o The data are not obviously inconsistent with 𝐻0.
o We do not reject 𝐻0: 𝜇 = 5.

Page 28: 6.2 Large Sample Significance Tests for a Mean

Suppose, however, that the sample mean is $\bar{x} = 5.014$.

$z = \frac{\bar{x} - 5}{0.005} = \frac{5.014 - 5}{0.005} = \frac{0.014}{0.005} = 2.8$

The p-value is now 2 × 0.0026 = 0.0052.

• Very good evidence against 𝐻0.

• We would have a very small chance (≈ 5/1000) of observing this sort of data if 𝐻0 is true.

Page 29: 6.2 Large Sample Significance Tests for a Mean

If p-value < 0.05, this is commonly considered “statistically significant” evidence against the null hypothesis at the 𝛼 = 0.05 level.

𝛼 is

• How small the p-value has to be in order to reject 𝐻0.

• The chance that we are willing to take of rejecting 𝐻0 when 𝐻0 is in fact true.

• The chance of convicting an innocent person or innocent 𝐻0.

Page 30: 6.2 Large Sample Significance Tests for a Mean

One-sided Alternatives

If we had 𝐻0: 𝜇 = 5, 𝐻𝑎: 𝜇 < 5 and

$z = \frac{\bar{x} - 5}{0.005} = \frac{5.014 - 5}{0.005} = \frac{0.014}{0.005} = 2.8$

the p-value would be 1 − 0.0026 = 0.9974.

• Only very negative values of z ($\bar{x}$ far enough below 5) constitute evidence against 𝐻0.

• We have no indication that 𝐻0 is false, so the p-value is large.

• Small p-values are bad for 𝐻0
o because this sort of data are unlikely if 𝐻0 is true.

Page 31: 6.2 Large Sample Significance Tests for a Mean

If we had 𝐻0: 𝜇 = 5, 𝐻𝑎: 𝜇 > 5 and

$z = \frac{\bar{x} - 5}{0.005} = \frac{0.014}{0.005} = 2.8$

the p-value would be 0.0026.

• Only very positive values of z constitute evidence against 𝐻0.

• We have good indication that 𝐻0 is false, so the p-value is small.

• Small p-values are bad for 𝐻0
o because then this sort of data are unlikely if 𝐻0 is true.
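The way the same z = 2.8 gives very different p-values under different alternatives can be sketched as a small helper. The function `p_value` and its labels `"less"`, `"greater"`, and `"two-sided"` are hypothetical names chosen for this illustration.

```python
from statistics import NormalDist


def p_value(z, alternative):
    """p-value of a z statistic; alternative is 'less', 'greater', or 'two-sided'."""
    Z = NormalDist()
    if alternative == "less":        # Ha: mu < mu0, small z is evidence against H0
        return Z.cdf(z)
    if alternative == "greater":     # Ha: mu > mu0, large z is evidence against H0
        return 1 - Z.cdf(z)
    if alternative == "two-sided":   # Ha: mu != mu0, |z| far from 0 is evidence
        return 2 * (1 - Z.cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")


z = 2.8
for alt in ("less", "greater", "two-sided"):
    print(f"{alt:>9}: p-value = {p_value(z, alt):.4f}")
```

The two-sided value comes out slightly below the slide's 0.0052 because the slide doubles the rounded table entry 0.0026.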

Page 32: 6.2 Large Sample Significance Tests for a Mean

• P-values and hypothesis testing are widely used.

• However, in my opinion and some others’ opinions (see author’s comments later in the chapter), more often than not, such significance tests are not useful summaries. See Deming quote earlier.

• Generally, confidence intervals are more useful summaries.

Page 33: 6.2 Large Sample Significance Tests for a Mean

$n = 100$, $\bar{x} = 5.006$, $\sigma \approx 0.05$, $\sigma/\sqrt{n} \approx 0.005$

$5.006 \pm 1.96(SE) = 5.006 \pm 1.96(0.005) = 5.006 \pm 0.0098$

4.996 to 5.016.

• A mean diameter of 5.0 is included in these values,
o so it’s one of the potentially likely values for the mean.

• A null hypothesis of 𝜇 = 5 is not rejected
o for a two-sided alternative
o using this 2-sided confidence interval.

• But the mean could be as high as 5.016.

• Whether the mean is close enough to 5.0 for our purposes depends on the engineering considerations.
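The interval arithmetic above can be reproduced as follows. This is a minimal sketch with the slide's sample values; it takes the multiplier from the normal distribution instead of hard-coding 1.96, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, x_bar, sigma = 100, 5.006, 0.05
conf = 0.95

z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 for 95% confidence
se = sigma / sqrt(n)                                # standard error of the mean
lower, upper = x_bar - z_star * se, x_bar + z_star * se

print(f"{lower:.3f} to {upper:.3f}")
print("5.0 inside the interval:", lower <= 5.0 <= upper)
```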

Page 34: 6.2 Large Sample Significance Tests for a Mean

Likely the real question isn’t, “Is the mean equal to 5.0?”

• For one thing, we know the mean isn’t exactly 5.000000.

• In addition, it doesn’t do us much good if the average is 5.0 if the bearing diameters are really variable.

• We are likely more concerned with how often the bearings are meeting specifications.
o Consideration of the purposes for the data.

Page 35: 6.2 Large Sample Significance Tests for a Mean

Often we know the null hypothesis is false. The real question isn’t whether the null hypothesis is true.

Suppose we are comparing our standard glue and a new glue for plywood.

• Our question isn’t really whether 𝜇1 = 𝜇2.

• We don’t just want to know if the two glues are equivalent.

• We want to know how different the strength of plywood is using the two glues,
o not just whether there is a difference.

Is the improvement worth the cost?

A confidence interval for 𝜇𝑛𝑒𝑤 − 𝜇𝑜𝑙𝑑 would be useful in addressing this question.

Page 36: 6.2 Large Sample Significance Tests for a Mean

Two-sided alternative ↔ two-sided confidence interval

• Reject 𝐻0: 𝜇 = 5 in favor of 𝐻𝑎: 𝜇 ≠ 5 if
o Two-sided interval fails to include 𝜇 = 5.

One-sided alternative ↔ one-sided confidence interval

• Reject 𝐻0: 𝜇 = 5 in favor of 𝐻𝑎: 𝜇 > 5 if
o Lower bound interval fails to include 𝜇 = 5.
o The interval is completely above 5.
o We have evidence that 𝜇 > 5.

• Reject 𝐻0: 𝜇 = 5 in favor of 𝐻𝑎: 𝜇 < 5 if
o Upper bound interval fails to include 𝜇 = 5.
o The interval is completely below 5.
o We have evidence that 𝜇 < 5.
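The correspondence between tests and intervals can be checked on the two bearing samples used earlier (sample means 5.006 and 5.014, with σ ≈ 0.05 and n = 100). The sketch below assumes α = 0.05 paired with 95% intervals; the function name `compare` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()


def compare(x_bar, mu0=5.0, sigma=0.05, n=100, alpha=0.05):
    """Return ((test rejects, interval excludes mu0), ...) for two-sided and Ha: mu > mu0."""
    se = sigma / sqrt(n)
    z = (x_bar - mu0) / se

    # Two-sided test vs two-sided (1 - alpha) confidence interval
    p_two = 2 * (1 - Z.cdf(abs(z)))
    half_width = Z.inv_cdf(1 - alpha / 2) * se
    excludes_two = not (x_bar - half_width <= mu0 <= x_bar + half_width)

    # Ha: mu > mu0 vs one-sided lower confidence bound
    p_greater = 1 - Z.cdf(z)
    lower_bound = x_bar - Z.inv_cdf(1 - alpha) * se
    excludes_lower = lower_bound > mu0

    return (p_two < alpha, excludes_two), (p_greater < alpha, excludes_lower)


for x_bar in (5.006, 5.014):
    print(x_bar, compare(x_bar))
```

In each pair the two entries agree: the test rejects exactly when the matching interval fails to include 5.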

Page 37: 6.2 Large Sample Significance Tests for a Mean

What if the confidence interval is wider than we want?

• We want to know more precisely how different the glues are.

• We need a smaller standard error. Either
o Increase sample size, n.
o Decrease variance, σ².

• Strategies such as pairing or blocking are one way of reducing variability in comparisons.

• Controlling variables better

• Better, more precise measurements

• Strategies from Chapter 2.
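To see how n and σ trade off against interval width, here is a minimal sketch for a single mean with known σ. It reuses σ = 0.05 from the bearing example; the target margin of 0.005 and the function names `margin` and `n_for_margin` are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def margin(sigma, n, conf=0.95):
    """Half-width of a two-sided z confidence interval for a mean."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z_star * sigma / sqrt(n)


def n_for_margin(sigma, target, conf=0.95):
    """Smallest n whose half-width is at most target."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z_star * sigma / target) ** 2)


print(f"margin at n = 100:       {margin(0.05, 100):.4f}")
print(f"margin with sigma halved: {margin(0.025, 100):.4f}")
print(f"n needed for 0.005:       {n_for_margin(0.05, 0.005)}")
```

Halving σ halves the margin, while halving the margin through sample size alone takes about four times as many observations.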

Page 38: 6.2 Large Sample Significance Tests for a Mean

Exercise

Given that n = 25, σ = 100, and the sample mean is 1050,

1. Test the hypotheses H0: μ = 1000 vs HA: μ < 1000 at level α = 0.05.

2. Test the hypotheses H0: μ = 1000 vs HA: μ ≠ 1000 at level α = 0.05.

Page 39: 6.2 Large Sample Significance Tests for a Mean
Page 40: 6.2 Large Sample Significance Tests for a Mean

Solution

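1. More evidence against H0 is smaller values of z.

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{1050 - 1000}{100/\sqrt{25}} = \frac{50}{20} = 2.5$$

$$p\text{-value} = P(Z < 2.5) = 0.9938$$

p-value > α. Do not reject H0.

2. Evidence against H0 is z values away from 0 in either direction.

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{1050 - 1000}{100/\sqrt{25}} = \frac{50}{20} = 2.5$$

$$p\text{-value} = P(Z < -2.5) + P(Z > 2.5) = 2(0.0062) = 0.0124$$

p-value < α. Reject H0.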