Tests of Significance
And Using the Significance Tests
Example: A Great Free-Throw Shooter
The one with Zach, Cecilee and a basketball.
Zach claims that he makes 80% of his free throws. To test his claim, Cecilee asks him to shoot 20 free throws. Zach makes only 8 out of the 20. Cecilee says, "Someone who makes 80% of his free throws would almost never make only 8 out of 20. So I don't believe your claim."
Cecilee thought about what would happen if Zach's claim were true and he repeated the sample of 20 free throws many times: Zach would almost never make as few as 8. This outcome is so unlikely that it gives strong evidence that Zach's claim is not true.
Cecilee even finds the probability that Zach would make 8 or fewer out of 20 free throws if he really makes 80% in the long run. This probability is 0.0001. The small probability convinces her that Zach's claim is false.
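Cecilee's calculation can be sketched with the binomial distribution, assuming each free throw is an independent trial with success probability 0.8:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): probability of at most k successes."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Probability an 80% shooter makes at most 8 of 20 free throws
p = binom_cdf(8, 20, 0.8)
print(round(p, 4))  # 0.0001, the probability Cecilee found
```

The tiny probability is the evidence against Zach's claim: if he really were an 80% shooter, a result this poor would almost never happen.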
Basic Idea of Significance Test
• An outcome that would rarely happen if a claim were true is good evidence that the claim is not true.
Example 10.9 Sweetening cola
• Diet cola uses artificial sweeteners to avoid sugar. These sweeteners gradually lose their sweetness over time. Trained tasters score the cola on a "sweetness scale" of 1 to 10. The cola is then stored for a month at high temperature to imitate the effect of four months' storage at room temperature. This is a matched pairs experiment. Our data are the differences (before minus after storage) in the tasters' scores.
Here are the sweetness losses for a new cola as measured by 10 trained tasters:

2.0 0.4 0.7 2.0 -0.4 2.2 -1.3 1.2 1.1 2.3

The average sweetness loss for our cola is given by the sample mean, x̄ = 1.02.
Does the sample result x̄ = 1.02 reflect a real loss of sweetness?
Or
Could we easily get the outcome x̄ = 1.02 just by chance?
The significance test starts with a careful statement of these alternatives.
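The sample mean above is just the average of the ten differences:

```python
# Sweetness losses (before minus after storage) from the 10 trained tasters
losses = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]

x_bar = sum(losses) / len(losses)
print(round(x_bar, 2))  # 1.02
```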
Significance Test
1. Identify the parameter of the population we want to draw conclusions about
2. State the null and alternative hypotheses
3. Calculate a statistic such as x̄ and decide whether it is far from the parameter
4. Find the P-value
5. Interpret the result
Step 1: Parameter
First, we identify the parameter we want to draw conclusions about: in this case, the population mean μ.
The mean μ is the average loss in sweetness that a very large number of tasters would detect in the cola. Our 10 tasters are a sample from this population.
Step 2: State the Null
• The null hypothesis says that there is no effect or no change in the population. If the null hypothesis is true, the sample result is just chance at work.
• The null hypothesis says that the cola does not lose sweetness (no change).
• H₀: μ = 0
Step 2: Alternative Hypothesis
• The alternative hypothesis states the claim we are seeking evidence for: the alternative to the "no change" described by the null.
• The alternative hypothesis says that the cola does lose sweetness.
• Hₐ: μ > 0
• Suppose the null hypothesis is true (μ = 0). Is the sample outcome x̄ = 1.02 surprisingly large under that supposition? If it is, that's evidence against H₀ and in favor of Hₐ.
• Suppose further that we know that individual tasters' scores vary according to a normal distribution and that the standard deviation is σ = 1.
Step 3: Calculate Statistics
• The sampling distribution of x̄ from 10 tasters is then normal with mean μ = 0 and standard deviation σ/√n = 1/√10 = 0.316.
• The taste test for our cola produced x̄ = 1.02. That is way out on the normal curve, so far out that an observed value this large would almost never occur just by chance if the true μ were 0.
Figure 10.10
Step 4: P-value
• We measure the strength of the evidence against H₀ by the probability under the normal curve to the right of the observed x̄.
• This probability is called the P-value. It is the probability of a result at least as far out as the result we actually got. The lower this probability, the more surprising our result, and the stronger the evidence against the null hypothesis.
Let's say there is a new cola. Our 10 tasters gave x̄ = 0.3. The probability to the right of 0.3 (the P-value) is 0.17. That is, 17% of all samples would give a mean score as large or larger than 0.3 just by chance when the true population mean is 0. An outcome this likely to occur just by chance is not good evidence against the null hypothesis.
Our cola showed a larger sweetness loss, x̄ = 1.02. The probability of a result this large or larger is only 0.0006. This probability is the P-value. Ten tasters would have an average score as large as 1.02 only 6 times in 10,000 tries if the true mean sweetness change were 0. An outcome this unlikely convinces us that the true mean is really greater than 0.
• Small P-values are evidence against H₀ because they say that the observed result is unlikely to occur just by chance. A large P-value fails to give evidence against H₀.
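Both P-values above can be checked with the standard normal distribution, since x̄ is normal with mean 0 and standard deviation 1/√10 under H₀ (a sketch using only the standard library):

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

se = 1 / sqrt(10)  # std dev of x-bar for 10 tasters, about 0.316

def p_value(x_bar):
    """One-sided P-value: P(mean at least this large when the true mean is 0)."""
    return 1 - normal_cdf((x_bar - 0) / se)

print(round(p_value(0.30), 2))   # 0.17: weak evidence against H0
print(round(p_value(1.02), 4))   # 0.0006: strong evidence against H0
```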
How small must a P-value be in order to persuade us?
• There is no fixed rule. But the level 0.05 (a result that would occur no more than once in 20 tries just by chance) is a common rule of thumb. A result with a small P-value, say less than 0.05, is called statistically significant.
• That's just a way of saying that chance alone would rarely produce so extreme a result.
Hypothesis
• Hypotheses always refer to some population, not to a particular outcome.
• The alternative hypothesis can be one-sided or two-sided.
– We used a one-sided Hₐ because colas can only lose sweetness in storage. If you do not have a specific direction firmly in mind in advance, use a two-sided alternative.
P-value
• The probability, computed assuming that H₀ is true, that the observed outcome would take a value as extreme or more extreme than that actually observed is called the P-value of the test. The smaller the P-value is, the stronger is the evidence against H₀ provided by the data.
Statistical Significance
• One final step to assess the evidence against H₀ is comparing the P-value with a fixed value that we regard as decisive. The decisive value of P is called the significance level. We write it as α. If we choose α = 0.05, we are requiring that the data give evidence against H₀ so strong that it would happen no more than 5% of the time when H₀ is true.
• If the P-value is as small or smaller than α, we say that the data are statistically significant at level α.
Type I and Type II Errors
• If we reject H₀ (accept Hₐ) when in fact H₀ is true, this is a Type I error.
• If we accept H₀ (reject Hₐ) when in fact Hₐ is true, this is a Type II error.
EX 10.21 Too Salty?
• The mean salt content of a certain type of potato chip is supposed to be 2.0 mg. The salt content of these chips varies normally with standard deviation 0.1 mg. From each batch produced, an inspector takes a sample of 50 chips and measures the salt content of each chip. The inspector rejects the entire batch if the sample mean salt content is significantly different from 2 mg at the 5% significance level.
• This is a test of the hypotheses H₀: μ = 2 versus Hₐ: μ ≠ 2.
To carry out the test, the company statistician computes the z statistic
z = (x̄ - 2) / (0.1/√50)
and rejects H₀ if z < -1.96 or z > 1.96. A Type I error is to reject H₀ when in fact μ = 2.
Suppose the potato chip company decides that any batch with a mean salt content as far away from 2 as 2.05 should be rejected. So a particular Type II error is to accept H₀ when in fact μ = 2.05.
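The inspector's decision rule can be sketched as a small function (the name `inspect_batch` is just for illustration):

```python
from math import sqrt

MU0, SIGMA, N = 2.0, 0.1, 50
SE = SIGMA / sqrt(N)  # standard deviation of the sample mean

def inspect_batch(x_bar, z_star=1.96):
    """Two-sided z test at the 5% level: reject the batch if |z| > 1.96."""
    z = (x_bar - MU0) / SE
    return "reject" if abs(z) > z_star else "accept"

print(inspect_batch(2.01))  # z is about 0.71, so: accept
print(inspect_batch(2.05))  # z is about 3.54, so: reject
```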
Significance and Type I Error
The significance level α of any fixed level test is the probability of a Type I error. That is, α is the probability that the test will reject the null hypothesis H₀ when H₀ is in fact true.
Type II Error
The probability of a Type II error for the particular alternative μ = 2.05 in EX 10.21 is the probability that the test will accept H₀ when μ has this alternative value. This is the probability that the test statistic z falls between -1.96 and 1.96, calculated assuming that μ = 2.05.
Probability of Type II Error
1. Write the rule for accepting H₀ in terms of x̄.
-1.96 < (x̄ - 2)/(0.1/√50) < 1.96
2 - 1.96(0.1/√50) < x̄ < 2 + 1.96(0.1/√50)
1.9723 < x̄ < 2.0277
2. Find the probability of accepting H₀ assuming that the alternative is true.
P(Type II Error) = P(1.9723 < x̄ < 2.0277)
= P( (1.9723 - 2.05)/(0.1/√50) < (x̄ - 2.05)/(0.1/√50) < (2.0277 - 2.05)/(0.1/√50) )
= P(-5.49 < z < -1.58) = 0.0571
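The same two steps can be carried out numerically; small differences from 0.0571 come from the text rounding z to -1.58:

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

se = 0.1 / sqrt(50)                        # std dev of x-bar
lo, hi = 2 - 1.96 * se, 2 + 1.96 * se      # acceptance region for x-bar

mu_alt = 2.05
# Step 2: probability that x-bar lands in the acceptance region when mu = 2.05
beta = normal_cdf((hi - mu_alt) / se) - normal_cdf((lo - mu_alt) / se)
power = 1 - beta

print(round(beta, 4))   # about 0.057, matching the calculation above
print(round(power, 4))  # about 0.943: the power against mu = 2.05
```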
POWER
The probability that a fixed level α significance test will reject H₀ when a particular alternative value of the parameter is true is called the power of the test against that alternative.
The power of a test against any alternative is 1 minus the probability of a Type II error for that alternative.
• Calculation of P-values and calculation of power both say what would happen if we repeated the test many times. A P-value describes what would happen supposing that the null hypothesis is true. Power describes what would happen supposing that a particular alternative is true.
Increasing Power
1. Increase α. A 5% test of significance will have a greater chance of rejecting H₀ than a 1% test because the strength of evidence required for rejection is less.
2. Consider a particular alternative that is farther away from μ₀. Values of μ that are in Hₐ but lie close to the hypothesized value μ₀ are harder to detect (lower power) than values of μ that are far from μ₀.
3. Increase the sample size. More data will provide more information about x̄, so we have a better chance of distinguishing values of μ.
4. Decrease σ. This has the same effect as increasing the sample size: more information about μ. Improving the measurement process and restricting attention to a subpopulation are two common ways to decrease σ.
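Item 3 can be checked numerically with the potato chip example: power against μ = 2.05 rises as the sample size grows (the helper name `power` is just for illustration):

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def power(n, mu_alt=2.05, mu0=2.0, sigma=0.1, z_star=1.96):
    """Power of the two-sided 5% z test against the alternative mu_alt."""
    se = sigma / sqrt(n)
    lo, hi = mu0 - z_star * se, mu0 + z_star * se   # acceptance region
    beta = normal_cdf((hi - mu_alt) / se) - normal_cdf((lo - mu_alt) / se)
    return 1 - beta

print(round(power(50), 3))   # power with n = 50 chips
print(round(power(100), 3))  # larger sample gives higher power
```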
• Power calculations are important in planning studies. Using a significance test with low power makes it unlikely that you will find a significant effect even if the truth is far from the null hypothesis. A null hypothesis that is in fact false can become widely believed if repeated attempts to find evidence against it fail because of low power.