Tests of Significance
And Using the Significance Tests
Example: A Great Free-Throw Shooter
The one with Zach, Cecilee and a basketball.
Zach claims that he makes 80% of his free throws. To test his claim, Cecilee asks him to shoot 20 free throws. Zach makes only 8 out of the 20. Cecilee says, "Someone who makes 80% of his free throws would almost never make only 8 out of 20. So I don't believe your claim."
Cecilee thought about what would happen if Zach's claim were true and he repeated the sample of 20 free throws many times: Zach would almost never make as few as 8. This outcome is so unlikely that it gives strong evidence that Zach's claim is not true.
Cecilee even finds the probability that Zach would make 8 or fewer out of 20 free throws if he really makes 80% in the long run. This probability is 0.0001. The small probability convinces her that Zach's claim is false.
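Cecilee's calculation can be sketched with the binomial distribution, assuming each free throw is an independent trial with success probability 0.8:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): probability of at most k successes."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Probability an 80% shooter makes at most 8 of 20 free throws
p = binom_cdf(8, 20, 0.8)
print(round(p, 4))  # 0.0001, the probability Cecilee found
```

The tiny probability is the evidence against Zach's claim: if he really were an 80% shooter, a result this poor would almost never happen.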
Basic Idea of Significance Test
• An outcome that would rarely happen if a claim were true is good evidence that the claim is not true.
Example 10.9 Sweetening cola
• Diet cola uses artificial sweeteners to avoid sugar. These sweeteners gradually lose their sweetness over time. Trained tasters score the cola on a "sweetness scale" of 1 to 10. The cola is then stored for a month at high temperature to imitate the effect of four months' storage at room temperature. This is a matched pairs experiment. Our data are the differences (before minus after storage) in the tasters' scores.
Here are the sweetness losses for a new cola as measured by 10 trained tasters:

2.0 0.4 0.7 2.0 -0.4 2.2 -1.3 1.2 1.1 2.3

The average sweetness loss for our cola is given by the sample mean, x̄ = 1.02.
Does the sample result x̄ = 1.02 reflect a real loss of sweetness?
Or
Could we easily get the outcome x̄ = 1.02 just by chance?
The significance test starts with a careful statement of these alternatives.
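The sample mean above is just the average of the ten differences:

```python
# Sweetness losses (before minus after storage) from the 10 trained tasters
losses = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]

x_bar = sum(losses) / len(losses)
print(round(x_bar, 2))  # 1.02
```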
Significance Test
1. Identify the parameter of the population we want to draw conclusions about
2. State the null and alternative hypotheses
3. Calculate a statistic such as x̄ and decide whether it is far from the parameter
4. Find the P-value
5. Interpret the result
Step 1: Parameter
First, we identify the parameter we want to draw conclusions about: in this case, the population mean μ.
The mean μ is the average loss in sweetness that a very large number of tasters would detect in the cola. Our 10 tasters are a sample from this population.
Step 2: State the Null
• The null hypothesis says that there is no effect or no change in the population. If the null hypothesis is true, the sample result is just chance at work.
• The null hypothesis says that the cola does not lose sweetness (no change).
• H₀: μ = 0
Step 2: Alternative Hypothesis
• The alternative hypothesis states the claim we are seeking evidence for: the alternative to the "no change" described by the null.
• The alternative hypothesis says that the cola does lose sweetness.
• Hₐ: μ > 0
• Suppose the null hypothesis is true (μ = 0). Is the sample outcome x̄ = 1.02 surprisingly large under that supposition? If it is, that's evidence against H₀ and in favor of Hₐ.
• Suppose further that we know that individual tasters' scores vary according to a normal distribution and that the standard deviation is σ = 1.
Step 3: Calculate Statistics
• The sampling distribution of x̄ from 10 tasters is then normal with mean μ = 0 and standard deviation σ/√n = 1/√10 = 0.316.
• The taste test for our cola produced x̄ = 1.02. That is way out on the normal curve, so far out that an observed value this large would almost never occur just by chance if the true μ were 0.
Figure 10.10
Step 4: P-value
• We measure the strength of the evidence against H₀ by the probability under the normal curve to the right of the observed x̄.
• This probability is called the P-value. It is the probability of a result at least as far out as the result we actually got. The lower this probability, the more surprising our result, and the stronger the evidence against the null hypothesis.
Let's say there is a new cola. Our 10 tasters gave x̄ = 0.3. The probability to the right of 0.3 (the P-value) is 0.17. That is, 17% of all samples would give a mean score as large or larger than 0.3 just by chance when the true population mean is 0. An outcome this likely to occur just by chance is not good evidence against the null hypothesis.
Our cola showed a larger sweetness loss, x̄ = 1.02. The probability of a result this large or larger is only 0.0006. This probability is the P-value. Ten tasters would have an average score as large as 1.02 only 6 times in 10,000 tries if the true mean sweetness change were 0. An outcome this unlikely convinces us that the true mean is really greater than 0.
• Small P-values are evidence against H₀ because they say that the observed result is unlikely to occur just by chance. A large P-value fails to give evidence against H₀.
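Both P-values above can be checked with the standard normal distribution, since x̄ is normal with mean 0 and standard deviation 1/√10 under H₀ (a sketch using only the standard library):

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

se = 1 / sqrt(10)  # std dev of x-bar for 10 tasters, about 0.316

def p_value(x_bar):
    """One-sided P-value: P(mean at least this large when the true mean is 0)."""
    return 1 - normal_cdf((x_bar - 0) / se)

print(round(p_value(0.30), 2))   # 0.17: weak evidence against H0
print(round(p_value(1.02), 4))   # 0.0006: strong evidence against H0
```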
How small must a P-value be in order to persuade us?
• There is no fixed rule. But the level 0.05 (a result that would occur no more than once in 20 tries just by chance) is a common rule of thumb. A result with a small P-value, say less than 0.05, is called statistically significant.
• That's just a way of saying that chance alone would rarely produce so extreme a result.
Hypothesis
• Hypotheses always refer to some population, not to a particular outcome.
• The alternative hypothesis can be one-sided or two-sided.
– We used a one-sided Hₐ because colas can only lose sweetness in storage. If you do not have a specific direction firmly in mind in advance, use a two-sided alternative.
P-value
• The probability, computed assuming that H₀ is true, that the observed outcome would take a value as extreme or more extreme than that actually observed is called the P-value of the test. The smaller the P-value is, the stronger is the evidence against H₀ provided by the data.
Statistical Significance
• One final step to assess the evidence against H₀ is comparing the P-value with a fixed value that we regard as decisive. The decisive value of P is called the significance level. We write it as α. If we choose α = 0.05, we are requiring that the data give evidence against H₀ so strong that it would happen no more than 5% of the time when H₀ is true.
• If the P-value is as small or smaller than α, we say that the data are statistically significant at level α.
Type I and Type II Errors
• If we reject H₀ (accept Hₐ) when in fact H₀ is true, this is a Type I error.
• If we accept H₀ (reject Hₐ) when in fact Hₐ is true, this is a Type II error.
EX 10.21 Too Salty?
• The mean salt content of a certain type of potato chip is supposed to be 2.0 mg. The salt content of these chips varies normally with standard deviation 0.1 mg. From each batch produced, an inspector takes a sample of 50 chips and measures the salt content of each chip. The inspector rejects the entire batch if the sample mean salt content is significantly different from 2 mg at the 5% significance level.
• This is a test of the hypotheses H₀: μ = 2 versus Hₐ: μ ≠ 2.
To carry out the test, the company statistician computes the z statistic
z = (x̄ - 2) / (0.1/√50)
and rejects H₀ if z < -1.96 or z > 1.96. A Type I error is to reject H₀ when in fact μ = 2.
Suppose the potato chip company decides that any batch with a mean salt content as far away from 2 as 2.05 should be rejected. So a particular Type II error is to accept H₀ when in fact μ = 2.05.
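The inspector's decision rule can be sketched as a small function (the name `inspect_batch` is just for illustration):

```python
from math import sqrt

MU0, SIGMA, N = 2.0, 0.1, 50
SE = SIGMA / sqrt(N)  # standard deviation of the sample mean

def inspect_batch(x_bar, z_star=1.96):
    """Two-sided z test at the 5% level: reject the batch if |z| > 1.96."""
    z = (x_bar - MU0) / SE
    return "reject" if abs(z) > z_star else "accept"

print(inspect_batch(2.01))  # z is about 0.71, so: accept
print(inspect_batch(2.05))  # z is about 3.54, so: reject
```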
Significance and Type I Error
The significance level α of any fixed level test is the probability of a Type I error. That is, α is the probability that the test will reject the null hypothesis H₀ when H₀ is in fact true.
Type II Error
The probability of a Type II error for the particular alternative μ = 2.05 in EX 10.21 is the probability that the test will accept H₀ when μ has this alternative value. This is the probability that the test statistic z falls between -1.96 and 1.96, calculated assuming that μ = 2.05.
Probability of Type II Error
1. Write the rule for accepting H₀ in terms of x̄.
-1.96 < (x̄ - 2)/(0.1/√50) < 1.96
2 - 1.96(0.1/√50) < x̄ < 2 + 1.96(0.1/√50)
1.9723 < x̄ < 2.0277
2. Find the probability of accepting H₀ assuming that the alternative is true.
P(Type II Error) = P(1.9723 < x̄ < 2.0277)
= P( (1.9723 - 2.05)/(0.1/√50) < (x̄ - 2.05)/(0.1/√50) < (2.0277 - 2.05)/(0.1/√50) )
= P(-5.49 < z < -1.58) = 0.0571
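The same two steps can be carried out numerically; small differences from 0.0571 come from the text rounding z to -1.58:

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

se = 0.1 / sqrt(50)                        # std dev of x-bar
lo, hi = 2 - 1.96 * se, 2 + 1.96 * se      # acceptance region for x-bar

mu_alt = 2.05
# Step 2: probability that x-bar lands in the acceptance region when mu = 2.05
beta = normal_cdf((hi - mu_alt) / se) - normal_cdf((lo - mu_alt) / se)
power = 1 - beta

print(round(beta, 4))   # about 0.057, matching the calculation above
print(round(power, 4))  # about 0.943: the power against mu = 2.05
```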
POWER
The probability that a fixed level α significance test will reject H₀ when a particular alternative value of the parameter is true is called the power of the test against that alternative.
The power of a test against any alternative is 1 minus the probability of a Type II error for that alternative.
• Calculation of P-values and calculation of power both say what would happen if we repeated the test many times. A P-value describes what would happen supposing that the null hypothesis is true. Power describes what would happen supposing that a particular alternative is true.
Increasing Power
1. Increase α. A 5% test of significance will have a greater chance of rejecting H₀ than a 1% test because the strength of evidence required for rejection is less.
2. Consider a particular alternative that is farther away from μ₀. Values of μ that are in Hₐ but lie close to the hypothesized value μ₀ are harder to detect (lower power) than values of μ that are far from μ₀.
3. Increase the sample size. More data will provide more information about x̄, so we have a better chance of distinguishing values of μ.
4. Decrease σ. This has the same effect as increasing the sample size: more information about μ. Improving the measurement process and restricting attention to a subpopulation are two common ways to decrease σ.
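Item 3 can be checked numerically with the potato chip example: power against μ = 2.05 rises as the sample size grows (the helper name `power` is just for illustration):

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def power(n, mu_alt=2.05, mu0=2.0, sigma=0.1, z_star=1.96):
    """Power of the two-sided 5% z test against the alternative mu_alt."""
    se = sigma / sqrt(n)
    lo, hi = mu0 - z_star * se, mu0 + z_star * se   # acceptance region
    beta = normal_cdf((hi - mu_alt) / se) - normal_cdf((lo - mu_alt) / se)
    return 1 - beta

print(round(power(50), 3))   # power with n = 50 chips
print(round(power(100), 3))  # larger sample gives higher power
```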
• Power calculations are important in planning studies. Using a significance test with low power makes it unlikely that you will find a significant effect even if the truth is far from the null hypothesis. A null hypothesis that is in fact false can become widely believed if repeated attempts to find evidence against it fail because of low power.