8 Tests of Hypotheses Based on a Single Sample
8.1-8.2 Hypothesis Tests About a Population Mean
Overview of Inference
Methods for drawing conclusions about a population from sample data are called statistical inference
Methods:
Confidence intervals: estimate the value of a population parameter
Tests of hypotheses: assess the evidence for a claim about a population
Inference is appropriate when data are produced by either a random sample or a randomized experiment
Stating hypotheses
A test of hypotheses uses sample data to decide on the validity of a
specific hypothesis.
In statistics, a hypothesis is an assumption or a theory about the
characteristics of one or more variables in one or more populations.
What you want to know: Does the calibrating machine that sorts cherry
tomatoes into packs need revision?
The same question reframed statistically: Is the population mean µ for the
distribution of weights of cherry tomato packages equal to 227 g (i.e., half
a pound)?
The null hypothesis is a very specific statement about a parameter of
the population(s). It is labeled H0.
The alternative hypothesis is a more general statement about a
parameter of the population(s) that is mutually exclusive with the null
hypothesis. It is labeled Ha.
Weight of cherry tomato packs:
H0 : µ = 227 g (µ is the average weight of the population of packs)
Ha : µ ≠ 227 g (µ is either larger or smaller)
One-sided and two-sided tests
A two-tail or two-sided test of the population mean has these null
and alternative hypotheses:
H0 : µ = [a specific number] Ha : µ ≠ [a specific number]
A one-tail or one-sided test of a population mean has these null and
alternative hypotheses:
H0 : µ = [a specific number] Ha : µ < [a specific number] OR
H0 : µ = [a specific number] Ha : µ > [a specific number]
The FDA tests whether a generic drug has an absorption extent similar to the
known absorption extent of the brand-name drug it is copying. Higher or lower
absorption would both be problematic, thus we test:
H0 : µgeneric = µbrand Ha : µgeneric ≠ µbrand (two-sided)
Test Statistic
A test of significance is based on a statistic that estimates the parameter that appears in the hypotheses. When H0 is true, we expect the estimate to take a value near the parameter value specified in H0.
Values of the estimate far from the parameter value specified by H0 give evidence against H0.
A test statistic calculated from the sample data measures how far the data diverge from what we would expect if the null hypothesis H0 were true.
Large values of the statistic show that the data are not consistent with H0.
$z = \dfrac{\text{estimate} - \text{hypothesized value}}{\text{standard deviation of the estimate}}$
P-Value
The null hypothesis H0 states the claim that we are seeking evidence against. The probability that measures the strength of the evidence against a null hypothesis is called a P-value.
The probability, computed assuming H0 is true, that the statistic would take a value as extreme as or more extreme than the one actually observed is called the P-value of the test. The smaller the P-value, the stronger the evidence against H0 provided by the data.
Small P-values are evidence against H0 because they say that the observed result is unlikely to occur when H0 is true.
Large P-values fail to give convincing evidence against H0 because they say that the observed result is likely to occur by chance when H0 is true.
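As a minimal sketch using only the Python standard library, the P-value for a standard normal test statistic is simply a tail area, and which tail(s) to count depends on Ha. The function name p_value_from_z and its "less"/"greater"/"two-sided" labels are our own illustrative choices, not taken from any particular package.

```python
from statistics import NormalDist

def p_value_from_z(z: float, alternative: str = "two-sided") -> float:
    """P-value for an observed z statistic, assuming Z ~ N(0, 1) under H0.

    alternative: "less" (Ha: parameter < value), "greater" (Ha: parameter > value),
    or "two-sided" (Ha: parameter != value).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "less":
        return std_normal.cdf(z)              # area to the left of z
    if alternative == "greater":
        return 1.0 - std_normal.cdf(z)        # area to the right of z
    if alternative == "two-sided":
        return 2.0 * std_normal.cdf(-abs(z))  # area in both tails beyond |z|
    raise ValueError(f"unknown alternative: {alternative!r}")

# A statistic far from 0 gives a small P-value; one near 0 gives a large P-value.
for z in (0.5, 1.96, 3.0):
    print(z, round(p_value_from_z(z), 4))
```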
Statistical Significance
The final step in performing a significance test is to draw a conclusion about the competing claims you were testing. We will make one of two decisions based on the strength of the evidence against the null hypothesis (and in favor of the alternative hypothesis): reject H0 or fail to reject H0.
If our sample result is too unlikely to have happened by chance assuming H0 is true, then we’ll reject H0.
Otherwise, we will fail to reject H0.
Note: A fail-to-reject H0 decision in a significance test doesn’t mean that H0 is true. For that reason, you should never “accept H0” or use language implying that you believe H0 is true.
In a nutshell, our conclusion in a significance test comes down to:
P-value small → reject H0 → conclude Ha (in context)
P-value large → fail to reject H0 → cannot conclude Ha (in context)
There is no rule for how small a P-value we should require in order to reject H0; it’s a matter of judgment and depends on the specific circumstances. But we can compare the P-value with a fixed value that we regard as decisive, called the significance level. We write it as α, the Greek letter alpha. When our P-value is less than the chosen α, we say that the result is statistically significant.
If the P-value is smaller than α, we say that the data are statistically significant at level α. The quantity α is called the significance level or the level of significance.
When we use a fixed level of significance to draw a conclusion in a significance test,
P-value < α → reject H0 → conclude Ha (in context)
P-value ≥ α → fail to reject H0 → cannot conclude Ha (in context)
Tests for a Population Mean
Four Steps of Tests of Significance
1. State the null and alternative hypotheses.
2. Calculate the value of the test statistic.
3. Find the P-value for the observed data.
4. State a conclusion.
We will learn the details of many tests of significance in the following chapters. The proper test statistic is determined by the hypotheses and the data collection design.
Does the packaging machine need revision?
H0 : µ = 227 g versus Ha : µ ≠ 227 g
What is the probability of drawing a random sample such
as yours if H0 is true?
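Sample: n = 4, x̄ = 222 g, σ = 5 g.
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{222 - 227}{5/\sqrt{4}} = -2$
From Table A, the area under the standard
normal curve to the left of z is 0.0228.
Thus, P-value = 2 × 0.0228 = 4.56%.
[Figure: sampling distribution of x̄ under H0, centered at µ = 227 g with σ/√n = 2.5 g (axis ticks at 217, 222, 227, 232, 237 g); the area beyond z = −2 and beyond z = +2 is 2.28% in each tail.]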
The probability of getting a random
sample average so different from
µ is so low that we reject H0.
The machine does need recalibration.
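The packaging-machine arithmetic can be reproduced with a short sketch. one_sample_z_test is a hypothetical helper name, and it assumes σ = 5 g is known, as the z test requires. Using the exact normal CDF instead of the rounded table entry gives a P-value of about 0.0455 rather than 4.56%.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(xbar: float, mu0: float, sigma: float, n: int):
    """Two-sided one-sample z test of H0: mu = mu0 when sigma is known."""
    se = sigma / sqrt(n)                  # standard deviation of x-bar
    z = (xbar - mu0) / se                 # standardized distance from the H0 value
    p = 2 * NormalDist().cdf(-abs(z))     # both tails, because Ha is two-sided
    return se, z, p

se, z, p = one_sample_z_test(xbar=222, mu0=227, sigma=5, n=4)
print(f"SE = {se} g, z = {z}, P-value = {p:.4f}")
```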
The significance level:
The significance level, α, is the largest P-value for which we reject H0,
and therefore the largest probability we tolerate of rejecting a true null
hypothesis (how much evidence against H0 we require). This value is
chosen before conducting the test.
If the P-value is equal to or less than α (P ≤ α), then we reject H0.
If the P-value is greater than α (P > α), then we fail to reject H0.
Does the packaging machine need revision?
Two-sided test. The P-value is 4.56%.
* If α had been set to 5%, then the P-value would be significant.
* If α had been set to 1%, then the P-value would not be significant.
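A sketch of the fixed-level rule applied to this P-value; decide is a hypothetical helper, and it uses the P ≤ α convention stated above.

```python
def decide(p_value: float, alpha: float) -> str:
    """Fixed-level decision rule: reject H0 when P <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

p_value = 0.0456  # two-sided P-value from the packaging-machine example
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: {decide(p_value, alpha)}")
```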
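Two-Sided Significance Tests and Confidence Intervals
Because a two-sided test is symmetrical, you can also use a level 1 − α
confidence interval to test a two-sided hypothesis at level α: reject H0: µ = µ0 exactly when µ0 falls outside the interval.
[Figure: confidence level C in the middle of the distribution, with area α/2 in each tail.]
In a two-sided test, C = 1 − α, where C is the confidence level and α is the significance level.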
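The duality can be checked numerically on the packaging example, assuming σ = 5 g known, n = 4 and x̄ = 222 g as before; z_confidence_interval is a hypothetical helper. The 95% interval should exclude 227 g (reject at α = 0.05) and the 99% interval should include it (fail to reject at α = 0.01), matching the two decisions for that example.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar: float, sigma: float, n: int, alpha: float):
    """Level C = 1 - alpha confidence interval for mu when sigma is known."""
    z_star = NormalDist().inv_cdf(1 - alpha / 2)   # leaves alpha/2 in each tail
    m = z_star * sigma / sqrt(n)                    # margin of error
    return xbar - m, xbar + m

mu0 = 227
for alpha in (0.05, 0.01):
    low, high = z_confidence_interval(xbar=222, sigma=5, n=4, alpha=alpha)
    reject = not (low <= mu0 <= high)   # reject H0 exactly when mu0 lies outside the interval
    print(f"C = {1 - alpha:.0%}: ({low:.2f}, {high:.2f}) -> reject H0: {reject}")
```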
Sweetening colas
Cola manufacturers want to test how much the sweetness of a new cola drink is affected by storage. The sweetness loss due to storage was evaluated by 10 professional tasters (by comparing the sweetness before and after storage):
Taster | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Sweetness loss | 2.0 | 0.4 | 0.7 | 2.0 | −0.4 | 2.2 | −1.3 | 1.2 | 1.1 | 2.3
Obviously, we want to test if storage results in a loss of sweetness, thus:
H0: µ = 0 versus Ha: µ > 0
This looks familiar. However, here we do not know the population parameter σ. The population of all cola drinkers is too large, and since this is a new cola recipe, we have no population data.
This situation is very common with real data.
When σ is unknown
When the sample size is large,
the sample is likely to contain
elements representative of the
whole population. Then s is a
good estimate of σ.
[Figure: population distribution compared with a small sample and a large sample.]
But when the sample size is
small, the sample contains only
a few individuals. Then s is a
mediocre estimate of σ.
The sample standard deviation s provides an estimate of the population standard
deviation σ.
Standard deviation s – standard error s/√n
For a sample of size n, the sample standard deviation s is:
$s = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
The value s/√n is called the standard error of the mean x̄.
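For the cola data, s and the standard error can be computed with Python's statistics module, whose stdev uses the n − 1 divisor. The hand-written sum is included only as a cross-check of the formula for s.

```python
from math import sqrt
from statistics import mean, stdev

sweetness_loss = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]
n = len(sweetness_loss)
xbar = mean(sweetness_loss)

# statistics.stdev divides by n - 1, matching the formula for s
s = stdev(sweetness_loss)

# The same formula written out by hand, as a cross-check
s_manual = sqrt(sum((x - xbar) ** 2 for x in sweetness_loss) / (n - 1))

standard_error = s / sqrt(n)
print(f"mean = {xbar:.2f}, s = {s:.3f}, SE = {standard_error:.3f}")
```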
The t distributions
Suppose that an SRS of size n is drawn from an N(µ, σ) population.
When σ is known, the sampling distribution of x̄ is N(µ, σ/√n).
When σ is estimated from the sample standard deviation s, the
sampling distribution of x̄ follows a t distribution t(µ, s/√n) with
n − 1 degrees of freedom.
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ is the one-sample t statistic.
When n is very large, s is a very good estimate of σ, and the
corresponding t distributions are very close to the normal distribution.
The t distributions become wider for smaller sample sizes, reflecting
the lack of precision in estimating σ from s.
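To see the heavier tails numerically, the sketch below integrates the t density with Simpson's rule and compares P(T > 2) for several df with the standard normal tail. The numerical integration is our own choice to stay within the standard library (scipy.stats.t.sf would give the same areas if SciPy is available); t_pdf and t_upper_tail are hypothetical helper names.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist

def t_pdf(x: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    # log-gamma avoids the overflow that gamma() would hit for large df
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for t >= 0: 0.5 minus the area from 0 to t (Simpson's rule, even steps)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

normal_tail = 1 - NormalDist().cdf(2.0)
for df in (2, 9, 30, 1000):
    print(f"df = {df:4d}: P(T > 2) = {t_upper_tail(2.0, df):.4f}")
print(f"normal:    P(Z > 2) = {normal_tail:.4f}")
```

The tail area shrinks toward the normal value as df grows, which is the "wider for smaller samples" behavior described above.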
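Standardizing the data before using t-table
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
As with the normal distribution, the first step is to standardize the data.
Then we can use the t-table to obtain the area under the curve.
[Figure: x̄ follows t(µ, s/√n) with df = n − 1; after standardizing, t follows the t distribution centered at 0 with df = n − 1.]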
T-table
When σ is known, we use the normal distribution and the standardized z-value.
When σ is unknown,
we use a t distribution
with “n−1” degrees of
freedom (df).
Table shows the
z-values and t-values
corresponding to
landmark P-values/
confidence levels.
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
z-table vs. t-table
Z-table gives the area to the LEFT of hundreds of z-values.
It should only be used for Normal distributions.
(…)
(…)
t-table gives the area to the RIGHT of a dozen t or z-values.
It can be used for t distributions of a given df and for the Normal distribution.
T-table also gives the middle area under a t or normal distribution between −t and t (or between −z and z).
Table D
$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
[Figure panels: one-sided (one-tailed) and two-sided (two-tailed) P-value areas.]
The P-value is the probability, if H0 is true, of randomly drawing a
sample like the one obtained or more extreme, in the direction of Ha.
The P-value is calculated as the corresponding area under the curve,
one-tailed or two-tailed depending on Ha:
T-table
For df = 9 we only look into the corresponding row.
The calculated value of t is 2.70. We find the two closest t values:
2.398 < t = 2.70 < 2.821, thus 0.02 > upper-tail P > 0.01.
For a one-sided Ha, this is the P-value (between 0.01 and 0.02);
for a two-sided Ha, the P-value is doubled (between 0.02 and 0.04).
Sweetening colas (continued)
Is there evidence that storage results in sweetness loss for the new cola
recipe at the 0.05 level of significance (α = 5%)?
H0: µ = 0 versus Ha: µ > 0 (one-sided test)
The critical value is t* = 1.833 (df = 9, upper-tail probability 0.05).
t = 2.70 > t* = 1.833, thus the result is significant.
2.398 < t = 2.70 < 2.821, thus 0.02 > P > 0.01.
P < α, thus the result is significant.
The t-test has a significant p-value. We reject H0.
There is a significant loss of sweetness, on average, following storage.
Taster | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Sweetness loss | 2.0 | 0.4 | 0.7 | 2.0 | −0.4 | 2.2 | −1.3 | 1.2 | 1.1 | 2.3
Average: 1.02; standard deviation: 1.196; degrees of freedom: n − 1 = 9
$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{1.02 - 0}{1.196/\sqrt{10}} = 2.70$
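The whole cola test, including an approximate P-value instead of a table bracket, can be sketched in Python. The upper-tail area is obtained by numerical integration of the t density (Simpson's rule), an assumption made to avoid third-party libraries; t_pdf and t_upper_tail are hypothetical helper names. The result should fall between 0.01 and 0.02, as the table lookup indicates.

```python
from math import exp, lgamma, pi, sqrt
from statistics import mean, stdev

def t_pdf(x: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] and the symmetry of T."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

sweetness_loss = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]
mu0, alpha = 0.0, 0.05                     # H0: mu = 0 versus Ha: mu > 0
n = len(sweetness_loss)
df = n - 1
t = (mean(sweetness_loss) - mu0) / (stdev(sweetness_loss) / sqrt(n))
p_value = t_upper_tail(t, df)              # one-sided: upper tail only (t > 0 here)
print(f"t = {t:.2f}, df = {df}, one-sided P-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```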
The one-sample t-test
As in the previous chapter, a test of hypotheses requires a few steps:
1. Stating the null and alternative hypotheses (H0 versus Ha)
2. Deciding on a one-sided or two-sided test
3. Choosing a significance level
4. Calculating t and its degrees of freedom
5. Finding the area under the curve with t-table
6. Stating the P-value and interpreting the result
The one-sample t confidence interval
The level C confidence interval is produced by a method that captures the true population parameter in a proportion C of all possible samples.
We have a data set from a population with both µ and σ unknown. We use x̄ to estimate µ and s to estimate σ, using a t distribution (df = n − 1).
[Figure: area C under the t curve between −t* and t*, with margin of error m on each side.]
Practical use of t: t*
C is the area between −t* and t*.
We find t* in the line of Table D for df = n − 1 and confidence level C.
The margin of error m is:
$m = t^* \dfrac{s}{\sqrt{n}}$
and the confidence interval is $\bar{x} \pm m$.
Red wine, in moderation
Drinking red wine in moderation may protect against heart attacks. The
polyphenols it contains act on blood cholesterol, likely helping to prevent heart
attacks.
To see if moderate red wine consumption increases the average blood level of
polyphenols, a group of nine randomly selected healthy men were assigned to
drink half a bottle of red wine daily for two weeks. Their blood polyphenol levels
were assessed before and after the study, and the percent change is presented
here:
First: are the data approximately normal?
0.7 3.5 4 4.9 5.5 7 7.4 8.1 8.4
[Figure: histogram of percentage change in polyphenol blood levels (bins 2.5, 5, 7.5, 9, More); y-axis: frequency, 0 to 4.]
There is a low
value, but overall
the data can be
considered
reasonably normal.
What is the 95% confidence interval for the average percent change?
Sample average = 5.5; s = 2.517; df = n − 1 = 8
(…)
The sampling distribution is a t distribution with n − 1 degrees of freedom.
For df = 8 and C = 95%, t* = 2.306.
The margin of error m is: m = t*s/√n = 2.306 × 2.517/√9 ≈ 1.93.
Therefore, the confidence interval is (5.5 − 1.93, 5.5 + 1.93) ≈ (3.57, 7.43).
With 95% confidence, the population average percent increase in
polyphenol blood levels of healthy men drinking half a bottle of red wine
daily is between 3.6% and 7.4%.
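The red wine interval can be reproduced as follows. t* = 2.306 is taken from the text rather than computed, since the Python standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, stdev

pct_change = [0.7, 3.5, 4.0, 4.9, 5.5, 7.0, 7.4, 8.1, 8.4]
n = len(pct_change)
xbar = mean(pct_change)
s = stdev(pct_change)          # sample standard deviation (n - 1 divisor)
t_star = 2.306                 # Table D value for df = n - 1 = 8 and C = 95%, as in the text
m = t_star * s / sqrt(n)       # margin of error
low, high = xbar - m, xbar + m
print(f"x-bar = {xbar:.2f}, s = {s:.3f}, m = {m:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```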
Type I and II errors
When we draw a conclusion from a significance test, we hope our conclusion will be correct. But sometimes it will be wrong. There are two types of mistakes we can make.
If we reject H0 when H0 is true, we have committed a Type I error.
If we fail to reject H0 when H0 is false, we have committed a Type II error.
Truth about the population
Conclusion based on sample | H0 true | H0 false (Ha true)
Reject H0 | Type I error | Correct conclusion
Fail to reject H0 | Correct conclusion | Type II error
A Type I error is made when we reject the null hypothesis and the
null hypothesis is actually true (incorrectly reject a true H0).
The probability of making a Type I error is the significance level α.
A Type II error is made when we fail to reject the null hypothesis
and the null hypothesis is false (incorrectly keep a false H0).
The probability of making a Type II error is labeled β.
The power of a test is 1 − β.
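A small simulation illustrates α and β for the two-sided z test of the packaging example (µ0 = 227 g, σ = 5 g, n = 4, α = 0.05). The true mean of 222 g used for the power calculation is a hypothetical value chosen only for illustration, and the random seed is arbitrary. When H0 is true the rejection rate should be close to α; under the hypothetical alternative, the rejection rate estimates the power 1 − β, which is compared with the exact normal-theory value.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(2024)                        # arbitrary seed, for reproducibility
mu0, sigma, n, alpha = 227, 5, 4, 0.05   # packaging example, sigma assumed known
se = sigma / sqrt(n)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value, about 1.96

def rejects_h0(true_mu: float) -> bool:
    """Draw one sample of size n from N(true_mu, sigma) and run the two-sided z test."""
    sample = [random.gauss(true_mu, sigma) for _ in range(n)]
    z = (mean(sample) - mu0) / se
    return abs(z) > z_crit

reps = 20_000
type_i_rate = sum(rejects_h0(mu0) for _ in range(reps)) / reps   # H0 true: estimates alpha

mu_alt = 222   # HYPOTHETICAL true mean, for illustration only
power_sim = sum(rejects_h0(mu_alt) for _ in range(reps)) / reps  # estimates 1 - beta
beta_sim = 1 - power_sim

# Exact power of the two-sided z test when the true mean is mu_alt
shift = (mu_alt - mu0) / se
power_exact = NormalDist().cdf(-z_crit - shift) + (1 - NormalDist().cdf(z_crit - shift))

print(f"Type I error rate ~ {type_i_rate:.3f} (alpha = {alpha})")
print(f"power ~ {power_sim:.3f} (exact {power_exact:.3f}), beta ~ {beta_sim:.3f}")
```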
The Common Practice of Testing Hypotheses
1. State H0 and Ha as in a test of significance.
2. Think of the problem as a decision problem, so the probabilities of Type I and Type II errors are relevant.
3. Consider only tests in which the probability of a Type I error is no greater than α.
4. Among these tests, select a test that makes the probability of a Type II error as small as possible.
Steps for Tests of Significance
1. Assumptions/Conditions
Specify variable, parameter, method of data collection, shape of population.
2. State hypotheses
Null hypothesis H0 and alternative hypothesis Ha.
3. Calculate value of the test statistic
A measure of “difference” between hypothesized value and its estimate.
4. Determine the P-value
Probability, assuming H0 is true, that the test statistic takes the observed value
or a more extreme value.
5. State the decision and conclusion
Interpret the P-value and make a decision about H0.
8.3 Tests Concerning a Population Proportion
Sampling distribution of sample proportion
The sampling distribution of a sample proportion p̂ is approximately
normal (normal approximation of a binomial distribution) when the
sample size is large enough.
Conditions for inference on p
Assumptions:
1. The data used for the estimate are an SRS from the population
studied.
2. The population is at least 10 times as large as the sample used for
inference.
3. The sample size n is large enough that the sampling distribution
can be approximated with a normal distribution. Otherwise, rely on
the binomial distribution.
Large-sample confidence interval for p
Use this method when the number of
successes and the number of
failures are both at least 15.
[Figure: central area C under the standard normal curve between −z* and z*, with margin of error m on each side.]
Confidence intervals contain the population proportion p in C% of
samples. For an SRS of size n drawn from a large population, and with
sample proportion p̂ calculated from the data, an approximate level C
confidence interval for p is:
$\hat{p} \pm m, \qquad m = z^* \, SE = z^* \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
where m is the margin of error and C is the area under the standard
normal curve between −z* and z*.
Medication side effects
Arthritis is a painful, chronic inflammation of the joints.
An experiment on the side effects of pain relievers
examined arthritis patients to find the proportion of
patients who suffer side effects.
What are some side effects of ibuprofen?
Serious side effects (seek medical attention immediately):
Allergic reaction (difficulty breathing, swelling, or hives)
Muscle cramps, numbness, or tingling
Ulcers (open sores) in the mouth
Rapid weight gain (fluid retention)
Seizures
Black, bloody, or tarry stools
Blood in your urine or vomit
Decreased hearing or ringing in the ears
Jaundice (yellowing of the skin or eyes)
Abdominal cramping, indigestion, or heartburn
Less serious side effects (discuss with your doctor):
Dizziness or headache
Nausea, gaseousness, diarrhea, or constipation
Depression
Fatigue or weakness
Dry mouth
Irregular menstrual periods
Confidence level C | 50% | 60% | 70% | 80% | 90% | 95% | 96% | 98%
Upper-tail probability P | 0.25 | 0.20 | 0.15 | 0.10 | 0.05 | 0.025 | 0.02 | 0.01
z* | 0.674 | 0.841 | 1.036 | 1.282 | 1.645 | 1.960 | 2.054 | 2.326
Let’s calculate a 90% confidence interval for the population proportion of arthritis patients who suffer some “adverse symptoms.”
What is the sample proportion p̂?
$\hat{p} = \dfrac{23}{440} \approx 0.052$
What is the sampling distribution for the proportion of arthritis patients with
adverse symptoms for samples of 440?
$\hat{p} \sim N\left(p, \sqrt{\dfrac{p(1-p)}{n}}\right)$ (approximately)
For a 90% confidence level, z* = 1.645.
Using the large sample method, we
calculate a margin of error m:
$m = z^* \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} = 1.645 \sqrt{\dfrac{0.052(1-0.052)}{440}} = 1.645 \times 0.0106 \approx 0.0174$
90% CI for p: $\hat{p} \pm m$, or $0.052 \pm 0.0174$
With a 90% confidence level, between 3.5% and 6.9% of arthritis patients
taking this pain medication experience some adverse symptoms.
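A sketch of the large-sample interval for the medication example; proportion_ci is a hypothetical helper, and it enforces the at-least-15 successes and failures rule for this method. Because it keeps full precision (p̂ = 23/440 ≈ 0.0523), the upper limit comes out near 0.0697 rather than the rounded 6.9%.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes: int, n: int, conf: float):
    """Large-sample confidence interval p_hat +/- z* sqrt(p_hat (1 - p_hat) / n)."""
    failures = n - successes
    if successes < 15 or failures < 15:
        raise ValueError("need at least 15 successes and 15 failures")
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.645 for conf = 0.90
    m = z_star * sqrt(p_hat * (1 - p_hat) / n)          # margin of error
    return p_hat, m, (p_hat - m, p_hat + m)

p_hat, m, (low, high) = proportion_ci(successes=23, n=440, conf=0.90)
print(f"p-hat = {p_hat:.4f}, m = {m:.4f}, 90% CI = ({low:.4f}, {high:.4f})")
```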
Significance test for p
The sampling distribution for p̂ is approximately normal for large sample sizes, and its shape depends solely on p and n.
Thus, we can easily test the null hypothesis:
H0: p = p0 (a given value we are testing).
$z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$
If H0 is true, the sampling distribution of p̂ is known: it is approximately N(p0, √(p0(1 − p0)/n)).
The likelihood of our sample proportion given the null hypothesis depends on how far from p0 our p̂
is in units of standard deviation.
This is valid when both expected counts, expected successes np0 and
expected failures n(1 − p0), are each 10 or larger.
A national survey by the National Institute for Occupational Safety and Health on
restaurant employees found that 75% said that work stress had a negative impact
on their personal lives.
You investigate a restaurant chain to see if the proportion of all their employees
negatively affected by work stress differs from the national proportion p0 = 0.75.
H0: p = p0 = 0.75 vs. Ha: p ≠ 0.75 (two-sided alternative)
In your SRS of 100 employees, you find that 68 answered “Yes” when asked,
“Does work stress have a negative impact on your personal life?”
The expected counts are 100 × 0.75 = 75 and 25.
Both are greater than 10, so we can use the z-test.
The test statistic is:
$z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}} = \dfrac{0.68 - 0.75}{\sqrt{\dfrac{(0.75)(0.25)}{100}}} = -1.62$
From Table A we find the area to the left of z = −1.62 is 0.0526.
Thus P(Z ≤ −1.62) = 0.0526. Since the alternative hypothesis is two-sided, the P-value is the area in both tails, and therefore the P-value = 2 × 0.0526 = 0.1052.
The chain restaurant data
are not significantly different
from the national survey results
(p̂ = 0.68, z = −1.62,
P-value = 0.11).
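The restaurant test can be sketched the same way. one_proportion_z_test is a hypothetical helper that checks the expected-count condition (np0 and n(1 − p0) at least 10) before computing z; note that the standard deviation uses p0, not p̂. With the unrounded z ≈ −1.617, the exact two-sided P-value is about 0.106, consistent with the table-based 0.1052.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes: int, n: int, p0: float):
    """Two-sided z test of H0: p = p0 using the large-sample normal approximation."""
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("expected counts n*p0 and n*(1 - p0) must both be at least 10")
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard deviation computed under H0
    p_value = 2 * NormalDist().cdf(-abs(z))      # both tails, because Ha is two-sided
    return p_hat, z, p_value

p_hat, z, p_value = one_proportion_z_test(successes=68, n=100, p0=0.75)
print(f"p-hat = {p_hat}, z = {z:.2f}, P-value = {p_value:.3f}")
```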