Chapter6.pptx

CHAPTER 6 Hypothesis Testing

Transcript of Chapter6.pptx

Page 1: Chapter6.pptx

CHAPTER 6

Hypothesis Testing

Page 2: Chapter6.pptx

Hypothesis Testing

• A statistical hypothesis is a claim about a population characteristic (and on occasion more than one).

• An example of a hypothesis is the claim that the population mean is some specified value, e.g. $\mu = \mu_0$.

Page 3: Chapter6.pptx

Hypotheses and Test Procedures

• Null Hypothesis: H0

The claim that is initially assumed to be true

• Alternative Hypothesis: H1 or Ha

The complementary assertion to H0

The new statement that we wish to test

Page 4: Chapter6.pptx

Hypothesis Test Example

We own a paint company. The old paint takes 60 minutes to dry. We want to see if a new paint will dry faster.

H0: μ = 60

H1: μ < 60

Page 5: Chapter6.pptx

A test procedure is created under the assumption of H0 and then it is determined how likely that assumption is compared to its complement HA.

The decision will be based on

• Test Statistic and Rejection Region

or

• p-value and Significance Level

Page 6: Chapter6.pptx

Test Procedures

Test Statistic - a function of the sample data on which the decision (reject or do not reject H0) is made

Rejection Region - set of all test statistic values for which H0 will be rejected

The basis for choosing a particular rejection region lies in an understanding of the errors that can be made.

Page 7: Chapter6.pptx

Hypotheses Test Errors

• Type I Error: rejecting a true H0

• Type II Error: failing to reject a false H0

            REJECT H0       FAIL TO REJECT H0
True H0     TYPE I ERROR    CORRECT
False H0    CORRECT         TYPE II ERROR

Since we wish to control the Type I error, we set its probability, $P(\text{Type I error}) = \alpha$.

The default value of $\alpha$ (the significance level) is usually taken to be 0.05.
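
The meaning of the significance level can be checked by simulation. The sketch below is only an illustration: it draws many samples for which H0 is actually true and counts how often a lower-tailed z test rejects. The null value 60 echoes the paint example; the standard deviation, sample size, number of trials and seed are hypothetical choices, not values from the slides.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(0)  # fixed seed so the sketch is reproducible

# HYPOTHETICAL settings: only MU0 = 60 echoes the paint example.
MU0, SIGMA, N, ALPHA = 60.0, 5.0, 30, 0.05
z_crit = NormalDist().inv_cdf(ALPHA)  # lower-tail critical value, about -1.645


def rejects_h0(sample):
    """Lower-tailed z test of H0: mu = MU0 against H1: mu < MU0."""
    z = (mean(sample) - MU0) / (SIGMA / sqrt(len(sample)))
    return z < z_crit


# Every sample is drawn with true mean MU0, so each rejection is a Type I error.
trials = 20000
false_rejections = sum(
    rejects_h0([random.gauss(MU0, SIGMA) for _ in range(N)])
    for _ in range(trials)
)
type1_rate = false_rejections / trials
print(f"estimated Type I error rate: {type1_rate:.3f}")
```

The estimated rate should land close to the chosen $\alpha$, which is what "controlling the Type I error" means in practice.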

Page 8: Chapter6.pptx

Motivating the test procedure

Example: The drying time of a certain type of paint, under fixed environmental conditions, is known to be normally distributed with mean 75 min. and standard deviation 9 min. Chemists have added a new additive that is believed to decrease drying time, have obtained a sample of 35 drying times, and wish to test their assertion at significance level $\alpha$.

Solution: Here we are interested in testing the following hypotheses (let $\mu$ be the mean drying time),

$H_0: \mu = 75$ vs $H_1: \mu < 75$

Page 9: Chapter6.pptx

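An obvious candidate for a test statistic is $\bar{X}$, which is normally distributed: $\bar{X} \sim N(\mu, \sigma^2/n)$.

Thus, under $H_0$, $\bar{X} \sim N(75, 9^2/35)$,

or, $Z = \frac{\bar{X} - 75}{9/\sqrt{35}} \sim N(0, 1)$.

If the test value is small enough, i.e. T.S. $< -z_{\alpha}$, then we reject $H_0$.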

Page 10: Chapter6.pptx

What is the logic?

We assume that the sample mean $\bar{x}$ is a “good” estimate for μ, and hence under $H_0$ the difference $\bar{x} - \mu_0$ should be close to 0, which implies T.S. should be close to zero. However, if it is not, then it implies that $\mu_0$ was not a “good” hypothesis value for the true mean.

Page 11: Chapter6.pptx

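Assume that $\bar{x} = 70.8$ from the 35 samples; then,

T.S. $= \frac{70.8 - 75}{9/\sqrt{35}} = -2.76$,

thus T.S. $= -2.76 < -z_{\alpha}$ (for example, $-z_{0.05} = -1.645$).

So, we reject the null hypothesis at significance level $\alpha$.

We can also reach a conclusion using the p-value!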

Page 12: Chapter6.pptx

P-value

The p-value of a hypothesis test is the probability of observing the specific value of the test statistic, T.S., or a more extreme value, under the null hypothesis.

The direction of the extreme values is indicated by the alternative hypothesis.

Page 13: Chapter6.pptx

Computing p-value for our example

In this example, values more extreme than -2.76 are those less than -2.76, as the alternative, $H_1: \mu < 75$, indicates values less than 75. Thus,

p-value $= P(Z < -2.76) = 0.0029$,

which indicates that p-value $< \alpha$,

so we reject the null hypothesis!

Page 14: Chapter6.pptx

The null hypothesis is rejected in favor of the alternative hypothesis because the probability of observing the test statistic value of -2.76 or a more extreme one (as indicated by Ha) is smaller than the probability of the Type I error ($\alpha$) we are willing to accept.
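
The paint example can be traced numerically. This is a minimal sketch under two labeled assumptions: the sample mean 70.8 is back-solved from the reported test statistic of -2.76, and $\alpha = 0.05$ is taken from the default mentioned earlier. Neither value is stated explicitly in the transcript.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 75.0, 9.0, 35  # null mean, known sd, sample size (from the example)
xbar = 70.8                    # ASSUMPTION: back-solved from T.S. = -2.76
alpha = 0.05                   # ASSUMPTION: the default significance level

z = (xbar - mu0) / (sigma / sqrt(n))  # test statistic
z_crit = NormalDist().inv_cdf(alpha)  # rejection region is z < z_crit
p_value = NormalDist().cdf(z)         # lower tail, because H1: mu < 75

reject = p_value < alpha
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
print("reject H0" if reject else "fail to reject H0")
```

Both routes agree, as they must: the statistic falls in the rejection region exactly when the p-value is below $\alpha$.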

Page 15: Chapter6.pptx

Large sample test for population mean (section 6.1)

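Let $X_1, \ldots, X_n$ be a large random sample ($n > 30$), so that $\bar{X}$ is approximately normally distributed. To test,

I. $H_0: \mu \le \mu_0$ vs $H_1: \mu > \mu_0$

II. $H_0: \mu \ge \mu_0$ vs $H_1: \mu < \mu_0$

III. $H_0: \mu = \mu_0$ vs $H_1: \mu \ne \mu_0$

at the $\alpha$ significance level, first compute the test statistic,

$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$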

Page 16: Chapter6.pptx

Making decision

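Reject the null if,

(i) for $H_1: \mu > \mu_0$: $z > z_{\alpha}$, or p-value $= P(Z > z) < \alpha$

(ii) for $H_1: \mu < \mu_0$: $z < -z_{\alpha}$, or p-value $= P(Z < z) < \alpha$

(iii) for $H_1: \mu \ne \mu_0$: $|z| > z_{\alpha/2}$, or p-value $= 2P(Z > |z|) < \alpha$

Remark 6.1. If $\sigma$ is unknown and s is used instead, one should use Student's t and the relevant t-table instead of the z-table, but since the sample size is large the two distributions are approximately equivalent.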

Page 17: Chapter6.pptx

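Example: A scale is to be calibrated by weighing a 1000 g test weight 60 times. The 60 scale readings have mean 1000.6 g and standard deviation 2 g. Find the P-value for testing $H_0: \mu = 1000$ versus $H_1: \mu \ne 1000$.

Solution: Assuming $H_0$ is true, from the C.L.T. we can say,

$\bar{X} \sim N(1000, \sigma^2/60)$.

We can approximate $\sigma$ with $s = 2$ because the sample size is large. Thus,

$z = \frac{1000.6 - 1000}{2/\sqrt{60}} = 2.32$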

Page 18: Chapter6.pptx

p-value $= 2P(Z > 2.32) = 2(0.0102) = 0.0204$

The p-value is small, so we have strong evidence to reject the null hypothesis.
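
The three decision rules for the large-sample test can be collected in one small helper and applied to the scale example. This is a sketch, not a reference implementation: the function name `z_test_mean` and its `alternative` argument are hypothetical, and the call assumes the two-sided alternative $H_1: \mu \ne 1000$.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(xbar, mu0, sd, n, alternative):
    """Large-sample z test for a population mean (hypothetical helper).

    alternative: "greater" (H1: mu > mu0), "less" (H1: mu < mu0),
    or "two-sided" (H1: mu != mu0).
    Returns the test statistic z and the p-value.
    """
    z = (xbar - mu0) / (sd / sqrt(n))
    phi = NormalDist().cdf  # standard normal CDF
    if alternative == "greater":
        p = 1 - phi(z)
    elif alternative == "less":
        p = phi(z)
    elif alternative == "two-sided":
        p = 2 * (1 - phi(abs(z)))
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p


# Scale calibration example: n = 60 readings, mean 1000.6 g, s = 2 g.
z, p = z_test_mean(xbar=1000.6, mu0=1000, sd=2, n=60, alternative="two-sided")
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

Because the code keeps z unrounded, its p-value can differ in the third decimal from a value read off a z-table at 2.32.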

Page 19: Chapter6.pptx

If the significance level $\alpha$ is given in the problem, then compare the p-value with $\alpha$ and reject $H_0$ whenever the p-value is less than $\alpha$.

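Making a decision solely based on the p-value, i.e. when the significance level is not given (the smaller the p-value, the stronger the evidence against $H_0$):

Evidence against $H_0$, from the largest p-values to the smallest:

No evidence

Weak evidence

Strong evidence

Very strong evidence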

Page 20: Chapter6.pptx

Example: In the previous example, perform a hypothesis test for $H_0: \mu \ge 1000$ versus $H_1: \mu < 1000$ at significance level $\alpha$.

Solution:

p-value $= P(Z < 2.32) = 0.9898$. So, we do not have any evidence to reject the null hypothesis.

Page 21: Chapter6.pptx

Tests for population proportion (section 6.3)

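• Let $X$ be the number of successes in $n$ i.i.d. Bernoulli trials with probability of success $p$, then $X \sim \text{Bin}(n, p)$.

• By the C.L.T. we know that under certain conditions ($np > 5$, $n(1-p) > 5$), $\hat{p} = X/n \sim N\left(p, \frac{p(1-p)}{n}\right)$ approximately.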

Page 22: Chapter6.pptx

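To test,

I. $H_0: p \le p_0$ vs $H_1: p > p_0$

II. $H_0: p \ge p_0$ vs $H_1: p < p_0$

III. $H_0: p = p_0$ vs $H_1: p \ne p_0$

we must assume that, under the null hypothesis $H_0$, the number of successes and failures is greater than 5, i.e. $np_0 > 5$ and $n(1-p_0) > 5$, such that under $H_0$ and using the C.L.T. we can say,

$\hat{p} \sim N\left(p_0, \frac{p_0(1-p_0)}{n}\right)$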

Page 23: Chapter6.pptx

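The test statistic is

$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$

and the r.v. corresponding to the test statistic has a standard normal distribution under the null hypothesis assumption. Reject the null if

(i) for $H_1: p > p_0$: $z > z_{\alpha}$, or p-value $= P(Z > z) < \alpha$

(ii) for $H_1: p < p_0$: $z < -z_{\alpha}$, or p-value $= P(Z < z) < \alpha$

(iii) for $H_1: p \ne p_0$: $|z| > z_{\alpha/2}$, or p-value $= 2P(Z > |z|) < \alpha$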

Page 24: Chapter6.pptx

Example: For a sample of 1225 baselines, 926 gave results that were within the class C spirit leveling tolerance limits. Can we conclude that this method produces results within the tolerance limits more than 75% of the time?

Solution: First, we should write the hypotheses,

$H_0: p \le 0.75$ vs $H_1: p > 0.75$

Second, we should check the normality conditions under the null hypothesis,

$np_0 = 1225(0.75) = 918.75 > 5$ and $n(1-p_0) = 1225(0.25) = 306.25 > 5$

So, we have normality under the assumption of $H_0$. Thus,

$\hat{p} \sim N\left(0.75, \frac{0.75(0.25)}{1225}\right)$

Page 25: Chapter6.pptx

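The observed sample proportion is,

$\hat{p} = 926/1225 = 0.7559$

the test statistic is,

$z = \frac{0.7559 - 0.75}{\sqrt{0.75(0.25)/1225}} = 0.48$

and the p-value is,

p-value $= P(Z > 0.48) = 0.3156$

So, we do not have any evidence to reject $H_0$.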
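
The spirit-leveling example can be reproduced step by step. A minimal sketch, assuming the upper-tailed alternative $H_1: p > 0.75$ implied by the question ("more than 75% of the time"); variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, successes, p0 = 1225, 926, 0.75

# Normality check under H0: expected successes and failures both exceed 5.
conditions_ok = n * p0 > 5 and n * (1 - p0) > 5

p_hat = successes / n                       # observed sample proportion
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # the variance uses p0, not p_hat
p_value = 1 - NormalDist().cdf(z)           # upper tail, because H1: p > 0.75

print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, p-value = {p_value:.4f}")
```

Note that the standard error is computed from the null value $p_0$, since the test statistic is built under the assumption that H0 is true.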

Page 26: Chapter6.pptx

Small sample test for population mean (section 6.4)

If the sample size is small, i.e. $n \le 30$, then the C.L.T. is not applicable for $\bar{X}$ and therefore we must assume that the individual random variables corresponding to the sample are normal random variables with mean $\mu$ and variance $\sigma^2$. As a result,

$\bar{X} \sim N(\mu, \sigma^2/n)$.

Thus, if $\sigma$ is known then we can proceed exactly as in the case of the large sample test for the population mean.

Page 27: Chapter6.pptx

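What if $\sigma$ is unknown?

If $\sigma$ is unknown, which is usually the case, we replace it by its sample estimate s. Consequently, under $H_0$ we have,

$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t_{n-1}$

and then for the observed value

$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$

At the $\alpha$ significance level, for the same hypothesis tests as before, we reject $H_0$ if

(i) for $H_1: \mu > \mu_0$: $t > t_{n-1,\alpha}$, or p-value $= P(t_{n-1} > t) < \alpha$

(ii) for $H_1: \mu < \mu_0$: $t < -t_{n-1,\alpha}$, or p-value $= P(t_{n-1} < t) < \alpha$

(iii) for $H_1: \mu \ne \mu_0$: $|t| > t_{n-1,\alpha/2}$, or p-value $= 2P(t_{n-1} > |t|) < \alpha$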

Page 28: Chapter6.pptx

Example: Muzzle velocities of eight shells tested with a new gunpowder yield a sample mean of 2959 feet per second and a standard deviation of 39.4. The manufacturer claims that the new gunpowder produces an average velocity of no less than 3000 feet per second. Does the sample provide enough evidence to contradict the manufacturer’s claim at 0.05 significance level? (assume velocity of the new gunpowder is normally distributed)

Solution: Let μ be the mean velocity of the new gunpowder. Here, we are interested in testing

H0: μ ≥ 3000

H1: μ < 3000

because we want to see whether there is evidence to refute the manufacturer's claim. The test statistic is,

$t = \frac{2959 - 3000}{39.4/\sqrt{8}} = -2.943$

and the rejection region is

$t < -t_{7, 0.05} = -1.895$

p-value ≈ 0.0101

So, we have very strong evidence against the null hypothesis.
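
The rejection-region decision for the gunpowder example takes only a few lines. This sketch uses the tabulated critical value $t_{7,0.05} = 1.895$ from a standard t-table rather than computing a p-value; variable names are illustrative.

```python
from math import sqrt

n, xbar, s, mu0 = 8, 2959.0, 39.4, 3000.0
t_crit = 1.895  # t-table value t_{7, 0.05} (df = n - 1 = 7)

t = (xbar - mu0) / (s / sqrt(n))  # observed test statistic
reject = t < -t_crit              # lower-tailed test, H1: mu < 3000

print(f"t = {t:.3f}, reject H0: {reject}")
```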

Page 29: Chapter6.pptx

Remark: The values contained within a two-sided $100(1-\alpha)\%$ C.I. are precisely those values for which the p-value of a two-sided hypothesis test will be greater than $\alpha$.

Example: The lifetime of a single-cell organism is believed to be on average 257 hours. A small preliminary study was conducted to test whether the average lifetime was different when the organism was placed in a certain medium. The measurements are assumed to be normally distributed and turned out to be 253, 261, 258, 255, and 256.

Solution 1: Here we want to test $H_0: \mu = 257$ v.s. $H_1: \mu \ne 257$. With $\bar{x} = 256.6$ and $s = 3.05$, the test statistic value is

$t = \frac{256.6 - 257}{3.05/\sqrt{5}} = -0.293$

Page 30: Chapter6.pptx

p-value $= 2P(t_4 > 0.293) \approx 0.78$. Hence, since the p-value is large, we fail to reject the null hypothesis and we conclude that the population mean is not statistically different from 257.

Solution 2: Instead of hypothesis testing, if a two-sided 95% confidence interval is constructed by,

$\bar{x} \pm t_{4, 0.025} \frac{s}{\sqrt{n}} = 256.6 \pm 2.776 \frac{3.05}{\sqrt{5}} = (252.81, 260.39)$

it is clear that the null hypothesis value $\mu = 257$ is a plausible value and consequently we do not reject $H_0$ at the 0.05 significance level.
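
The agreement between the two solutions can be checked directly from the five measurements. A minimal sketch: it uses the t-table value $t_{4,0.025} = 2.776$ and shows that the two-sided test and the 95% confidence interval give the same decision.

```python
from math import sqrt
from statistics import mean, stdev

data = [253, 261, 258, 255, 256]  # lifetimes in hours (from the example)
mu0 = 257
t_crit = 2.776                    # t-table value t_{4, 0.025} (df = n - 1 = 4)

n = len(data)
xbar, s = mean(data), stdev(data)  # stdev uses the n - 1 denominator
se = s / sqrt(n)

t = (xbar - mu0) / se                          # Solution 1: test statistic
ci = (xbar - t_crit * se, xbar + t_crit * se)  # Solution 2: 95% C.I.

reject_by_test = abs(t) > t_crit
reject_by_ci = not (ci[0] <= mu0 <= ci[1])

print(f"t = {t:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"reject by test: {reject_by_test}, reject by CI: {reject_by_ci}")
```

The two flags always match, because $|t| \le t_{4,0.025}$ is algebraically the same condition as $\mu_0$ lying inside the interval.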

Page 31: Chapter6.pptx

Large sample test for difference of two means (section 6.5)

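Let $X_1, \ldots, X_{n_X}$ and $Y_1, \ldots, Y_{n_Y}$ represent two independent large random samples, with $n_X > 30$ and $n_Y > 30$, with means $\mu_X, \mu_Y$ and variances $\sigma_X^2, \sigma_Y^2$, respectively. By the C.L.T. we have,

$\bar{X} - \bar{Y} \sim N\left(\mu_X - \mu_Y, \frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}\right)$

How to test the following hypotheses?

I. $H_0: \mu_X - \mu_Y \le \Delta_0$ vs $H_1: \mu_X - \mu_Y > \Delta_0$

II. $H_0: \mu_X - \mu_Y \ge \Delta_0$ vs $H_1: \mu_X - \mu_Y < \Delta_0$

III. $H_0: \mu_X - \mu_Y = \Delta_0$ vs $H_1: \mu_X - \mu_Y \ne \Delta_0$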

Page 32: Chapter6.pptx

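We assume that the variances are known, and the test statistic is

$z = \frac{(\bar{x} - \bar{y}) - \Delta_0}{\sqrt{\sigma_X^2/n_X + \sigma_Y^2/n_Y}}$

The r.v. corresponding to the test statistic has a standard normal distribution under the null hypothesis $H_0$, that is, when $\mu_X - \mu_Y = \Delta_0$. Reject the null if

(i) for $H_1: \mu_X - \mu_Y > \Delta_0$: $z > z_{\alpha}$, or p-value $= P(Z > z) < \alpha$

(ii) for $H_1: \mu_X - \mu_Y < \Delta_0$: $z < -z_{\alpha}$, or p-value $= P(Z < z) < \alpha$

(iii) for $H_1: \mu_X - \mu_Y \ne \Delta_0$: $|z| > z_{\alpha/2}$, or p-value $= 2P(Z > |z|) < \alpha$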

Page 33: Chapter6.pptx

Example: Two welding procedures are to be tested on the diameter of inclusions, which are particles embedded in the weld. A sample of 544 inclusions in welds made using method X averaged 0.37 μm in diameter, with a standard deviation of 0.25 μm. A sample of 581 inclusions in welds made using method Y averaged 0.40 μm in diameter, with a standard deviation of 0.26 μm. Can you conclude that the mean diameter for Y exceeds that of X by more than 0.015 μm?

Solution: $H_0: \mu_Y - \mu_X \le 0.015$ vs $H_1: \mu_Y - \mu_X > 0.015$. The test statistic is

$z = \frac{(0.40 - 0.37) - 0.015}{\sqrt{0.25^2/544 + 0.26^2/581}} = 0.99$

This is a one-tailed test with p-value $= P(Z > 0.99) = 0.1611$. We fail to reject the null hypothesis.
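
The welding comparison can be computed as follows. A sketch under one labeled assumption: the difference is taken as Y minus X with the upper-tailed alternative $H_1: \mu_Y - \mu_X > 0.015$, which is one natural reading of the question. The sample standard deviations stand in for the population values because both samples are large.

```python
from math import sqrt
from statistics import NormalDist

n_x, xbar, s_x = 544, 0.37, 0.25  # method X (micrometres)
n_y, ybar, s_y = 581, 0.40, 0.26  # method Y (micrometres)
delta0 = 0.015                    # hypothesized difference mu_Y - mu_X

# ASSUMPTION: H1 is mu_Y - mu_X > delta0, so large z values are "extreme".
se = sqrt(s_x**2 / n_x + s_y**2 / n_y)
z = ((ybar - xbar) - delta0) / se
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Keeping z unrounded gives a p-value that differs slightly in the third decimal from a z-table lookup at 0.99.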

Page 34: Chapter6.pptx

Tests for the difference between two proportions (section 6.6)

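Let $X \sim \text{Bin}(n_X, p_X)$ and $Y \sim \text{Bin}(n_Y, p_Y)$ represent two independent Binomial random variables resulting from two independent sets of i.i.d. Bernoulli trials. To test,

I. $H_0: p_X - p_Y \le 0$ vs $H_1: p_X - p_Y > 0$

II. $H_0: p_X - p_Y \ge 0$ vs $H_1: p_X - p_Y < 0$

III. $H_0: p_X - p_Y = 0$ vs $H_1: p_X - p_Y \ne 0$

we first need an appropriate test statistic.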

Page 35: Chapter6.pptx

We must assume that the number of successes and failures is greater than 10 for both samples.

As the null hypothesis values for $p_X$ and $p_Y$ are not available, we simply check that the sample successes and failures are greater than 10. By virtue of the C.L.T.,

$\hat{p}_X - \hat{p}_Y \sim N\left(p_X - p_Y, \frac{p_X(1-p_X)}{n_X} + \frac{p_Y(1-p_Y)}{n_Y}\right)$

and the test statistic would be constructed in the usual way.

However, under $H_0$ it is assumed that $p_X = p_Y$, which implies that the variances of the two Bernoulli trials are equal ($p_X(1-p_X) = p_Y(1-p_Y)$).

Page 36: Chapter6.pptx

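Therefore we can replace $p_X$ and $p_Y$ in the variance by the pooled estimate,

$\hat{p} = \frac{X + Y}{n_X + n_Y}$

The test statistic is then,

$z = \frac{\hat{p}_X - \hat{p}_Y}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_X} + \frac{1}{n_Y}\right)}}$

and the r.v. corresponding to the test statistic has a standard normal distribution under the null hypothesis.

Thus, we reject the null hypothesis whenever,

(i) for $H_1: p_X - p_Y > 0$: $z > z_{\alpha}$, or p-value $= P(Z > z) < \alpha$

(ii) for $H_1: p_X - p_Y < 0$: $z < -z_{\alpha}$, or p-value $= P(Z < z) < \alpha$

(iii) for $H_1: p_X - p_Y \ne 0$: $|z| > z_{\alpha/2}$, or p-value $= 2P(Z > |z|) < \alpha$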

Page 37: Chapter6.pptx

Example: We want to compare the proportion of defective electric motors turned out by two shifts of workers. From the large number produced in a given week, 250 motors were selected from the output of shift I and 200 motors were selected from the output of shift II. The sample from shift one revealed 25 to be defective and the sample from shift II 30 faulty motors. Is it true to say the difference between the proportions of defective motors produced in two shifts is not equal to zero? Use a 0.05 significance level.

Page 38: Chapter6.pptx

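Solution: Let $p_X$ be the proportion of defective motors produced by workers in shift I and $p_Y$ be the proportion of defective motors produced by workers in shift II.

The goal is testing $H_0: p_X - p_Y = 0$ versus $H_1: p_X - p_Y \ne 0$. Using $n_X = 250$, $X = 25$, $n_Y = 200$, and $Y = 30$,

we get $\hat{p}_X = 0.10$ and $\hat{p}_Y = 0.15$. Since $25 > 10$, $225 > 10$, $30 > 10$, and $170 > 10$, the sample sizes are large enough to use the normal approximation. Also,

$\hat{p} = \frac{25 + 30}{250 + 200} = 0.122$, so $\hat{p}(1-\hat{p}) = 0.11$. Thus,

$z = \frac{0.10 - 0.15}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{250} + \frac{1}{200}\right)}} = -1.61$

and, p-value $= 2P(Z < -1.61) = 0.1074$

So, we fail to reject the null hypothesis at the 0.05 significance level. That is, at the 0.05 significance level there is not enough evidence to conclude that the difference between the proportions of defective motors produced in the two shifts is different from zero.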
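
The pooled two-proportion test for the motor example can be sketched as below, assuming the two-sided alternative stated in the question. Variable names are illustrative; the code keeps the pooled variance unrounded.

```python
from math import sqrt
from statistics import NormalDist

n_x, x = 250, 25  # shift I: sample size, defectives
n_y, y = 200, 30  # shift II: sample size, defectives
alpha = 0.05

# Sample successes and failures must all exceed 10 for the normal approximation.
conditions_ok = min(x, n_x - x, y, n_y - y) > 10

p_x, p_y = x / n_x, y / n_y
p_pool = (x + y) / (n_x + n_y)  # pooled estimate under H0: p_X = p_Y
se = sqrt(p_pool * (1 - p_pool) * (1 / n_x + 1 / n_y))

z = (p_x - p_y) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided

print(f"pooled p = {p_pool:.4f}, z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

A p-value above 0.05 means the data do not establish a difference between the shifts; it does not prove the two proportions are equal.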

Page 39: Chapter6.pptx

Small sample test for the difference between two means (6.7)

In this case, since the C.L.T. is not applicable, we must assume that the two random samples are normally distributed and independent.

1. If the variances are known, the test statistic is,

$z = \frac{(\bar{x} - \bar{y}) - \Delta_0}{\sqrt{\sigma_X^2/n_X + \sigma_Y^2/n_Y}}$

which has a standard normal distribution under the null hypothesis.

Page 40: Chapter6.pptx

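2. If the variances are unknown (which is usually the case),

$t = \frac{(\bar{x} - \bar{y}) - \Delta_0}{\sqrt{s_X^2/n_X + s_Y^2/n_Y}}$

which has a $t_{\nu}$ distribution under $H_0$, where the degrees of freedom are given by

$\nu = \frac{\left(\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}\right)^2}{\frac{(s_X^2/n_X)^2}{n_X - 1} + \frac{(s_Y^2/n_Y)^2}{n_Y - 1}}$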

Page 41: Chapter6.pptx

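We reject $H_0$ if

(i) for $H_1: \mu_X - \mu_Y > \Delta_0$: $t > t_{\nu,\alpha}$, or p-value $= P(t_{\nu} > t) < \alpha$

(ii) for $H_1: \mu_X - \mu_Y < \Delta_0$: $t < -t_{\nu,\alpha}$, or p-value $= P(t_{\nu} < t) < \alpha$

(iii) for $H_1: \mu_X - \mu_Y \ne \Delta_0$: $|t| > t_{\nu,\alpha/2}$, or p-value $= 2P(t_{\nu} > |t|) < \alpha$

Remark: If we have equality of variances ($\sigma_X^2 = \sigma_Y^2$), then we replace both $s_X^2$ and $s_Y^2$ with

$s_p^2 = \frac{(n_X - 1)s_X^2 + (n_Y - 1)s_Y^2}{n_X + n_Y - 2}$

and in this case the degrees of freedom for the t distribution is $n_X + n_Y - 2$.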

Page 42: Chapter6.pptx

Example: The prestressing wire on each of two concrete pipes manufactured at different times was compared for torsion properties. Ten specimens randomly selected from each pipe were twisted in a laboratory apparatus until they broke; the number of revolutions until complete failure was recorded. The results are as follows, with C1 and C2 denoting the two concrete pipes:

C1: 5.83, 8.66, 4.75, 3.00, 3.37, 3.63, 4.00, 4.63, 4.25, 4.13

C2: 3.38, 2.81, 7.00, 1.50, 5.88, 5.25, 4.08, 7.63, 4.50, 4.88

Is there any evidence to suggest that the true mean revolutions to failure differ for the wire on the two pipes?

Solution: MINITAB
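
The slides defer this example to MINITAB; as a rough cross-check, the unequal-variance t statistic and its degrees of freedom can be computed by hand. This sketch is not the MINITAB output. It assumes a two-sided test at $\alpha = 0.05$ (the level is not stated in the example) and uses the t-table value $t_{17,0.025} = 2.110$ with the degrees of freedom rounded down.

```python
from math import floor, sqrt
from statistics import mean, variance

c1 = [5.83, 8.66, 4.75, 3.00, 3.37, 3.63, 4.00, 4.63, 4.25, 4.13]
c2 = [3.38, 2.81, 7.00, 1.50, 5.88, 5.25, 4.08, 7.63, 4.50, 4.88]

n1, n2 = len(c1), len(c2)
v1 = variance(c1) / n1  # s_1^2 / n_1 (sample variance, n - 1 denominator)
v2 = variance(c2) / n2  # s_2^2 / n_2

t = (mean(c1) - mean(c2)) / sqrt(v1 + v2)  # H0: mu_1 - mu_2 = 0
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

t_crit = 2.110  # ASSUMPTION: alpha = 0.05, two-sided; t-table value for df = 17
reject = abs(t) > t_crit

print(f"t = {t:.3f}, df = {df:.2f} (use {floor(df)}), reject H0: {reject}")
```

The statistic is very close to zero, so under these assumptions the data give no evidence that the mean revolutions to failure differ.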

Page 43: Chapter6.pptx

Test for paired data (section 6.8)

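In the event that two samples are dependent, i.e. paired, such as when two different measurements are made on the same experimental unit, we consider the data in the form of the pairs $(X_1, Y_1), \ldots, (X_n, Y_n)$ and construct the one-dimensional, i.e. one-sample, data $D_1, \ldots, D_n$, where $D_i = X_i - Y_i$ for $i = 1, \ldots, n$. As shown earlier, $\mu_D = \mu_X - \mu_Y$.

To test,

I. $H_0: \mu_D \le \Delta_0$ vs $H_1: \mu_D > \Delta_0$

II. $H_0: \mu_D \ge \Delta_0$ vs $H_1: \mu_D < \Delta_0$

III. $H_0: \mu_D = \Delta_0$ vs $H_1: \mu_D \ne \Delta_0$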

Page 44: Chapter6.pptx

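perform a one-sample hypothesis test by either a large or small sample inference using the test statistic

$z = \frac{\bar{d} - \Delta_0}{\sigma_D/\sqrt{n}}$ (large sample)

or

$t = \frac{\bar{d} - \Delta_0}{s_D/\sqrt{n}}$ (small sample, with $n - 1$ degrees of freedom)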

Page 45: Chapter6.pptx

Example: The two drying methods for concrete were used on seven different mixes, with each mix of concrete subjected to each drying method. The resulting strength test measurements (in psi) are given below. Is there evidence of a difference between average strengths for the two drying methods at the 10% significance level?

Mix    Method I    Method II
A      3160        3170
B      3240        3220
C      3190        3160
D      3520        3530
E      3480        3440
F      3220        3210
G      3120        3120

Solution: MINITAB
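
The slides defer this example to MINITAB; as a rough cross-check, the paired t statistic can be computed from the seven differences. A sketch only, not the MINITAB output: it tests $H_0: \mu_D = 0$ against a two-sided alternative at the 10% level, using the t-table value $t_{6,0.05} = 1.943$.

```python
from math import sqrt
from statistics import mean, stdev

method_1 = [3160, 3240, 3190, 3520, 3480, 3220, 3120]  # mixes A..G, psi
method_2 = [3170, 3220, 3160, 3530, 3440, 3210, 3120]

# Reduce the paired data to a single sample of differences D_i = X_i - Y_i.
d = [a - b for a, b in zip(method_1, method_2)]
n = len(d)

d_bar, s_d = mean(d), stdev(d)
t = d_bar / (s_d / sqrt(n))  # H0: mu_D = 0, df = n - 1 = 6

t_crit = 1.943               # t-table value t_{6, 0.05}: two-sided test, alpha = 0.10
reject = abs(t) > t_crit

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}, t = {t:.3f}, reject H0: {reject}")
```

Working with the differences removes the mix-to-mix variation, which is why the paired analysis is used here instead of a two-sample test.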

Page 46: Chapter6.pptx

Power of the Test

The power of a test is the probability of rejecting $H_0$ whenever it is false.

Power $= P(\text{reject } H_0 \mid H_0 \text{ is false}) = 1 - P(\text{Type II error})$
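
As an illustration, the power of the lower-tailed z test from the paint-drying example can be computed for one specific alternative. This is a sketch under stated assumptions: $\alpha = 0.05$ is the default level, and the true mean of 72 minutes is a hypothetical value chosen only to have something to evaluate the power at.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 75.0, 9.0, 35  # paint-drying example: H0 mean, known sd, sample size
alpha = 0.05                   # ASSUMPTION: default significance level
mu_true = 72.0                 # HYPOTHETICAL true mean under the alternative

se = sigma / sqrt(n)

# H0 is rejected when the sample mean falls below this cutoff.
cutoff = mu0 + NormalDist().inv_cdf(alpha) * se

# Power: probability that xbar lands below the cutoff when mu = mu_true.
power = NormalDist(mu_true, se).cdf(cutoff)
type2_prob = 1 - power

print(f"cutoff = {cutoff:.2f}, power = {power:.3f}, P(Type II error) = {type2_prob:.3f}")
```

Moving `mu_true` further below 75, or increasing `n`, raises the power, since the sampling distribution of the mean then sits further inside the rejection region.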

Page 47: Chapter6.pptx

Exam 2

1. Section 2.6

2. Section 4.11

3. Sections 5.1-5.7

4. Sections 6.1-6.8