STA301_LEC37

Virtual University of Pakistan Lecture No. 37 of the course on Statistics and Probability by Miss Saleha Naghmi Habibullah


Transcript of STA301_LEC37

Page 1: STA301_LEC37

Virtual University of Pakistan

Lecture No. 37 of the course on

Statistics and Probability

by

Miss Saleha Naghmi Habibullah

Page 2: STA301_LEC37

IN THE LAST LECTURE, YOU LEARNT

•Large Sample Confidence Intervals for p and p1-p2

•Determination of Sample Size (with reference to Interval Estimation)

•Hypothesis-Testing (An Introduction)

Page 3: STA301_LEC37

TOPICS FOR TODAY

•Hypothesis-Testing (continuation of basic concepts)

•Hypothesis-Testing regarding μ (based on Z-statistic)

Page 4: STA301_LEC37

In the last lecture, we commenced the discussion of the concept of Hypothesis-Testing.

We introduced the concepts of the Null and Alternative hypotheses as well as the concepts of Type-I and Type-II error.

We now continue the discussion of the basic concepts of hypothesis-testing:

Page 5: STA301_LEC37

TEST-STATISTIC

A statistic (i.e. a function of the sample data not containing any parameters), which provides a basis for testing a null hypothesis, is called a test-statistic.

Page 6: STA301_LEC37

Every test-statistic has a probability distribution (i.e. sampling distribution) which gives the probability that our test-statistic will assume a value greater than or equal to a specified value OR a value less than or equal to a specified value when the null hypothesis is true.

Page 7: STA301_LEC37

ACCEPTANCE AND REJECTION REGIONS

All possible values which a test-statistic may assume can be divided into two mutually exclusive groups: one group consisting of values which appear to be consistent with the null hypothesis (i.e. values which appear to support the null hypothesis), and the other having values which lead to the rejection of the null hypothesis.

Page 8: STA301_LEC37

The first group is called the acceptance region and the second set of values is known as the rejection region for a test.

The rejection region is also called the critical region.

The value(s) separating the critical region from the acceptance region are called the critical value(s):

Page 9: STA301_LEC37

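[Figure: sampling distribution of Z, centred at 0, with the Acceptance Region in the middle and a Critical Region in each tail; a Critical Value separates each Critical Region from the Acceptance Region.]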

Page 10: STA301_LEC37

The critical value, which can be in the same units as the parameter or in standardized units, is to be decided by the experimenter; in practice it follows from the chosen level of significance.

Page 11: STA301_LEC37

The most frequently used values of α, the significance level, are 0.05 and 0.01, i.e. 5 percent and 1 percent.

By α = 5%, we mean that there are about 5 chances in 100 of incorrectly rejecting a true null hypothesis.
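The "5 chances in 100" reading of the significance level can be checked with a small simulation. The sketch below is only an illustration: the population values (mean 200, standard deviation 16), the sample size of 50 and the number of trials are assumptions chosen for the demonstration, and the cutoff used is the two-tailed 5% critical value of the standard normal distribution. Samples are drawn from a population in which the null hypothesis is true, and we count how often the test would nevertheless reject it.

```python
import random
from statistics import NormalDist

def type_one_error_rate(mu0=200.0, sigma=16.0, n=50, alpha=0.05,
                        trials=4000, seed=1):
    """Estimate how often a two-tailed Z-test rejects a TRUE null hypothesis.

    All default values are illustrative assumptions, not data.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population in which H0 (mu = mu0) really holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        z = (x_bar - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1  # a true H0 was rejected: a Type-I error
    return rejections / trials

rate = type_one_error_rate()
print(f"Estimated Type-I error rate: {rate:.3f}")
```

The estimated rate should come out close to 0.05, and should move towards 0.01 if `alpha=0.01` is passed instead.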

Page 12: STA301_LEC37

RELATIONSHIP BETWEEN THE LEVEL OF SIGNIFICANCE AND THE CRITICAL REGION:

The level of significance acts as a basis for determining the CRITICAL REGION of the test.

For example, if we are testing H0 : μ = 45 against H1 : μ ≠ 45, our test statistic is the standard normal variable Z, and the level of significance is 5%, then the critical values are Z = ±1.96.

Page 13: STA301_LEC37

Corresponding to a level of significance of 5%, we have:

[Figure: standard normal curve with the Acceptance Region between Z = -1.96 and Z = 1.96, and a Critical Region of area 2.5% in each tail.]
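The critical values quoted for the 5% level can be recovered from the standard normal distribution instead of being read off the area table. The following is a minimal sketch using Python's standard library; the function name `two_tailed_critical_values` is a hypothetical helper introduced here for illustration.

```python
from statistics import NormalDist

def two_tailed_critical_values(alpha):
    """Return the (lower, upper) critical values of Z for a two-tailed test."""
    # An area of alpha/2 lies to the right of the upper critical value.
    upper = NormalDist().inv_cdf(1 - alpha / 2)
    return -upper, upper

for alpha in (0.05, 0.01):
    lower, upper = two_tailed_critical_values(alpha)
    print(f"alpha = {alpha}: critical values are {lower:.3f} and {upper:.3f}")
```

For a level of significance of 0.05 this gives approximately -1.96 and 1.96, in agreement with the values above.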

Page 14: STA301_LEC37

ONE-TAILED AND TWO-TAILED TESTS:

A test for which the entire rejection region lies in only one of the two tails – either in the right tail or in the left tail – of the sampling distribution of the test-statistic, is called a one-tailed test or one-sided test.

Page 15: STA301_LEC37

A one-tailed test is used when the alternative hypothesis H1 is formulated in the following form:

H1 : θ > θ0 or

H1 : θ < θ0

Page 16: STA301_LEC37

For example, if we are interested in testing a hypothesis regarding the population mean, if n is large, and we are conducting a one-tailed test, then our alternative hypothesis will be stated as

H1 : μ > μ0 or

H1 : μ < μ0

Page 17: STA301_LEC37

In this case, the rejection region consists of either all z-values which are greater than $+z_{\alpha}$ or all z-values which are less than $-z_{\alpha}$ (where α is the level of significance):

Page 18: STA301_LEC37

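If H0 : μ ≥ μ0 and H1 : μ < μ0, then (in case of large n):

REJECT H0 if $z < -z_{\alpha}$

[Figure: standard normal curve with the REJECTION REGION, of area α, in the left tail to the left of $-z_{\alpha}$.]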

Page 19: STA301_LEC37

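If H0 : μ ≤ μ0 and H1 : μ > μ0, then (in case of large n):

REJECT H0 if $z > z_{\alpha}$

[Figure: standard normal curve with the REJECTION REGION, of area α, in the right tail to the right of $z_{\alpha}$.]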

Page 20: STA301_LEC37

If, on the other hand, the rejection region is divided equally between the two tails of the sampling distribution of the test-statistic, the test is referred to as a two-tailed test or two-sided test.

Page 21: STA301_LEC37

In this case, the alternative hypothesis H1 is set up as:

H1 : θ ≠ θ0

meaning thereby

H1 : θ < θ0 or θ > θ0

Page 22: STA301_LEC37

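If H0 : μ = μ0 and H1 : μ ≠ μ0, then (in case of large n):

REJECT H0 if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$

[Figure: standard normal curve with a REJECTION REGION of area α/2 in each tail, to the left of $-z_{\alpha/2}$ and to the right of $z_{\alpha/2}$.]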

Page 23: STA301_LEC37

The location of the critical region can be determined only after the alternative hypothesis H1 has been stated.

It is important to note that the one-tailed and the two-tailed tests differ only in the location of the critical region, not in its size.
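The three rejection rules can be put side by side in code. This is a sketch under stated assumptions: the function `reject_h0` and its `tail` labels are hypothetical names introduced here, and the computed value z = 1.80 is made up purely to show that the same z can lead to different decisions depending on where the critical region is located.

```python
from statistics import NormalDist

def reject_h0(z, alpha, tail):
    """Return True if the computed z falls in the critical region.

    tail: "left"  for H1: mu < mu0,
          "right" for H1: mu > mu0,
          "two"   for H1: mu != mu0.
    """
    std_normal = NormalDist()
    if tail == "left":
        return z < std_normal.inv_cdf(alpha)               # z < -z_alpha
    if tail == "right":
        return z > std_normal.inv_cdf(1 - alpha)           # z > z_alpha
    if tail == "two":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)  # |z| > z_(alpha/2)
    raise ValueError("tail must be 'left', 'right' or 'two'")

z = 1.80  # hypothetical computed value of the test-statistic
for tail in ("left", "right", "two"):
    print(tail, reject_h0(z, alpha=0.05, tail=tail))
```

In every case the total area of the critical region is α; only its position changes. Here z = 1.80 exceeds the right-tail critical value of about 1.645 but not the two-tailed value of about 1.96.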

Page 24: STA301_LEC37

We illustrate the concept and methodology of hypothesis-testing with the help of an example:

Page 25: STA301_LEC37

EXAMPLE

A steel company manufactures and assembles desks and other office equipment at several plants in a particular country.

The weekly production of the desks of Model A at Plant-I has a mean of 200 and a standard deviation of 16.

Page 26: STA301_LEC37

Recently, due to market expansion, new production methods have been introduced and new employees hired.

Page 27: STA301_LEC37

The vice president of manufacturing would like to investigate whether there has been a change in the weekly production of the desks of Model A.

To put it another way, is the mean number of desks produced at Plant-I different from 200 at the 0.05 significance level?

Page 28: STA301_LEC37

The mean number of desks produced per week last year (50 weeks, because the plant was shut down 2 weeks for vacation) is 203.5.

On the basis of the above result, should the vice president conclude that there has been a change in the weekly production of the desks of Model A?

Page 29: STA301_LEC37

SOLUTION

We use the statistical hypothesis-testing procedure to investigate whether the production rate has changed from 200 per week.

Page 30: STA301_LEC37

Step-1: Formulation of the Null and Alternative Hypotheses:

The null hypothesis is “The population mean is 200.”

The alternative hypothesis is “The mean is different from 200” or “The mean is not 200.”

Page 31: STA301_LEC37

These two hypotheses are written as follows:

H0 : μ = 200

H1 : μ ≠ 200

Page 32: STA301_LEC37

Note:

This is a two-tailed test because the alternative hypothesis does not state a direction.

In other words, it does not state whether the mean production is greater than 200 or less than 200.

The vice president only wants to find out whether the production rate is different from 200.

Page 33: STA301_LEC37

Step-2: Decision Regarding the Level of Significance (i.e. the Probability of Committing Type-I Error):

Here, the level of significance is 0.05.

This is α, the probability of committing a Type-I error (i.e. the risk of rejecting a true null hypothesis).

Page 34: STA301_LEC37

Step-3: Test Statistic (that statistic that will enable us to test our hypothesis):

The test statistic for a large sample mean is

$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

Transforming the production data to standard units (z values) permits the use of the area table of the standard normal distribution.

Page 35: STA301_LEC37

Step-4: Calculations:

In this problem, we have $n = 50$, $\bar{X} = 203.5$, and $\sigma = 16$.

Hence, the computed value of z comes out to be:

$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{203.5 - 200}{16 / \sqrt{50}} = 1.55$$

Page 36: STA301_LEC37

Step-5: Critical Region (that portion of the X-axis which compels us to reject the null hypothesis):

Since this is a two-tailed test, half of 0.05, or 0.025, is in each tail.

The area where H0 is not rejected, located between the two critical values, is therefore 0.95.

Page 37: STA301_LEC37

Applying the inverse use of the Area Table, we find that, corresponding to α = 0.05, the critical values are 1.96 and -1.96, as shown below:

Page 38: STA301_LEC37

[Figure: Decision Rule for the 0.05 Significance Level. Standard normal curve on the z scale: H0 is not rejected between the critical values -1.96 and +1.96 (area 0.4750 on each side of 0, out of 0.5000 for each half of the curve); each tail beyond a critical value is a region of rejection with area 0.05/2 = 0.025.]

Page 39: STA301_LEC37

The decision rule is, therefore: Reject the null hypothesis and accept the alternative hypothesis if the computed value of z is not between -1.96 and +1.96.

Do not reject the null hypothesis if z falls between -1.96 and +1.96.

Page 40: STA301_LEC37

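Step-6: Conclusion:

The computed value of z, i.e. 1.55, lies between -1.96 and +1.96, as shown below:

[Figure: z scale with "Reject H0" to the left of -1.96, "Do not reject H0" between -1.96 and 1.96, and "Reject H0" to the right of 1.96; the computed value of z, 1.55, lies in the "Do not reject H0" region.]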

Page 41: STA301_LEC37

Because 1.55 lies between -1.96 and +1.96, it does not fall in the rejection region, and hence H0 is not rejected.

In other words, the sample does not provide sufficient evidence to conclude that the population mean is different from 200.

Page 42: STA301_LEC37

So, we would report to the vice president of manufacturing that the sample evidence does not show that the production rate at Plant-I has changed from 200 per week.

The difference of 3.5 units between the historical weekly production rate and the production rate of last year can reasonably be attributed to chance.
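The calculation and the decision for the desk-production example can be reproduced in a few lines. This is a minimal sketch using only the figures given in the example; the variable names are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 200      # mean weekly production under H0
sigma = 16     # population standard deviation
n = 50         # number of weeks observed last year
x_bar = 203.5  # sample mean weekly production
alpha = 0.05   # level of significance

# Test statistic for a large sample mean.
z = (x_bar - mu0) / (sigma / sqrt(n))

# Two-tailed test: alpha/2 in each tail.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject = abs(z) > z_crit

print(f"z = {z:.2f}, critical values = -{z_crit:.2f} and +{z_crit:.2f}")
print("Reject H0" if reject else "Do not reject H0")
```

The computed z of about 1.55 lies inside the interval from -1.96 to +1.96, so the sketch reports that H0 is not rejected, matching the conclusion reached by hand.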

Page 43: STA301_LEC37

The above example pertained to a two-tailed test.

Let us now consider a few examples of one-tailed tests:

Page 44: STA301_LEC37

EXAMPLE

A random sample of 100 workers with children in day care shows a mean day-care cost of Rs.2650 and a standard deviation of Rs.500.

Verify the department’s claim that the mean exceeds Rs.2500 at the 0.05 level with this information.

Page 45: STA301_LEC37

SOLUTION

In this problem, we regard the department’s claim, that the mean exceeds Rs.2500, as H1, and regard the negation of this claim as H0.

Thus, we have

Page 46: STA301_LEC37

i) H0 : μ ≤ 2500

H1 : μ > 2500 (exceeds 2500)

(Important Note: We should always regard that hypothesis as the null hypothesis which contains the equal sign.)

Page 47: STA301_LEC37

ii) We are given the significance level at α = 0.05.

iii) The test-statistic, under H0, is

$$Z = \frac{\bar{X} - \mu_0}{S / \sqrt{n}},$$

which is approximately normal, as n = 100 is large enough to make use of the central limit theorem.

Page 48: STA301_LEC37

iv) The rejection region is Z > Z0.05 = 1.645

[Figure: standard normal curve with the REJECTION REGION, of area 0.05, in the right tail beyond Z0.05 = 1.645.]

Page 49: STA301_LEC37

v) Computing the value of Z from sample information, we find

$$z = \frac{2650 - 2500}{500 / \sqrt{100}} = \frac{150}{50} = 3$$

Page 50: STA301_LEC37

vi) Conclusion:

Since the calculated value z = 3 is greater than 1.645, it falls in the rejection region; we therefore reject H0 and may conclude that the department’s claim is supported by the sample evidence.

Page 51: STA301_LEC37

An Interesting and Important Point:

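For α = 0.01, the critical value is Z0.01 = 2.33. As our computed value of Z, i.e. 3, is even greater than 2.33, the computed value of $\bar{X}$ is highly significant. (Even at the 1% level of significance, i.e. with only a 1% risk of rejecting a true null hypothesis, the sample evidence supports the department’s claim.)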
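The day-care example can be checked at both levels of significance in one pass. The sketch below uses only the figures given in the example; the variable names are illustrative, and the sample standard deviation is used in place of the population value on the assumption, stated in the solution, that n = 100 is large.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 2500    # boundary value of H0: mu <= 2500
x_bar = 2650  # sample mean day-care cost (Rs.)
s = 500       # sample standard deviation (Rs.)
n = 100       # sample size

# Test statistic under H0.
z = (x_bar - mu0) / (s / sqrt(n))

# Right-tailed test: the whole of alpha lies in the right tail.
for alpha in (0.05, 0.01):
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    decision = "reject H0" if z > z_alpha else "do not reject H0"
    print(f"alpha = {alpha}: z = {z:.2f}, "
          f"critical value = {z_alpha:.3f} -> {decision}")
```

The computed z of 3 exceeds both critical values (about 1.645 and 2.326), so H0 is rejected at the 5% level and also at the 1% level.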

Page 52: STA301_LEC37

IN TODAY’S LECTURE, YOU LEARNT

•Hypothesis-Testing (continuation of basic concepts)

•Hypothesis-Testing regarding μ (based on Z-statistic)

Page 53: STA301_LEC37

IN THE NEXT LECTURE, YOU WILL LEARN

• Hypothesis-Testing regarding μ1 - μ2 (based on Z-statistic)

• Hypothesis-Testing regarding p (based on Z-statistic)

• Hypothesis-Testing regarding p1 - p2 (based on Z-statistic)

• p-value

• Relationship Between Confidence Interval and Tests of Hypothesis