Hypothesis testing – mean differences between populations Part 2.


Transcript of Hypothesis testing – mean differences between populations Part 2.

Page 1: Hypothesis testing – mean differences between populations Part 2.

Hypothesis testing – mean differences between populations

Part 2

Page 2: Hypothesis testing – mean differences between populations Part 2.

Hypothesis Tests for Differences Between Two Population Means

Note that most research is not designed in such a way that the mean of one sample is compared to the population mean. In practice, most experimenters like to use control groups in their research. Often control groups are used as substitutes for population values.

Example: compare the mean weight of men who exercise regularly with the mean weight of men who never exercise.

Instead of comparing the mean of the research sample to the mean of the population, we could have compared the mean of the research sample to the mean of the control sample. This is normally done by computing the difference between the two means and then comparing this difference to the mean of the sampling distribution of differences between means.

Page 3: Hypothesis testing – mean differences between populations Part 2.

Difference between two Population Means

INDEPENDENT SAMPLES
• Two samples are independent if the selection of the sample from one population does not affect the selection of the sample from the second population.

• Example: to estimate the difference between the mean incomes of male and female lecturers, we draw two samples, one from the population of male lecturers and another from the population of female lecturers.

DEPENDENT SAMPLES
• Example: to estimate the difference between the mean weight of all participants before and after a weight loss programme, the weights must be measured on the same respondents before and after the programme, so the two samples are dependent.

Page 4: Hypothesis testing – mean differences between populations Part 2.

• Researchers want to see if men have a higher blood pressure than women do. A study is planned in which the blood pressures of 50 men and 50 women will be measured.
– H0: μm ≤ μf    H1: μm > μf
– Alternatively, we can write H0: μm – μf ≤ 0    H1: μm – μf > 0

• An airport official wants to assess if the flights from one airline (Airline 1) are less delayed than flights from another airline (Airline 2).
– H0: μ1 ≥ μ2 (μ1 – μ2 ≥ 0)    H1: μ1 < μ2 (μ1 – μ2 < 0)

Page 5: Hypothesis testing – mean differences between populations Part 2.

Hypothesis Tests for Mean Differences Between Two Populations When Variances Are Known

• Apply the same principles as for hypothesis testing for a single population mean.

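• To test H0: μX = μY (equivalent to H0: μX – μY = 0), we use the fact that

$$\bar{x} - \bar{y} \sim N\left(\mu_X - \mu_Y,\ \frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}\right)$$

and thus the test statistic

$$z = \frac{(\bar{x} - \bar{y}) - (\mu_X - \mu_Y)}{\sqrt{\dfrac{\sigma_X^2}{n_X} + \dfrac{\sigma_Y^2}{n_Y}}}$$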
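The test statistic for two independent samples with known variances can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `two_sample_z` and its parameter names are assumptions, and the numbers in the usage lines are made up purely to show the call.

```python
import math


def two_sample_z(xbar, ybar, var_x, var_y, n_x, n_y, hypothesized_diff=0.0):
    """z statistic for H0: mu_X - mu_Y = hypothesized_diff, variances known."""
    # Standard error of (xbar - ybar): sqrt(sigma_X^2/n_X + sigma_Y^2/n_Y)
    standard_error = math.sqrt(var_x / n_x + var_y / n_y)
    return ((xbar - ybar) - hypothesized_diff) / standard_error


# Hypothetical inputs, chosen only for illustration.
z = two_sample_z(xbar=10.0, ybar=9.0, var_x=4.0, var_y=9.0, n_x=100, n_y=100)
print(f"z = {z:.3f}")
```

Under H0 the hypothesized difference is usually 0, which is why it is given as a default argument.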

Page 6: Hypothesis testing – mean differences between populations Part 2.

Hypothesis Tests for Two Population Means: A Summary

Lower-tail one-sided test:
H0: μx ≥ μy
H1: μx < μy
i.e.,
H0: μx – μy ≥ 0
H1: μx – μy < 0

Upper-tail one-sided test:
H0: μx ≤ μy
H1: μx > μy
i.e.,
H0: μx – μy ≤ 0
H1: μx – μy > 0

Two-tail (two-sided) test:
H0: μx = μy
H1: μx ≠ μy
i.e.,
H0: μx – μy = 0
H1: μx – μy ≠ 0

Two Population Means, Independent Samples

Page 7: Hypothesis testing – mean differences between populations Part 2.

Decision Rules

Two Population Means, Independent Samples, Variances Known

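Lower-tail test:
H0: μx – μy ≥ 0
H1: μx – μy < 0
Rejection region: area α in the lower tail, below –zα.
Reject H0 if z < –zα

Upper-tail test:
H0: μx – μy ≤ 0
H1: μx – μy > 0
Rejection region: area α in the upper tail, above zα.
Reject H0 if z > zα

Two-tail test:
H0: μx – μy = 0
H1: μx – μy ≠ 0
Rejection regions: area α/2 in each tail, below –zα/2 and above zα/2.
Reject H0 if z < –zα/2 or z > zα/2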
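The three decision rules can be sketched as one helper that looks up the critical value from the standard normal distribution. This is an illustrative sketch only: the function name `reject_h0` and the `tail` labels are assumptions, and the critical values come from Python's standard-library `statistics.NormalDist` rather than a printed z table.

```python
from statistics import NormalDist


def reject_h0(z, alpha, tail):
    """Apply the z-test decision rule; tail is 'lower', 'upper' or 'two'."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "lower":
        return z < -standard_normal.inv_cdf(1 - alpha)
    if tail == "upper":
        return z > standard_normal.inv_cdf(1 - alpha)
    if tail == "two":
        # alpha is split equally between the two tails
        return abs(z) > standard_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'lower', 'upper' or 'two'")


# Illustrative test statistics (not taken from the slides).
print(reject_h0(-1.70, 0.05, "lower"))
print(reject_h0(1.50, 0.05, "upper"))
print(reject_h0(2.10, 0.05, "two"))
```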

Page 8: Hypothesis testing – mean differences between populations Part 2.

Test statistic when σ1² and σ2² are unknown

• If σ1² and σ2² are unknown, replace them with the sample variances s1² and s2².

• If the samples n1 and n2 are large (each n ≥ 100), the Z test statistic can be used as well as the t test statistic.

• If the sample sizes are smaller (but at least 30 in each sample) and we are sampling from populations with normal distributions, we use the t-test with df = (n1 – 1) + (n2 – 1) = n1 + n2 – 2.

Page 9: Hypothesis testing – mean differences between populations Part 2.

Decision Rules

Lower-tail test:
H0: μx – μy ≥ 0
H1: μx – μy < 0
Rejection region: area α in the lower tail, below –tα.
Reject H0 if t < –t(nx+ny–2), α

Upper-tail test:
H0: μx – μy ≤ 0
H1: μx – μy > 0
Rejection region: area α in the upper tail, above tα.
Reject H0 if t > t(nx+ny–2), α

Two-tail test:
H0: μx – μy = 0
H1: μx – μy ≠ 0
Rejection regions: area α/2 in each tail, below –tα/2 and above tα/2.
Reject H0 if t < –t(nx+ny–2), α/2 or t > t(nx+ny–2), α/2

Two Population Means, Independent Samples, Variances Unknown

Page 10: Hypothesis testing – mean differences between populations Part 2.

Decision making and conclusion for mean difference

• H0: mean nose lengths of men and women are the same.

• H1: men have a longer mean nose length than women.

Suppose the p-value is 0.225 (it is determined from the value of the test statistic; H1 indicates a one-tailed, right-tail test).

If the significance level is set at 0.05, we do not reject the null hypothesis. The conclusion is that there is not enough evidence to say that the populations of men and women have a statistically significant difference in mean nose length. The observed mean difference in the sample is likely due to chance (sampling error that arises when we collect data from a particular sample).

• If the p-value were 0.01 instead, what would your decision be?

Page 11: Hypothesis testing – mean differences between populations Part 2.

Decision making using p-value

• if p-value < significance level, reject null hypothesis.

• if p-value > significance level, do not reject null hypothesis
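The p-value rule can be written as a one-line helper. This is only a sketch; the function name `decide` is an assumption. The two calls use p-values that appear elsewhere in these slides (0.225 from the nose-length example and 0.012 from the two-classes example), both at a 5% significance level.

```python
def decide(p_value, alpha):
    """Return the decision implied by comparing the p-value with alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


print(decide(0.225, 0.05))  # do not reject H0
print(decide(0.012, 0.05))  # reject H0
```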

Page 12: Hypothesis testing – mean differences between populations Part 2.

• A test was given to two classes of 40 and 50 students respectively. In the first class, the mean mark was 74 with a standard deviation of 8. In the second class the mean mark was 78 with a standard deviation of 7. Is there a significant difference between the performance of the two classes at the 5% level of significance?

• H0: μ1 = μ2

• H1: μ1 ≠ μ2 (there is a significant difference between the population means)

• Note that the population variances are unknown, so we use the t test statistic.
• t = (74 – 78) / √[(64/40) + (49/50)]
• = –4 / √2.58
• = –4 / 1.606 = –2.490
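The arithmetic for the two-classes example can be checked with a short script. This is a sketch using only the summary figures given in the question; the variable names are assumptions, and the degrees of freedom use the df = (n1 – 1) + (n2 – 1) rule stated earlier in these slides.

```python
import math

# Summary statistics from the question.
n1, mean1, sd1 = 40, 74, 8   # first class
n2, mean2, sd2 = 50, 78, 7   # second class

# Standard error of the difference, with sample variances in place of sigma^2.
standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
t = (mean1 - mean2) / standard_error
df = (n1 - 1) + (n2 - 1)

print(f"standard error = {standard_error:.3f}")  # standard error = 1.606
print(f"t = {t:.3f}, df = {df}")                 # t = -2.490, df = 88
```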

Page 13: Hypothesis testing – mean differences between populations Part 2.

Using the critical value approach

[Figure: t distribution with the two rejection regions for H0, below –1.96 and above 1.96]

• There are two rejection regions for a two-tailed test, so the significance level has to be divided by 2.

• If α = 0.05, the rejection area on each side becomes 0.025 (0.05/2). At 0.025, the critical values obtained from the t distribution table are –1.96 and 1.96 (these are the large-sample values; with 88 d.f. the exact values are about ±1.99, which leads to the same decision).

• The calculated t test statistic (–2.490) lies to the left of –1.96, so it is in the rejection region. We reject the null hypothesis.

• Hence we can reject H0 and conclude that there is a significant difference in the performance of the two classes.

Page 14: Hypothesis testing – mean differences between populations Part 2.

P-value approach

• The alternative hypothesis H1 indicates that it is a two-tailed test.

• Given the t test statistic –2.490, look for P(t > 2.49) and P(t < –2.49). Summing the two probabilities gives the p-value.

• P(t > 2.49) at 88 d.f. lies between 0.005 and 0.01. Assume it is 0.006; then P(t > 2.49) + P(t < –2.49) = 0.006 + 0.006 = 0.012.

• Significance level α = 0.05, p-value = 0.012. Since p-value < α, we reject the null hypothesis.

• Conclusion: the results are statistically significant, so there is a difference between the mean performance of the two classes in the populations.
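The tail probability that the slide brackets between 0.005 and 0.01 can be approximated numerically instead of read from a table. The sketch below is an illustration, not the method used in the slides: it integrates the Student t density with Simpson's rule using only the standard library, and the function names `t_pdf` and `t_upper_tail` are assumptions. The result should come out near 0.007, slightly above the 0.006 assumed in the slide, and leads to the same decision.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus Simpson's-rule integral over [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


upper = t_upper_tail(2.49, 88)
p_two_sided = 2 * upper  # the t distribution is symmetric about 0
print(f"P(t > 2.49) = {upper:.4f}, two-sided p-value = {p_two_sided:.4f}")
```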

Page 15: Hypothesis testing – mean differences between populations Part 2.

• For the variable “Time spent watching TV in Typical Day,” here are results of a two-sample t-procedure that compares a random sample of men and women at a college.

Sex     N    Mean   StDev   SE Mean
f     116    1.95    1.51      0.14
m      59    2.37    1.87      0.24

95% CI for mu (f) - mu (m): (-0.97, 0.14)
T-Test mu (f) = mu (m) (vs not =): T = -1.49   P = 0.14   DF = 97

• Which of the following is the correct conclusion about these results using a 5% significance level?
A. The mean TV watching times of men and women at the college are equal.
B. There is a statistically significant difference between the mean TV watching times of men and women at the college.
C. There is not a statistically significant difference between the mean TV watching times of men and women at the college.
D. There is not enough information to judge statistical significance here.

Page 16: Hypothesis testing – mean differences between populations Part 2.

Assumption of variance equality

• So far we have not said anything about the variances of the different populations. They could be equal or unequal.

• The degrees of freedom of the t test statistic and the standard error of the difference in sample means are affected by the assumption of variance equality. The d.f. for unequal variances is more complicated to calculate.

• For this reason, we refer to the output in SPSS.

• Process:

(i) Test the equality of variances using the F-test. In SPSS, it is reported under Levene's test.

(ii) Once variance equality has been assessed, choose the appropriate t-test for the difference in means.

Page 17: Hypothesis testing – mean differences between populations Part 2.

• A large car insurance company is conducting a study to see if male and female drivers have the same number of accidents, on average, or if male drivers (who tend to be thought of as more aggressive drivers) have more. Data on the number of accidents in the past 5 years is collected for randomly selected drivers who are insured by this company. An analysis of the results produced the following output.

Group Statistics

Number of accidents past 5 years
Sex of insured     N    Mean   Std. Deviation   Std. Error Mean
Male              32    2.19        1.712            .3027
Female            30    1.23        1.382            .2523

Independent Samples Test

Number of accidents past 5 years
                               Levene's Test for Equality of Variances    t-test for Equality of Means
                                   F        Sig.                            t        df       Sig. (2-tailed)
Equal variances assumed          2.958     .091                           2.405     60         .0193
Equal variances not assumed                                               2.422     58.733     .0186

Can we reject σ²m = σ²f?
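The two rows of the t-test output differ only in how the standard error and degrees of freedom are computed. The sketch below recomputes both from the Group Statistics figures. It assumes the "equal variances assumed" row uses a pooled variance with n1 + n2 – 2 d.f., and that the "not assumed" row uses the Welch-Satterthwaite approximation; the function names are assumptions. The t values come out slightly above the printed 2.405 and 2.422, presumably because the printed means are rounded to two decimals.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """t and df assuming equal population variances (pooled variance)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """t and approximate df without assuming equal variances."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    standard_error = math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (mean1 - mean2) / standard_error, df


male = (2.19, 1.712, 32)     # mean, std. deviation, N
female = (1.23, 1.382, 30)

t_pooled, df_pooled = pooled_t(*male, *female)
t_welch, df_welch = welch_t(*male, *female)
print(f"equal variances assumed:     t = {t_pooled:.2f}, df = {df_pooled}")
print(f"equal variances not assumed: t = {t_welch:.2f}, df = {df_welch:.2f}")
```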

Page 18: Hypothesis testing – mean differences between populations Part 2.

A. State the null and alternative hypotheses.
B. Choose the appropriate t-test to use.
C. State the p-value for the t-test statistic.
D. Are the results statistically significant if the significance level α = 0.05? Why?
E. What is your conclusion for the results?

Page 19: Hypothesis testing – mean differences between populations Part 2.

• Two groups of 10 students each took an examination to see whether they had understood the course materials that were taught. Do the two groups differ in their understanding of the course materials at α = 0.10, based on the SPSS output below?

• State the null and alternative hypotheses in your solution, even though the question does not ask for them.

                               Levene's Test for Equality of Variances    t-test for Equality of Means
                                   F        Sig.                            t        df       Sig. (2-tailed)
Equal variances assumed          1.06      0.932                          -1.84     18         0.082
Equal variances not assumed                                               -1.84     17.98      0.082

Page 20: Hypothesis testing – mean differences between populations Part 2.

Using SPSS for hypothesis testing

A. Testing mean value

• One population mean: Analyze → Compare Means → One-Sample T Test → drag a quantitative variable into the Test Variable(s) box and specify the test value, e.g. 35 → OK

• Two population means difference:

  • Independent samples: Analyze → Compare Means → Independent-Samples T Test → drag one numeric variable into the Test Variable(s) box and drag a qualitative variable with only two response categories into the Grouping Variable box

  • Dependent samples (before-after): Analyze → Compare Means → Paired-Samples T Test → drag one numeric variable into the Variable 1 box and repeat for the second variable.

Page 21: Hypothesis testing – mean differences between populations Part 2.
Page 22: Hypothesis testing – mean differences between populations Part 2.

SPSS provides a test of the equal-variance assumption. Note that the population variances are usually unknown to the researcher, so it is good practice to test the validity of the equal-variance assumption, as it affects the degrees of freedom and the standard error of the difference in means.

Page 23: Hypothesis testing – mean differences between populations Part 2.
Page 24: Hypothesis testing – mean differences between populations Part 2.

One way ANOVA – mean differences for more than 2 groups

• Do graduates of undergraduate business programs with different majors tend to earn disparate average starting salaries? Consider the data given in the table below.

Accounting   Marketing   Finance    Management
$37,220      $28,620     $29,870    $28,600
$30,950      $27,750     $31,700    $27,450
$32,630      $27,650     $31,740    $26,410
$31,350      $27,640     $32,750    $27,340
$29,410      $28,340     $30,550    $27,300
$37,330                  $29,250
$35,700                  $28,890
                         $30,150

Can you reject, at the 10% significance level, the hypothesis that the mean starting salary is the same for each of the given business majors?

Page 25: Hypothesis testing – mean differences between populations Part 2.

Source of variation    SS                df    MS              F         p-value
Between groups         140927283.143      3    46975761.048    12.677    0.0001
Within groups           77820292.857     21     3705728.231
Total variation        218747576.000     24

                              Accounting   Marketing    Finance      Management
Sample sizes                  7            5            8            5
Sample means                  33512.857    28000.000    30612.500    27420.000
Sample standard deviations    3213.413     451.276      1342.458     780.096

H0: µa = µm = µf = µman
H1: At least two population means are unequal

Given that the p-value (0.0001) is less than the significance level (0.1 or 10%), we reject the null hypothesis that the mean starting salaries are equal for all majors. There is evidence that at least two of the population means differ.
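The SS, df and F columns of the ANOVA table can be reproduced from the salary data with a few lines of standard-library Python. This is a sketch for checking the arithmetic, not the SPSS procedure itself; the function name `one_way_anova` is an assumption. The p-value is not recomputed here because the standard library has no F distribution.

```python
from statistics import mean

# Starting salaries from the table, grouped by major.
salaries = {
    "Accounting": [37220, 30950, 32630, 31350, 29410, 37330, 35700],
    "Marketing": [28620, 27750, 27650, 27640, 28340],
    "Finance": [29870, 31700, 31740, 32750, 30550, 29250, 28890, 30150],
    "Management": [28600, 27450, 26410, 27340, 27300],
}


def one_way_anova(groups):
    """Return (SS between, SS within, df between, df within, F)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between groups: group size times squared distance of group mean from grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: squared distance of each value from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(list(salaries.values()))
print(f"Between groups: SS = {ss_b:.3f}, df = {df_b}")
print(f"Within groups:  SS = {ss_w:.3f}, df = {df_w}")
print(f"F = {f_stat:.3f}")
```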

Page 26: Hypothesis testing – mean differences between populations Part 2.

State the null hypothesis and alternative hypothesis. What is your conclusion for this test?

Page 27: Hypothesis testing – mean differences between populations Part 2.
Page 28: Hypothesis testing – mean differences between populations Part 2.

Command to get ANOVA in SPSS

• Analyze → Compare Means → One-Way ANOVA → a dialog opens. Drag a quantitative variable into the Dependent List box and, under Factor, drag a categorical variable with at least 3 response categories → OK.