Lesson 15 - 7
Test to See if Samples Come From Same Population
Objectives
• Test a claim using the Kruskal–Wallis test
Vocabulary
• Kruskal–Wallis Test: nonparametric procedure used to test the claim that k (3 or more) independent samples come from populations with the same distribution.
Test of Means of 3 or more groups
● Parametric test of the means of three or more groups (ANOVA):
– Compared the variability between the sample means with the variability within the samples
– Performed a test of whether all of the population means are equal
● Nonparametric case for three or more groups:
– Combine all of the samples and rank this combined set of data
– Compare the rankings for the different groups
Kruskal-Wallis Test
● Assumptions:
– Samples are simple random samples from three or more populations
– Data can be ranked
● If the populations have the same distribution, we would expect the values of the samples, when combined into one large dataset, to be interspersed with each other
● Thus we expect the average rank of each sample to be about the same
Test Statistic for Kruskal–Wallis Test
The test statistic is

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{1}{n_i}\left[R_i - \frac{n_i(N+1)}{2}\right]^2$$

A computational formula for the test statistic is

$$H = \frac{12}{N(N+1)}\left[\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right] - 3(N+1)$$

where
$R_i$ is the sum of the ranks of the ith sample
$R_1^2$ is the square of the sum of the ranks for the first sample
$R_2^2$ is the square of the sum of the ranks for the second sample, and so on
$n_1$ is the number of observations in the first sample
$n_2$ is the number of observations in the second sample, and so on
$N$ is the total number of observations ($N = n_1 + n_2 + \cdots + n_k$)
$k$ is the number of populations being compared.
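The two forms of the test statistic can be checked against each other numerically. The following is a minimal sketch, not part of the original lesson; the function names are illustrative, and no correction for ties is applied (matching the formulas above). The sample numbers are the rank sums from Homework Problem 3.

```python
def h_computational(rank_sums, sizes):
    """H from the computational formula: 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)."""
    n_total = sum(sizes)
    weighted = sum(r * r / n for r, n in zip(rank_sums, sizes))
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


def h_definitional(rank_sums, sizes):
    """H from the definitional formula, using each sample's expected rank sum n_i(N+1)/2."""
    n_total = sum(sizes)
    deviations = sum(
        (r - n * (n_total + 1) / 2) ** 2 / n for r, n in zip(rank_sums, sizes)
    )
    return 12 / (n_total * (n_total + 1)) * deviations


# Rank sums and sample sizes from Homework Problem 3 (three samples of 4).
rank_sums = [23.5, 31.5, 23]
sizes = [4, 4, 4]
print(round(h_computational(rank_sums, sizes), 3))  # 0.875
print(round(h_definitional(rank_sums, sizes), 3))   # 0.875
```

Both functions return the same value because the two formulas are algebraically equivalent; the computational form simply avoids subtracting the expected rank sum for each sample.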
Test Statistic (cont)
● Large values of the test statistic H indicate that the Ri’s are different than expected
● If H is too large, then we reject the null hypothesis that the distributions are the same
● This is always a right-tailed test
Critical Value for Kruskal–Wallis Test
Small-Sample Case
When three populations are being compared and when the sample size from each population is 5 or less, the critical value is obtained from Table XIV in Appendix A.
Large-Sample Case
When four or more populations are being compared or the sample size from one population is more than 5, the critical value is χ²α with k – 1 degrees of freedom, where k is the number of populations and α is the level of significance.
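For the common large-sample case of three populations (k = 3, so 2 degrees of freedom), the chi-square critical value can be worked out without a table, because the chi-square distribution with 2 degrees of freedom has an exponential right tail. A short derivation, which holds only for 2 degrees of freedom:

```latex
P\left(\chi^2_{2} > x\right) = e^{-x/2}
\quad\Longrightarrow\quad
\chi^2_{\alpha} = -2\ln\alpha

\alpha = 0.05:\quad \chi^2_{0.05} = -2\ln(0.05) \approx 5.991
\qquad
\alpha = 0.01:\quad \chi^2_{0.01} = -2\ln(0.01) \approx 9.210
```

For any other number of degrees of freedom there is no such shortcut, and the table (or software) is still needed.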
Hypothesis Tests Using Kruskal–Wallis Test
Step 0 Requirements: 1. The samples are independent random samples. 2. The data can be ranked.
Step 1 Box Plots: Draw side-by-side boxplots to compare the sample data from the populations. Doing so helps to visualize the differences, if any, between the medians.
Step 2 Hypotheses: (claim is made regarding distribution of three or more populations) H0: the distributions of the populations are the same H1: the distributions of the populations are not the same
Step 3 Ranks: Rank all sample observations from smallest to largest. Handle ties by finding the mean of the ranks for tied values. Find the sum of the ranks for each sample.
Step 4 Level of Significance: (level of significance determines the critical value) The critical value is found from Table XIV for small samples. The critical value is χ²α with k – 1 degrees of freedom (found in Table VI) for large samples.
Step 5 Compute Test Statistic:

$$H = \frac{12}{N(N+1)}\left[\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right] - 3(N+1)$$

Step 6 Critical Value Comparison: We reject the null hypothesis if the test statistic is greater than the critical value.
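The ranking rule in Step 3 (tied values share the mean of the ranks they occupy) is the part of the procedure most easily done wrong by hand. The following is a minimal sketch of that rule; the function name `average_ranks` is illustrative and not from the lesson.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) to largest; ties get the mean of their ranks."""
    # Indices of the observations in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions are 0-based, ranks are 1-based.
        mean_rank = ((start + 1) + (end + 1)) / 2
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


print(average_ranks([12, 9, 9, 15]))  # [3.0, 1.5, 1.5, 4.0]
```

The two 9s occupy ranks 1 and 2, so each receives 1.5. The rank sum for a sample is then the sum of the ranks of its own observations within the combined list.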
Kruskal–Wallis Test Hypothesis
• In this test, the hypotheses are
H0: The distributions of all of the populations are the same
H1: The distributions of all of the populations are not the same
• This is a stronger hypothesis than in ANOVA, where only the means (and not the entire distributions) are compared
Example 1 from 15.7
Each cell shows the observation followed by its rank in parentheses.

Subject     20-29        40-49        60-69
1           54 (29)      61 (31.5)    44 (18)
2           43 (16)      41 (14)      65 (34.5)
3           38 (11.5)    44 (18)      62 (33)
4           30 (2)       47 (21)      53 (27.5)
5           61 (31.5)    33 (3)       51 (26)
6           53 (27.5)    29 (1)       49 (22.5)
7           35 (7.5)     59 (30)      49 (22.5)
8           34 (4.5)     35 (7.5)     42 (15)
9           39 (13)      34 (4.5)     35 (7.5)
10          46 (20)      74 (36)      44 (18)
11          50 (24.5)    50 (24.5)    37 (10)
12          35 (7.5)     65 (34.5)    38 (11.5)
Median      41           45.5         46.5
Rank sum    194.5        225.5        246
Example 1 (cont)
$$H = \frac{12}{N(N+1)}\left[\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right] - 3(N+1)$$

$$H = \frac{12}{36(36+1)}\left[\frac{194.5^2}{12} + \frac{225.5^2}{12} + \frac{246^2}{12}\right] - 3(36+1) = 1.009$$
Critical Value: (Large-Sample Case) χ²α with 3 – 1 = 2 degrees of freedom, where 3 is the number of populations and 0.05 is the level of significance
CV = 5.991
Conclusion: Since H = 1.009 < CV = 5.991, we fail to reject (FTR) H0; there is not sufficient evidence that the distributions differ
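The whole example can be reproduced from the raw observations. The following is a minimal sketch in plain Python, not the lesson's own method of calculation; the helper name `midrank` is illustrative, the critical value 5.991 is taken from the example rather than computed, and no tie correction is applied.

```python
# Observations from Example 1, one list per age group.
samples = {
    "20-29": [54, 43, 38, 30, 61, 53, 35, 34, 39, 46, 50, 35],
    "40-49": [61, 41, 44, 47, 33, 29, 59, 35, 34, 74, 50, 65],
    "60-69": [44, 65, 62, 53, 51, 49, 49, 42, 35, 44, 37, 38],
}

# Combine all samples and sort, as in Step 3.
combined = sorted(value for group in samples.values() for value in group)


def midrank(value):
    """Mean of the ranks occupied by `value` in the combined sorted data."""
    first = combined.index(value) + 1          # lowest rank held by this value
    last = first + combined.count(value) - 1   # highest rank held by this value
    return (first + last) / 2


rank_sums = {name: sum(midrank(v) for v in group) for name, group in samples.items()}

n_total = len(combined)
h = (
    12 / (n_total * (n_total + 1))
    * sum(rank_sums[name] ** 2 / len(group) for name, group in samples.items())
    - 3 * (n_total + 1)
)

critical_value = 5.991  # chi-square, 2 degrees of freedom, alpha = 0.05 (from the example)

print(rank_sums)    # {'20-29': 194.5, '40-49': 225.5, '60-69': 246.0}
print(round(h, 3))  # 1.009
print("reject H0" if h > critical_value else "fail to reject H0")  # fail to reject H0
```

The rank sums and H match the values in the example, and the decision follows from the right-tailed comparison of H with the critical value.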
Summary and Homework
• Summary
– The Kruskal-Wallis test is a nonparametric test for comparing the distributions of three or more populations
– This test is a comparison of the rank sums of the populations
– Critical values for small samples are given in tables
– The critical values for large samples can be approximated by a calculation with the chi-square distribution
• Homework
– problems 3, 5, 7, 10 from the CD
Homework Problem 3
Sorts and Ranks

Value   Rank   Order
9       1.5    1
9       1.5    2
11      3      3
12      4.5    4
12      4.5    5
13      6.5    6
13      6.5    7
14      8      8
15      9      9
16      10     10
17      11     11
18      12     12

Values and Ranks

Subject Nr   X    Y    Z    Rank X   Rank Y   Rank Z
1            13   16   12   6.5      10       4.5
2            9    18   14   1.5      12       8
3            17   11   9    11       3        1.5
4            12   13   15   4.5      6.5      9

                          i = 1     i = 2     i = 3
Ri = Sum of the Ranks     23.5      31.5      23
R²i =                     552.25    992.25    529
ni =                      4         4         4

N = 12
H = 0.875
Hcr = 5.6923    FTR
Homework Problem 5

Ranks
          Mon      Tues     Wed      Thurs       Fri
          i = 1    i = 2    i = 3    i = 4       i = 5
Ri =      48       226      144      194.5       207.5
R²i =     2304     51076    20736    37830.25    43056.25
ni =      8        8        8        8           8

N = 40
H = 18.77058
Hcr = 9.488    Reject
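A quick sanity check is available before computing H from any table of rank sums: because the ranks 1 through N are shared out among the samples (ties included), the rank sums must add up to N(N + 1)/2. The following is a minimal sketch using the Problem 5 figures; the function name `rank_sums_consistent` is illustrative.

```python
def rank_sums_consistent(rank_sums, sizes):
    """Return True when the rank sums add up to N(N + 1)/2."""
    n_total = sum(sizes)
    expected_total = n_total * (n_total + 1) / 2
    return abs(sum(rank_sums) - expected_total) < 1e-9


# Rank sums and sample sizes from Homework Problem 5 (Mon through Fri).
rank_sums = [48, 226, 144, 194.5, 207.5]
sizes = [8, 8, 8, 8, 8]
n_total = sum(sizes)

print(rank_sums_consistent(rank_sums, sizes))  # True (820 = 40 * 41 / 2)

h = (
    12 / (n_total * (n_total + 1))
    * sum(r ** 2 / n for r, n in zip(rank_sums, sizes))
    - 3 * (n_total + 1)
)
print(round(h, 5))  # 18.77058
```

A failed check means at least one rank was assigned or added incorrectly, so H should not be trusted until the ranks are fixed.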
Homework Problem 7
Homework Problem 10

Sort & Rank

Value   Rank   Order
456     1      1
458     2      2
480     3      3
485     4      4
491     5      5
492     6      6
502     7      7
506     8      8
513     9.5    9
513     9.5    10
518     11     11
521     12     12
530     13.5   13
530     13.5   14
535     15     15
548     16.5   16
548     16.5   17
555     18     18
561     19     19
563     20     20
568     21     21
569     22     22
571     23     23
578     24     24

Values and Ranks

Subject Nr   CA    DN    US    Rank CA   Rank DN   Rank US
1            578   568   506   24        21        8
2            548   530   518   16.5      13.5      11
3            521   571   485   12        23        4
4            555   569   480   18        22        3
5            548   563   458   16.5      20        2
6            530   535   456   13.5      15        1
7            502   561   513   7         19        9.5
8            492   513   491   6         9.5       5

                          i = 1       i = 2     i = 3
Ri = Sum of the Ranks     113.5       143       43.5
R²i =                     12882.25    20449     1892.25
ni =                      8           8         8

N = 24
H = 13.05875
Hcr = 9.21    Reject