John Loucks St . Edward’s University
description
Transcript of John Loucks St . Edward’s University
1 1 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
John LoucksSt. Edward’sUniversity
...........
SLIDES . BY
2 2 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Chapter 12 Comparing Multiple Proportions,
Test of Independence and Goodness of Fit
Testing the Equality of Population Proportions for Three or More Populations
Goodness of Fit Test
Test of Independence
3 3 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Comparing Multiple Proportions,Test of Independence and Goodness of Fit
In this chapter we introduce three additional hypothesis-testing procedures.
The test statistic and the distribution used are based on the chi-square (c2) distribution.
In all cases, the data are categorical.
4 4 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Using the notation p1 = population proportion for population 1 p2 = population proportion for population 2 and pk = population proportion for population k
H0: p1 = p2 = . . . = pk
Ha: Not all population proportions are equal
Testing the Equality of Population Proportions
for Three or More Populations
The hypotheses for the equality of population proportions for k > 3 populations are as follows:
5 5 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
If H0 cannot be rejected, we cannot detect a difference among the k population proportions. If H0 can be rejected, we can conclude that not all k population proportions are equal.
Further analyses can be done to conclude which population proportions are significantly different from others.
Testing the Equality of Population Proportions
for Three or More Populations
6 6 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Example: Finger Lakes Homes
Finger Lakes Homes manufactures three models ofprefabricated homes, a two-story colonial, a log cabin,and an A-frame. To help in product-line planning, management would like to compare the customersatisfaction with the three home styles.
Testing the Equality of Population Proportions
for Three or More Populations
p1 = proportion likely to repurchase a Colonial for the population of Colonial owners p2 = proportion likely to repurchase a Log Cabin for the population of Log Cabin owners p3 = proportion likely to repurchase an A-Frame for the population of A-Frame owners
7 7 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
We begin by taking a sample of owners from each of the three populations.
Each sample contains categorical data indicatingwhether the respondents are likely or not likely
torepurchase the home.
Testing the Equality of Population Proportions
for Three or More Populations
8 8 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Home Owner Colonial Log A-Frame TotalLikely to Yes 100 81 83 264RepurchaseNo 35 20 41 96
Total 135 101 124 360
Observed Frequencies (sample results)
Testing the Equality of Population Proportions
for Three or More Populations
9 9 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Next, we determine the expected frequencies under the assumption H0 is correct.
If a significant difference exists between the observed and expected frequencies, H0 can be rejected.
Testing the Equality of Population Proportions
for Three or More Populations
(Row Total)(Column Total)Total Sample Sizeij
i je
Expected FrequenciesUnder the Assumption H0 is True
10 10 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Testing the Equality of Population Proportions
for Three or More Populations
Home Owner Colonial Log A-Frame TotalLikely to Yes 99.00 74.07 90.93 264RepurchaseNo 36.00 26.93 33.07 96
Total 135 101 124 360
Expected Frequencies (computed)
11 11 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
22 ( )ij ij
i j ij
f e
e
Next, compute the value of the chi-square test statistic.
Note: The test statistic has a chi-square distribution with
k – 1 degrees of freedom, provided the expected
frequency is 5 or more for each cell.
fij = observed frequency for the cell in row i and column jeij = expected frequency for the cell in row i and column j
under the assumption H0 is true
where:
Testing the Equality of Population Proportions
for Three or More Populations
12 12 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Obs. Exp. Sqd. Sqd. Diff. /
Likely to Home Freq. Freq. Diff. Diff. Exp. Freq.
Repurch. Owner fij eij (fij - eij) (fij - eij)2 (fij - eij)2/eij
Yes Colonial 97 97.50 -0.50 0.2500 0.0026
Yes Log Cab. 83 72.94 10.06 101.1142 1.3862
Yes A-Frame 80 89.56 -9.56 91.3086 1.0196
No Colonial 38 37.50 0.50 0.2500 0.0067
No Log Cab. 18 28.06 -10.06 101.1142 3.6041
No A-Frame 44 34.44 9.56 91.3086 2.6509
Total 360 360 c2 = 8.6700
Testing the Equality of Population Proportions
for Three or More Populations Computation of the Chi-Square Test Statistic.
13 13 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
where is the significance level and
there are k - 1 degrees of freedom
p-value approach:
Critical value approach:
Reject H0 if p-value < a
Rejection Rule
2 2 Reject H0 if
Testing the Equality of Population Proportions
for Three or More Populations
14 14 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Rejection Rule (using a = .05)
22
5.991 5.991
Do Not Reject H0Do Not Reject H0 Reject H0Reject H0
With = .05 and k - 1 = 3 - 1 = 2 degrees of freedom
Reject H0 if p-value < .05 or c2 > 5.991
Testing the Equality of Population Proportions
for Three or More Populations
15 15 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Conclusion Using the p-Value Approach
The p-value < a . We can reject the null hypothesis.
Because c2 = 8.670 is between 9.210 and 7.378, the area in the upper tail of the distribution is between .01 and .025.
Area in Upper Tail .10 .05 .025 .01 .005
c2 Value (df = 2) 4.605 5.991 7.378 9.210 10.597
Actual p-value is .0131
Testing the Equality of Population Proportions
for Three or More Populations
16 16 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Testing the Equality of Population Proportions
for Three or More PopulationsWe have concluded that the population proportions for the three populations of home owners are not equal.
To identify where the differences between population proportions exist, we will rely on a multiple comparisons procedure.
17 17 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Comparisons Procedure
We begin by computing the three sample proportions.
We will use a multiple comparison procedure known as the Marascuillo procedure.
1100Colonial: .741135p
281Log Cabin: .802101p
383A-Frame: .669124p
18 18 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Comparisons Procedure
Marascuillo Procedure
We compute the absolute value of the pairwise difference between sample proportions.
1 2 .741 .802 .061p p
1 3 .741 .669 .072p p
2 3 .802 .669 .133p p
Colonial and Log Cabin:
Colonial and A-Frame:
Log Cabin and A-Frame:
19 19 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Comparisons Procedure
Critical Values for the Marascuillo Pairwise ComparisonFor each pairwise comparison compute a critical value as follows:
2; 1
(1 )(1 ) j ji iij k
i j
p pp pCV
n n
For a = .05 and k = 3: c2 = 5.991
20 20 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Comparisons Procedure
i jp p CVij
Significant if
Pairwise Comparison i jp p > CVij
Colonial vs. Log Cabin
Colonial vs. A-Frame
Log Cabin vs. A-Frame
.061
.072
.133
.0923
.0971
.1034
Not Significant
Not Significant
Significant
Pairwise Comparison Tests
21 21 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Test of Independence
ei j
ij (Row Total)(Column Total)
Sample Size
1. Set up the null and alternative hypotheses.
2. Select a random sample and record the observed frequency, fij , for each cell of the contingency table.
3. Compute the expected frequency, eij , for each cell.
H0: The column variable is independent of the row variableHa: The column variable is not independent of the row variable
22 22 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Test of Independence
22
( )f e
eij ij
ijji
5. Determine the rejection rule.
Reject H0 if p -value < a or .
2 2
4. Compute the test statistic.
where is the significance level and,with n rows and m columns, there are(n - 1)(m - 1) degrees of freedom.
23 23 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Each home sold by Finger Lakes Homes can beclassified according to price and to style. FingerLakes’ manager would like to determine if theprice of the home and the style of the home areindependent variables.
Test of Independence
Example: Finger Lakes Homes (B)
24 24 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Price Colonial Log Split-Level A-Frame
The number of homes sold for each model andprice for the past two years is shown below. Forconvenience, the price of the home is listed as either$99,000 or less or more than $99,000.
> $99,000 12 14 16 3< $99,000 18 6 19 12
Test of Independence
Example: Finger Lakes Homes (B)
25 25 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Hypotheses
Test of Independence
H0: Price of the home is independent of the
style of the home that is purchasedHa: Price of the home is not independent of the
style of the home that is purchased
26 26 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Expected Frequencies
Test of Independence
Price Colonial Log Split-Level A-Frame Total
< $99K
> $99K
Total 30 20 35 15 100
12 14 16 3 45
18 6 19 12 55
27 27 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Rejection Rule
Test of Independence
2.05 7.815 With = .05 and (2 - 1)(4 - 1) = 3 d.f.,
Reject H0 if p-value < .05 or 2 > 7.815
22 2 218 16 5
16 56 11
113 6 75
6 75 ( . )
.( )
. .( . )
. .
= .1364 + 2.2727 + . . . + 2.0833 = 9.149
Test Statistic
28 28 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Conclusion Using the p-Value Approach
The p-value < a . We can reject the null hypothesis.
Because c2 = 9.145 is between 7.815 and 9.348, the area in the upper tail of the distribution is between .05 and .025.
Area in Upper Tail .10 .05 .025 .01 .005
c2 Value (df = 3) 6.251 7.815 9.348 11.345 12.838
Test of Independence
Actual p-value is .0274
29 29 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Conclusion Using the Critical Value Approach
Test of Independence
We reject, at the .05 level of significance,the assumption that the price of the home isindependent of the style of home that ispurchased.
c2 = 9.145 > 7.815
30 30 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Goodness of Fit Test:Multinomial Probability Distribution
1. State the null and alternative hypotheses.
H0: The population follows a multinomial distribution with specified probabilities for each of the k categoriesHa: The population does not follow a multinomial distribution with specified probabilities for each of the k categories
31 31 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Goodness of Fit Test:Multinomial Probability Distribution
2. Select a random sample and record the observed frequency, fi , for each of the k categories.
3. Assuming H0 is true, compute the expected frequency, ei , in each category by multiplying the category probability by the sample size.
32 32 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Goodness of Fit Test:Multinomial Probability Distribution
22
1
( )f ee
i i
ii
k
4. Compute the value of the test statistic.
Note: The test statistic has a chi-square distribution
with k – 1 df provided that the expected frequencies
are 5 or more for all categories.
fi = observed frequency for category iei = expected frequency for category i
k = number of categories
where:
33 33 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Goodness of Fit Test:Multinomial Probability Distribution
where is the significance level and
there are k - 1 degrees of freedom
p-value approach:
Critical value approach:
Reject H0 if p-value < a
5. Rejection rule:
2 2 Reject H0 if
34 34 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Multinomial Distribution Goodness of Fit Test
Example: Finger Lakes Homes (A)
Finger Lakes Homes manufactures four models ofprefabricated homes, a two-story colonial, a log cabin,a split-level, and an A-frame. To help in productionplanning, management would like to determine ifprevious customer purchases indicate that there is apreference in the style selected.
35 35 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Split- A-Model Colonial Log Level Frame# Sold 30 20 35 15
The number of homes sold of each model for 100sales over the past two years is shown below.
Multinomial Distribution Goodness of Fit Test
Example: Finger Lakes Homes (A)
36 36 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Hypotheses
Multinomial Distribution Goodness of Fit Test
where: pC = population proportion that purchase a colonial pL = population proportion that purchase a log cabin pS = population proportion that purchase a split-level pA = population proportion that purchase an A-frame
H0: pC = pL = pS = pA = .25
Ha: The population proportions are not
pC = .25, pL = .25, pS = .25, and pA = .25
37 37 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Rejection Rule
22
7.815 7.815
Do Not Reject H0Do Not Reject H0 Reject H0Reject H0
Multinomial Distribution Goodness of Fit Test
With = .05 and k - 1 = 4 - 1 = 3 degrees of freedom
Reject H0 if p-value < .05 or c2 > 7.815.
38 38 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Expected Frequencies
Test Statistic
22 2 2 230 25
2520 25
2535 25
2515 25
25 ( ) ( ) ( ) ( )
Multinomial Distribution Goodness of Fit Test
e1 = .25(100) = 25 e2 = .25(100) = 25
e3 = .25(100) = 25 e4 = .25(100) = 25
= 1 + 1 + 4 + 4 = 10
39 39 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Multinomial Distribution Goodness of Fit Test
Conclusion Using the p-Value Approach
The p-value < a . We can reject the null hypothesis.
Because c2 = 10 is between 9.348 and 11.345, the area in the upper tail of the distribution is between .025 and .01.
Area in Upper Tail .10 .05 .025 .01 .005
c2 Value (df = 3) 6.251 7.815 9.348 11.345 12.838
40 40 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Conclusion Using the Critical Value Approach
Multinomial Distribution Goodness of Fit Test
We reject, at the .05 level of significance,the assumption that there is no home stylepreference.
c2 = 10 > 7.815
41 41 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Goodness of Fit Test: Normal Distribution
1. State the null and alternative hypotheses.
3. Compute the expected frequency, ei , for each interval.
(Multiply the sample size by the probability of a normal random variable being in the interval.
2. Select a random sample anda. Compute the mean and standard deviation.b. Define intervals of values so that the expected frequency is at least 5 for each interval. c. For each interval, record the observed frequencies
H0: The population has a normal distribution
Ha: The population does not have a normal distribution
42 42 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
4. Compute the value of the test statistic.
Goodness of Fit Test: Normal Distribution
22
1
( )f ee
i i
ii
k2
2
1
( )f ee
i i
ii
k
5. Reject H0 if (where is the significance level
and there are k - 3 degrees of freedom).
2 2
43 43 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Example: IQ Computers IQ Computers (one better than HP?) manufacturesand sells a general purpose microcomputer. As partof a study to evaluate sales personnel, managementwants to determine, at a .05 significance level, if theannual sales volume (number of units sold by asalesperson) follows a normal probabilitydistribution.
Goodness of Fit Test: Normal Distribution
44 44 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
A simple random sample of 30 of the salespeople
was taken and their numbers of units sold are listed
below.
Example: IQ Computers
(mean = 71, standard deviation = 18.54)
33 43 44 45 52 52 56 58 63 6464 65 66 68 70 72 73 73 74 7583 84 85 86 91 92 94 98 102 105
Goodness of Fit Test: Normal Distribution
45 45 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Hypotheses
Ha: The population of number of units sold does not have a normal distribution with
mean 71 and standard deviation 18.54.
H0: The population of number of units sold has a normal distribution with mean 71 and standard deviation 18.54.
Goodness of Fit Test: Normal Distribution
46 46 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Interval Definition
To satisfy the requirement of an expectedfrequency of at least 5 in each interval we willdivide the normal distribution into 30/5 = 6equal probability intervals.
Goodness of Fit Test: Normal Distribution
47 47 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Interval Definition
Areas = 1.00/6 = .1667
Areas = 1.00/6 = .1667
717153.0253.02
71 - .43(18.54) = 63.0371 - .43(18.54) = 63.0378.9778.9788.98 = 71 + .97(18.54)88.98 = 71 + .97(18.54)
Goodness of Fit Test: Normal Distribution
48 48 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Observed and Expected Frequencies
1-2 1 0-1 1
5 5 5 5 5 530
6 3 6 5 4 630
Less than 53.02 53.02 to 63.03 63.03 to 71.00 71.00 to 78.97 78.97 to 88.98More than 88.98
i fi ei fi - ei
Total
Goodness of Fit Test: Normal Distribution
49 49 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
2 2 2 2 2 22 (1) ( 2) (1) (0) ( 1) (1)
1.6005 5 5 5 5 5
Test Statistic
With = .05 and k - p - 1 = 6 - 2 - 1 = 3 d.f.(where k = number of categories and p = numberof population parameters estimated), 2
.05 7.815
Reject H0 if p-value < .05 or 2 > 7.815.
Rejection Rule
Goodness of Fit Test: Normal Distribution
50 50 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Conclusion Using the p-Value Approach
The p-value > a . We cannot reject the null hypothesis. There is little evidence to support rejecting the assumption the population is normally distributed with = 71 and = 18.54.
Because c2 = 1.600 is between .584 and 6.251 in the Chi-Square Distribution Table, the area in the upper tailof the distribution is between .90 and .10.
Area in Upper Tail .90 .10 .05 .025 .01
c2 Value (df = 3) .584 6.251 7.815 9.348 11.345
Goodness of Fit Test: Normal Distribution
Actual p-value is .6594
51 51 Slide Slide© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
End of Chapter 12