Chapter 11: Applications of Chi-Square. Chapter Goals Investigate two tests: multinomial experiment,...


Page 1:

Chapter 11: Applications of Chi-Square

[Figure: chi-square distribution curves for 1, 4, 10, and 20 df, plotted over χ² from 0 to 25]

Page 2:

Chapter Goals

• Investigate two tests: the multinomial experiment and the contingency table.

• Compare experimental results with expected results to determine

(1) Preferences

(2) Independence

(3) Homogeneity

• Enumerative data: data that is placed in categories and counted.

Page 3:

11.1: Chi-Square Statistic

• Many problems involve data that are categorized, with the results shown as counts.

• Results are often displayed on a chart showing the number of observations for each possible category.

Page 4:

Background:

1. Suppose there are n observations.

2. Each observation falls into a cell (or class).

3. Observed frequencies in each cell: O1, O2, O3, … , Ok.

Sum of the observed frequencies is n.

4. Expected, or theoretical, frequencies: E1, E2, E3, . . . , Ek.

Summary of notation:

$O_1 + O_2 + O_3 + \cdots + O_k = n$

$E_1 + E_2 + E_3 + \cdots + E_k = n$

                      k Categories
                      1st   2nd   3rd   ...   kth   Total
Observed Frequency    O1    O2    O3    ...   Ok    n
Expected Frequency    E1    E2    E3    ...   Ek    n

Page 5:

Goal:

1. Compare the observed frequencies with the expected frequencies.

2. Decide whether the observed frequencies seem to agree or seem to disagree with the expected frequencies.

Methodology:

Use a chi-square statistic:

$\chi^{2*} = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$

Small values of χ²*: observed frequencies are close to expected frequencies.

Large values of χ²*: observed frequencies do not agree with expected frequencies.
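A minimal Python sketch of the χ²* formula, assuming the observed and expected frequencies are supplied as two equal-length lists in the same cell order. The function name `chi_square_statistic` is hypothetical, not from the text.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over all cells of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected frequency must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Cereal-preference data from Section 11.2: 100 consumers, 5 cereals,
# 20 expected per cereal if there is no preference.
observed = [25, 17, 15, 22, 21]
expected = [20, 20, 20, 20, 20]
print(round(chi_square_statistic(observed, expected), 2))  # 3.2
```

Each cell contributes a nonnegative term, so χ²* can only grow as observed and expected frequencies drift apart, which is why only large values count against agreement.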

Page 6:

Sampling Distribution of χ²*:

When n is large and all expected frequencies are greater than or equal to 5, then χ²* has approximately a chi-square distribution.

Recall:

Properties of the Chi-Square Distribution:

1. χ² is nonnegative in value; it is zero or positively valued.

2. χ² is not symmetrical; it is skewed to the right.

3. 2 is distributed so as to form a family of distributions, a separate distribution for each different number of degrees of freedom.

Page 7:

Various Chi-Square Distributions:

[Figure: chi-square curves for df = 1, 4, 10, and 20, plotted over χ² from 0 to 25]

Page 8:

Critical values for chi-square:

1. Table 8, Appendix B.

2. Identified by degrees of freedom (df) and the area under the curve to the right of the critical value.

3. χ²(df, α): the critical value of a chi-square distribution with df degrees of freedom and area α to the right.

4. The chi-square distribution is not symmetrical, so critical values associated with the right and left tails are given separately.

[Figure: chi-square curve with the critical value χ²(df, α) marked on the χ² axis and area α to its right]

Page 9:

Example: Find χ²(16, 0.05).

[Figure: chi-square curve with area 0.05 to the right of χ²(16, 0.05)]

Portion of Table 8 (area to the right):

df     0.05
16     26.3

χ²(16, 0.05) = 26.3

Page 10:

Example: Find χ²(10, 0.99).

[Figure: chi-square curve with area 0.99 to the right of χ²(10, 0.99)]

Portion of Table 8 (area to the right):

df     0.99
10     2.56

χ²(10, 0.99) = 2.56
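The two table lookups above can be checked numerically. The sketch below is a minimal pure-Python approximation, assuming no statistics library is available: it computes the right-tail area with the power series for the regularized incomplete gamma function, then bisects for the value with area α to its right. The names `chi2_sf` and `chi2_critical` are hypothetical, and the series is meant for table-sized arguments rather than tuned for extreme inputs.

```python
import math


def chi2_sf(x, df, tol=1e-15, max_terms=10_000):
    """Right-tail area P(chi-square with df degrees of freedom > x).

    Uses the power series for the regularized lower incomplete gamma
    function P(a, z) with a = df/2 and z = x/2; the result is 1 - P(a, z).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > tol * abs(total):
        n += 1
        if n > max_terms:
            raise RuntimeError("series did not converge")
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi2_critical(df, alpha, iterations=200):
    """Return chi^2(df, alpha): the value with area alpha to its right."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    lo, hi = 0.0, float(df) + 1.0
    while chi2_sf(hi, df) > alpha:  # widen until the right tail is small enough
        hi *= 2.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid  # too much area to the right of mid: move right
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(chi2_critical(16, 0.05), 1))  # 26.3
print(round(chi2_critical(10, 0.99), 2))  # 2.56
```

Bisection works here because the right-tail area decreases steadily as x increases, so there is exactly one crossing point for each α.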

Page 11:

Note:

1. The mean value of the chi-square distribution is df.

2. When df > 2, the mean is located to the right of the mode (the value where the curve reaches its high point, at df − 2) and just to the right of the median (the value that splits the distribution, 50% on either side).

[Figure: chi-square curve with the mode, median, and mean (= df) marked from left to right along the χ² axis]
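To see the ordering numerically, the hypothetical helper `chi2_landmarks` below returns the mode (df − 2), an approximate median, and the mean (df). The median has no simple closed form, so the sketch substitutes the Wilson-Hilferty approximation df(1 − 2/(9 df))³; treat it as an approximation, not the exact median.

```python
def chi2_landmarks(df):
    """Return (mode, approximate median, mean) of a chi-square distribution.

    mode = df - 2 (valid for df >= 2), mean = df, and the median uses the
    Wilson-Hilferty approximation df * (1 - 2 / (9 * df)) ** 3.
    """
    if df < 2:
        raise ValueError("the mode formula df - 2 assumes df >= 2")
    mode = df - 2
    median = df * (1 - 2 / (9 * df)) ** 3
    mean = df
    return mode, median, mean


for df in (4, 10, 20):
    mode, median, mean = chi2_landmarks(df)
    print(f"df={df}: mode={mode}, median~{median:.2f}, mean={mean}")
```

For each df the printed values run mode < median < mean, and the gap between median and mean shrinks relative to df as df grows, matching the curves becoming less skewed.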

Page 12:

Note:

1. There is a separate chi-square distribution for each number of degrees of freedom, df.

2. Assumptions for this chi-square test:

a. Information is obtained from a random sample.

b. Each observation is classified according to the categorical variable(s) involved in the test.

3. Categorical Variable: a variable that classifies or categorizes each individual into exactly one of several cells or classes; these cells or classes are all inclusive and mutually exclusive.

4. The null and alternative hypotheses are stated liberally: they are not simply statements about population parameters.

Page 13:

11.2: Inferences Concerning Multinomial Experiments

• Examine the testing procedure for multinomial experiments.

• Do the observed frequencies match the expected frequencies?

• The hypothesis test is based on the χ²* statistic.

Page 14:

Multinomial Experiment:

An experiment with the following characteristics:

1. It consists of n identical independent trials.

2. The outcome of each trial fits into exactly one of k possible cells.

3. There is a probability associated with each particular cell, and these individual probabilities remain constant during the experiment. (It must be true that $p_1 + p_2 + \cdots + p_k = 1$.)

4. The experiment will result in a set of observed frequencies, O1, O2, . . . , Ok, where each Oi is the number of times a trial outcome falls into that particular cell. (It must be the case that $O_1 + O_2 + \cdots + O_k = n$.)
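A short simulation makes the four characteristics concrete. This sketch uses only Python's standard `random` module to run n identical, independent trials with fixed cell probabilities and tally O1, ..., Ok. The function name and the seed are illustrative assumptions, and the probabilities are borrowed from the blood-type example later in this section.

```python
import random
from collections import Counter


def run_multinomial_experiment(n, cells, probabilities, seed=None):
    """Simulate n identical, independent trials; each lands in exactly one cell.

    Returns the observed frequencies O1, ..., Ok in the same order as `cells`.
    """
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("cell probabilities must sum to 1")
    rng = random.Random(seed)
    # The same weights are used for every trial, so each p_i stays constant.
    outcomes = rng.choices(cells, weights=probabilities, k=n)
    counts = Counter(outcomes)
    return [counts[c] for c in cells]


observed = run_multinomial_experiment(
    200, ["A", "B", "O", "AB"], [0.41, 0.09, 0.46, 0.04], seed=1
)
print(observed, sum(observed))
```

Whatever the seed, the tallies always sum to n, which is the constraint stated in characteristic 4.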

Page 15:

Testing Procedure:

1. H0: The probabilities p1, p2, . . . , pk are correct.

Ha: At least two probabilities are incorrect.

Allow for liberal interpretation of H0 and Ha.

2. Test statistic:

$\chi^{2*} = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$

3. Use a one-tailed critical region: the right-hand tail.

4. Degrees of freedom: df = k − 1.

5. Expected frequencies: $E_i = n \cdot p_i$

6. To ensure a good approximation to the chi-square distribution, each expected frequency should be at least 5 ($E_i \ge 5$).
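The testing procedure can be sketched as a single function: expected frequencies E_i = n·p_i, the E_i ≥ 5 check, df = k − 1, and the χ²* sum. The name `multinomial_gof` is hypothetical. Refusing to compute when some E_i < 5 is this sketch's own design choice; the procedure itself only says each expected frequency should be at least 5.

```python
def multinomial_gof(observed, probabilities, min_expected=5.0):
    """Chi-square goodness-of-fit statistic for a multinomial experiment.

    Returns (chi2_star, df, expected) where E_i = n * p_i and df = k - 1.
    """
    k = len(observed)
    if k != len(probabilities):
        raise ValueError("need one hypothesized probability per cell")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("hypothesized probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in probabilities]
    if min(expected) < min_expected:
        # The chi-square approximation is doubtful when any E_i < 5.
        raise ValueError("some expected frequency is below 5")
    chi2_star = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2_star, k - 1, expected


# Blood-type example from Section 11.2 (H0: proportions 0.41, 0.09, 0.46, 0.04).
stat, df, expected = multinomial_gof([74, 25, 86, 15], [0.41, 0.09, 0.46, 0.04])
print(round(stat, 2), df, [round(e, 1) for e in expected])
```

On the blood-type data it gives χ²* ≈ 10.02 with df = 3 and expected frequencies 82, 18, 92, and 8; the smallest (8) clears the E_i ≥ 5 guideline.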

Page 16:

Example: A market research firm conducted a consumer-preference experiment to determine which of 5 new breakfast cereals was the most appealing to adults. A sample of 100 consumers tried each cereal and indicated the cereal he or she preferred. The results are given in the following table:

Is there any evidence to suggest the consumers had a preference for one cereal, or did they indicate each cereal was equally likely to be selected? Use α = 0.05.

Cereal A B C D E Total

Frequency 25 17 15 22 21 100

Page 17:

Solution:

If no preference was shown, we expect the 100 consumers to be equally distributed among the 5 cereals. Thus, if no preference is given, we expect (100)(0.2) = 20 consumers in each class.

1. The Set-up:

a. Population parameter of concern: Preference for each cereal, the probability that a particular cereal is selected.

b. The null and alternative hypotheses:

H0: There was no preference shown (equally distributed).

Ha: There was a preference shown (not equally distributed).

2. The Hypothesis Test Criteria:

a. Assumptions: The 100 consumers represent a random sample.

b. Test statistic: χ²* with df = k − 1 = 5 − 1 = 4

c. Level of significance: α = 0.05.

Page 18:

3. The Sample Evidence:

a. Sample information: Table given in the statement of the problem.

b. Calculate the value of the test statistic:

χ²* = 3.2

O      E      O − E    (O − E)²/E
25     20      5        1.25
17     20     -3        0.45
15     20     -5        1.25
22     20      2        0.20
21     20      1        0.05
100    100     0        3.20

Page 19:

4. The Probability Distribution (Classical Approach):

a. Critical value: χ²(k − 1, 0.05) = χ²(4, 0.05) = 9.49

b. χ²* is not in the critical region.

4. The Probability Distribution (p-Value Approach):

a. The p-value: $P = P(\chi^{2*} > 3.2 \mid \text{df} = 4)$

Using computer: P = 0.5249. Using Table 8: P > 0.50

b. The p-value is larger than the level of significance, α.

5. The Results:

a. Decision: Fail to reject H0.

b. Conclusion: At the 0.05 level of significance, there is no evidence to suggest the consumers showed a preference for any one cereal.
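The computer p-value for this example can be checked by hand because df is even. For df = 2m, the chi-square right-tail area has the exact closed form $e^{-x/2}\sum_{j=0}^{m-1} (x/2)^j / j!$. The sketch below implements it with the standard library only; the function name is hypothetical.

```python
import math


def chi2_pvalue_even_df(x, df):
    """Right-tail area P(chi2 > x) for an EVEN number of degrees of freedom.

    For df = 2m the tail area has the closed form
    exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for j in range(df // 2):
        if j > 0:
            term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


print(round(chi2_pvalue_even_df(3.2, 4), 4))  # 0.5249
```

For χ²* = 3.2 with df = 4 this is $e^{-1.6}(1 + 1.6) \approx 0.525$, consistent with the Table 8 bound P > 0.50. The same function covers the two df = 4 contingency-table examples later in the chapter (χ²* ≈ 14.16 and 9.085).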

Page 20:

Example: A sample of 200 individuals was tested for blood type, and the results are used to test the hypothesized distribution of blood types:

At the 0.05 level of significance, is there any evidence to suggest the stated distribution is incorrect?

Blood Type A B O AB

Hypothesized Proportion 0.41 0.09 0.46 0.04

Observed Frequency 74 25 86 15

Page 21:

Solution:

1. The Set-up:

a. Population parameters of concern: The proportions: P(A), P(B), P(O), P(AB).

b. The null and alternative hypotheses:

H0: Blood type proportions are 0.41, 0.09, 0.46, 0.04

Ha: Blood type proportions are not 0.41, 0.09, 0.46, 0.04

2. The Hypothesis Test Criteria:

a. Assumptions: The 200 individuals tested form a random sample.

b. Test statistic: χ²*, df = 4 − 1 = 3

c. Significance level: α = 0.05

Page 22:

3. The Sample Evidence:

a. Sample information: Table given in the statement of the problem.

b. Calculate the value of the test statistic:

χ²* = 10.02

O      E      O − E    (O − E)²/E
74     82     -8        0.78
25     18      7        2.72
86     92     -6        0.39
15      8      7        6.13
200    200     0       10.02

Page 23:

4. The Probability Distribution (Classical Approach):

a. Critical value: χ²(3, 0.05) = 7.82

b. χ²* is in the critical region.

4. The Probability Distribution (p-Value Approach):

a. The p-value: $P = P(\chi^{2*} > 10.02 \mid \text{df} = 3)$

By computer: P = 0.0184. Table 8: 0.01 < P < 0.025

b. The p-value is smaller than the level of significance, α.

5. The Results:

a. Decision: Reject H0.

b. Conclusion: There is evidence to suggest the hypothesized proportions for blood types are incorrect.
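Because df = 3 is odd, the right-tail area can also be computed exactly: start from df = 1, where the tail area is erfc(√(x/2)), and apply the incomplete-gamma recurrence Q(a + 1, z) = Q(a, z) + z^a e^(−z) / Γ(a + 1) with z = x/2. This sketch uses Python's `math.erfc` and `math.lgamma`; the function name is hypothetical.

```python
import math


def chi2_pvalue_odd_df(x, df):
    """Right-tail area P(chi2 > x) for an ODD number of degrees of freedom.

    Starts from df = 1 (tail area erfc(sqrt(x/2))) and raises a = df/2 from
    1/2 in steps of 1 using Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    """
    if df <= 0 or df % 2 != 1:
        raise ValueError("this closed form only applies to positive odd df")
    if x <= 0:
        return 1.0
    z = x / 2.0
    tail = math.erfc(math.sqrt(z))  # df = 1
    a = 0.5
    while a < df / 2.0:
        tail += math.exp(a * math.log(z) - z - math.lgamma(a + 1))
        a += 1.0
    return tail


# Unrounded chi^2* for the blood-type data: sum of (O - E)^2 / E.
blood_stat = 64 / 82 + 49 / 18 + 36 / 92 + 49 / 8
print(round(chi2_pvalue_odd_df(blood_stat, 3), 4))
```

With the unrounded statistic (about 10.019) it returns roughly 0.018, in line with the computer p-value of 0.0184 reported for this example.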

Page 24:

11.3: Inference Concerning Contingency Tables

• Contingency table: an arrangement of data into a two-way classification.

• Data is sorted into cells, and the observed frequency in each cell is reported.

• A contingency table involves two factors, or variables.

• Usual question: are the two variables independent or dependent?

Page 25:

r × c Contingency Table:

1. r: number of rows; c: number of columns.

2. Used to test the independence of the row factor and the column factor.

3. Degrees of freedom: $\text{df} = (r - 1) \cdot (c - 1)$

4. n = grand total.

5. Expected frequency in the ith row and the jth column:

$E_{i,j} = \frac{R_i \cdot C_j}{n} = \frac{(\text{row total}) \cdot (\text{column total})}{\text{grand total}}$

Each Ei,j should be at least 5.

6. R1, R2, . . . , Rr and C1, C2, . . . , Cc: marginal totals.
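The expected-frequency rule E_ij = R_i·C_j / n and df = (r − 1)(c − 1) translate directly into code. The sketch assumes the table is given as a list of rows of observed counts; the function name is hypothetical, and the sample table is the Proposal 129 voter data from the next example.

```python
def expected_frequencies(table):
    """Expected counts E[i][j] = R_i * C_j / n for an r x c contingency table.

    `table` is a list of r rows, each a list of c observed counts.
    Returns (expected, df) with df = (r - 1) * (c - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r_i * c_j / n for c_j in col_totals] for r_i in row_totals]
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, df


# Proposal 129 voter table (rows: Yes, No, Unsure; columns: Dem, Rep, Ind).
votes = [[34, 11, 12], [17, 12, 18], [10, 16, 15]]
expected, df = expected_frequencies(votes)
for row in expected:
    print(["%.2f" % e for e in row])
print("df =", df)
```

Because every E_ij is built from the marginal totals, the expected table automatically has the same row and column totals as the observed table.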

Page 26:

Expected Frequencies for an r × c Contingency Table:

                                  Columns
Rows        1           2           ...   jth Column   ...   c     Total
1           R1·C1/n     R1·C2/n     ...   R1·Cj/n      ...         R1
2           R2·C1/n     ...                                        R2
...
ith Row     Ri·C1/n     ...               Ri·Cj/n                  Ri
...
r
Total       C1          C2          ...   Cj           ...         n

Page 27:

Example: A random sample of registered voters was selected and each was asked his or her opinion on Proposal 129, a property tax reform bill. The distribution of responses is given in the table below.

Test the hypothesis “political party is independent of opinion on Proposal 129.” Use α = 0.01.

Political Party

Tax Reform Democrat Republican Independent

Yes 34 11 12

No 17 12 18

Unsure 10 16 15

Page 28:

Solution:

1. The Set-up:

a. Population parameters of concern: The independence of variables “political party” and “opinion on tax reform.”

b. The null and the alternative hypotheses:

H0: Opinion on property tax reform is independent of political party.

Ha: Opinion on property tax reform is not independent of political party.

2. The Hypothesis Test Criteria:

a. Assumptions: The information was obtained from a random sample in which each individual was classified according to political party and tax reform preference.

Page 29:

b. Test statistic:

χ²* with df = (r − 1) · (c − 1) = (3 − 1) · (3 − 1) = 4

c. Level of significance: α = 0.01.

3. The Sample Evidence:

a. Sample information: Table given in the statement of the problem.

b. Calculate the value of the test statistic:

Table with observed frequencies, expected frequencies, and the test statistic given on the next slide.

Page 30:

Contingency table showing sample results and expected values (expected values in parentheses):

                              Political Party
Tax Reform    Democrat       Republican     Independent    Total
Yes           34 (23.98)     11 (15.33)     12 (17.69)      57
No            17 (19.77)     12 (12.64)     18 (14.59)      47
Unsure        10 (17.25)     16 (11.03)     15 (12.72)      41
Total         61             39             45             145

$\chi^{2*} = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = 14.16$
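Putting the pieces together, a test of independence needs only the observed table: marginal totals give each E_ij, and the χ²* sum runs over all r·c cells. A minimal pure-Python sketch follows (hypothetical function name; it stops at the statistic and df and does not compute a p-value).

```python
def chi_square_independence(table):
    """Return (chi2_star, df) for a chi-square test on an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2_star = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij = R_i * C_j / n
            chi2_star += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2_star, df


stat, df = chi_square_independence([[34, 11, 12], [17, 12, 18], [10, 16, 15]])
print(round(stat, 3), df)
```

For the voter table it returns χ²* ≈ 14.156 with df = 4, agreeing with the hand computation and the Minitab output for this example.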

Page 31:

4. The Probability Distribution (Classical Approach):

a. Critical value: χ²(4, 0.01) = 13.3

b. χ²* is in the critical region.

4. The Probability Distribution (p-Value Approach):

a. The p-value: $P = P(\chi^{2*} > 14.16 \mid \text{df} = 4)$

By computer: P = 0.0068. Table 8: 0.005 < P < 0.01

b. The p-value is smaller than the level of significance, α.

5. The Results:

a. Decision: Reject H0.

b. Conclusion: There is evidence to suggest that opinion on tax reform and political party are not independent.

Page 32:

Note: Minitab output for the previous example.

Chi-Square Test

Expected counts are printed below observed counts

         Dem      Rep      Ind     Total
1        34       11       12       57
         23.98    15.33    17.69
2        17       12       18       47
         19.77    12.64    14.59
3        10       16       15       41
         17.25    11.03    12.72
Total    61       39       45      145

Chi-Sq = 4.188 + 1.224 + 1.830 +
         0.389 + 0.033 + 0.799 +
         3.046 + 2.242 + 0.407 = 14.156

DF = 4, P-Value = 0.007

Page 33:

Test for Homogeneity:

1. Another type of contingency table problem.

2. Used when one of the two variables is controlled by the experimenter so that the row (or column) totals are predetermined.

3. Hypothesis test: the distribution of proportions within rows (or columns) is the same for all rows (or columns).

4. May be thought of as a comparison of several multinomial experiments.

5. Test procedure for independence and homogeneity with contingency tables is the same.

Page 34:

Example: A pharmaceutical company conducted an experiment to determine the effectiveness of three new cough suppressants. Each cough syrup was given to 100 random subjects.

Is there any evidence to suggest the syrups act differently to suppress coughs? Use α = 0.05.

Cough Suppressant

A B C Total

No relief 23 29 20 72

Some relief 60 56 50 166

Total relief 17 15 30 62

Total 100 100 100 300

Page 35:

Solution:

1. The Set-up:

a. Population parameters of concern: The proportion of individuals who receive no relief, some relief, or total relief for each of the three cough syrups.

b. The null and alternative hypotheses:

H0: The proportion of individuals who receive various forms of relief is the same for all three cough syrups.

Ha: The proportion of individuals who receive various forms of relief is not the same for all three cough syrups. (In at least one group the proportions are different from the others.)

Page 36:

2. The Hypothesis Test Criteria:

a. Assumptions: The sample information was obtained using three random samples drawn from three separate populations in which each individual was classified according to cough suppressant and relief.

b. Test statistic:

χ²* with df = (r − 1) · (c − 1) = (3 − 1) · (3 − 1) = 4

c. Level of significance: α = 0.05.

3. The Sample Evidence:

a. Sample information: Table given in the statement of the problem.

b. Calculate the value of the test statistic:

Page 37:

A portion of the Minitab output:

         A        B        C        Total
1        23       29       20        72
         24.00    24.00    24.00
2        60       56       50       166
         55.33    55.33    55.33
3        17       15       30        62
         20.67    20.67    20.67
Total    100      100      100      300

Chi-Sq = 0.042 + 1.042 + 0.667 +
         0.394 + 0.008 + 0.514 +
         0.651 + 1.554 + 4.215 = 9.085

DF = 4, P-Value = 0.059
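The homogeneity test uses the same arithmetic even though each column total was fixed at 100 by the design. This sketch (hypothetical function name) recomputes the per-cell terms that Minitab lists in its Chi-Sq line, then compares their sum with the critical value χ²(4, 0.05) = 9.49 used in this example.

```python
def chi_square_contributions(table):
    """Per-cell (O - E)^2 / E values, row by row, with E = R_i * C_j / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [
        [(o - r * c / n) ** 2 / (r * c / n) for o, c in zip(row, col_totals)]
        for row, r in zip(table, row_totals)
    ]


# Cough-suppressant data: rows = no / some / total relief, columns = A, B, C
# (each column total was fixed at 100 subjects by the design).
relief = [[23, 29, 20], [60, 56, 50], [17, 15, 30]]
contributions = chi_square_contributions(relief)
chi2_star = sum(sum(row) for row in contributions)
critical_value = 9.49  # chi^2(4, 0.05) from Table 8, as used in the example
print([[round(v, 3) for v in row] for row in contributions])
print(round(chi2_star, 3), "reject H0" if chi2_star > critical_value else "fail to reject H0")
```

The largest single term comes from syrup C's "total relief" cell (about 4.215), yet the total of about 9.085 still falls just short of 9.49.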

Page 38:

4. The Probability Distribution (Classical Approach):

a. Critical value: χ²(4, 0.05) = 9.49

b. χ²* does not lie in the critical region.

4. The Probability Distribution (p-Value Approach):

a. The p-value: $P = P(\chi^{2*} > 9.085 \mid \text{df} = 4)$

By computer: P = 0.059. Table 8: 0.05 < P < 0.10

b. The p-value is larger than the level of significance, α.

5. The Results:

a. Decision: Fail to reject H0.

b. Conclusion: There is no evidence to suggest the three remedies act differently to suppress coughs.