© 2011 Pearson Education, Inc
Statistics for Business and Economics
Chapter 9
Categorical Data Analysis
Contents
9.1 Categorical Data and the Multinomial Experiment
9.2 Testing Category Probabilities: One-Way Table
9.3 Testing Category Probabilities: Two-Way Contingency Table
9.4 A Word of Caution about Chi-Square Tests
Learning Objectives
1. Discuss qualitative (i.e., categorical) data with more than two outcomes
2. Present a chi-square hypothesis test for comparing the category proportions associated with a single qualitative variable (a one-way analysis)
3. Present a chi-square hypothesis test for relating two qualitative variables (a two-way analysis)
9.1
Categorical Data and the Multinomial Experiment
Qualitative Data
• Qualitative random variables yield responses that can be classified
– Example: gender (male, female)
• Qualitative data that fall in more than two categories often result from a multinomial experiment
Properties of the Multinomial Experiment
1. The experiment consists of n identical trials.
2. There are k possible outcomes to each trial. These outcomes are called classes, categories, or cells.
3. The probabilities of the k outcomes, denoted by p1, p2, …, pk, remain the same from trial to trial, where p1 + p2 + … + pk = 1.
4. The trials are independent.
5. The random variables of interest are the cell counts, n1, n2, …, nk, of the number of observations that fall in each of the k classes.
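The five properties can be illustrated with a small simulation. This is a minimal sketch and not part of the original slides: the function name `run_multinomial_experiment`, the categories A, B, and C, their probabilities, and the seed are all assumptions chosen for illustration.

```python
import random
from collections import Counter

def run_multinomial_experiment(n, probs, seed=None):
    """Simulate n identical, independent trials and return the k cell counts.

    probs maps each of the k categories to its probability p_i
    (hypothetical values; they must sum to 1).
    """
    if abs(sum(probs.values()) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    rng = random.Random(seed)
    categories = list(probs)
    weights = [probs[c] for c in categories]
    # Every trial uses the same probabilities and is drawn independently.
    outcomes = rng.choices(categories, weights=weights, k=n)
    tally = Counter(outcomes)
    # The cell counts n_1, ..., n_k are the random variables of interest.
    return {c: tally[c] for c in categories}

cell_counts = run_multinomial_experiment(100, {"A": 0.2, "B": 0.3, "C": 0.5}, seed=1)
print(cell_counts)  # the individual counts vary, but they always sum to n
```

Changing the seed changes the individual counts, while their total stays fixed at n.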
9.2
Testing Category Probabilities: One-Way Table
Multinomial Experiment
In this section, we consider a multinomial experiment with k outcomes that correspond to categories of a single qualitative variable. The results of such an experiment are summarized in a one-way table. The term one-way is used because only one variable is classified. Typically, we want to make inferences about the true proportions that occur in the k categories based on the sample information in the one-way table.
Chi-Square (χ²) Test for k Proportions
• Tests equality (=) of proportions only
– Example: p1 = .2, p2 = .3, p3 = .5
• One variable with several levels
• Uses one-way contingency table
One-Way Contingency Table
Shows number of observations in k independent groups (outcomes or variable levels)
Outcomes (k = 3)
                       Candidate
                       Tom    Bill   Mary   Total
Number of responses     35     20     45     100
A Test of a Hypothesis about Multinomial Probabilities: One-Way Table
H0: p1 = p1,0, p2 = p2,0, …, pk = pk,0
where p1,0, p2,0, …, pk,0 represent the hypothesized values of the multinomial probabilities.
Ha: At least one of the multinomial probabilities does not equal its hypothesized value.
Test statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{[n_i - E_i]^2}{E_i}$$
where Ei = npi,0 is the expected cell count, that is, the expected number of outcomes of type i assuming that H0 is true. The total sample size is n.
Rejection region: $\chi^2 > \chi^2_\alpha$, where $\chi^2_\alpha$ has (k – 1) df.
Conditions Required for a Valid Test: One-way Table
1. A multinomial experiment has been conducted. This is generally satisfied by taking a random sample from the population of interest.
2. The sample size n is large. This is satisfied if for every cell, the expected cell count Ei will be equal to 5 or more.
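The test statistic and the large-sample condition can be sketched in a few lines of Python. This is an illustrative sketch, not the textbook's own code: the function name `chi_square_one_way` is hypothetical, and pairing the candidate counts (35, 20, 45) with the hypothesized proportions .2, .3, .5 is an assumption made only to have numbers to run.

```python
def chi_square_one_way(observed, hypothesized_probs):
    """Return (statistic, expected_counts) for a one-way table."""
    if len(observed) != len(hypothesized_probs):
        raise ValueError("need one hypothesized probability per cell")
    n = sum(observed)
    # Expected cell count under H0: E_i = n * p_i,0
    expected = [n * p for p in hypothesized_probs]
    # Large-sample condition: every expected count must be 5 or more.
    if min(expected) < 5:
        raise ValueError("large-sample condition fails: some E_i < 5")
    statistic = sum((n_i - e_i) ** 2 / e_i for n_i, e_i in zip(observed, expected))
    return statistic, expected

# Assumed pairing for illustration: candidate counts tested against p = .2, .3, .5
statistic, expected = chi_square_one_way([35, 20, 45], [0.2, 0.3, 0.5])
df = len(expected) - 1  # k - 1 degrees of freedom
print(round(statistic, 2), df)
```

The function raises an error rather than returning a statistic when an expected count falls below 5, so the validity condition cannot be overlooked.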
χ² Test Basic Idea
1. Compares observed counts to the counts expected if the null hypothesis is true
2. The closer the observed counts are to the expected counts, the more consistent the data are with H0
• Measured by squared difference relative to expected count
– Reject H0 for large values
Finding Critical Value Example
What is the critical χ² value if k = 3 and α = .05?
df = k – 1 = 2
χ² Table (Portion)
        Upper Tail Area
DF      .995    …    .95     …    .05
1       ...     …    0.004   …    3.841
2       0.010   …    0.103   …    5.991
[Figure: χ² distribution starting at 0, with rejection region χ² > 5.991 (upper-tail area α = .05); do not reject H0 below 5.991.]
If ni = E(ni), χ² = 0.
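The two table entries used in this chapter (df = 1 and df = 2 at α = .05) can be reproduced with the Python standard library. This is a sketch under stated assumptions: the function name `chi_square_critical_value` is hypothetical, and it relies on two known special cases (a χ² variable with 1 df is a squared standard normal, and with 2 df its upper-tail area is exp(−x/2)) rather than a general table lookup.

```python
import math
from statistics import NormalDist

def chi_square_critical_value(alpha, df):
    """Upper-tail critical value of the chi-square distribution for df = 1 or 2."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    if df == 1:
        # Chi-square with 1 df is the square of a standard normal variable.
        z = NormalDist().inv_cdf(1 - alpha / 2)
        return z ** 2
    if df == 2:
        # Chi-square with 2 df has upper-tail area exp(-x / 2).
        return -2 * math.log(alpha)
    raise NotImplementedError("use a chi-square table or a statistics library for df > 2")

critical_value = chi_square_critical_value(0.05, df=2)
print(round(critical_value, 3))  # 5.991, matching the table entry for df = 2
```

For other degrees of freedom a printed table or a statistics library is the practical route.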
χ² Test for k Proportions Example
As personnel director, you want to test the perception of fairness of three methods of performance evaluation. Of 180 employees, 63 rated Method 1 as fair, 45 rated Method 2 as fair, 72 rated Method 3 as fair. At the .05 level of significance, is there a difference in perceptions?
χ² Test for k Proportions Solution
• H0: p1 = p2 = p3 = 1/3
• Ha: At least 1 is different
• α = .05
• n1 = 63, n2 = 45, n3 = 72
• Critical Value(s): reject H0 if χ² > 5.991 (df = 2)
χ² Test for k Proportions Solution
$$E(n_i) = n p_{i,0}$$
$$E(n_1) = E(n_2) = E(n_3) = 180\left(\tfrac{1}{3}\right) = 60$$
$$\chi^2 = \sum_{\text{all cells}} \frac{[n_i - E(n_i)]^2}{E(n_i)} = \frac{[63 - 60]^2}{60} + \frac{[45 - 60]^2}{60} + \frac{[72 - 60]^2}{60} = 6.3$$
χ² Test for k Proportions Solution
• H0: p1 = p2 = p3 = 1/3
• Ha: At least 1 is different
• α = .05
• n1 = 63, n2 = 45, n3 = 72
• Critical Value(s): reject H0 if χ² > 5.991
Test Statistic: χ² = 6.3
Decision: Reject at α = .05
Conclusion: There is evidence of a difference in proportions
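The arithmetic of the performance-evaluation example can be checked with a short script. This is a hedged sketch: the variable names are hypothetical, the critical value 5.991 is taken from the table portion shown earlier, and the p-value line is an addition that uses the closed form exp(−x/2), which holds only for df = 2.

```python
import math

observed = [63, 45, 72]          # employees rating each method as fair
n = sum(observed)                # 180
expected = [n / 3] * 3           # E(n_i) = 180(1/3) = 60 under H0
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

critical_value = 5.991           # table value for df = 2, alpha = .05
reject_h0 = chi_square > critical_value

# For df = 2 only, the upper-tail area has the closed form exp(-x / 2).
p_value = math.exp(-chi_square / 2)
print(round(chi_square, 2), reject_h0, round(p_value, 4))
```

The statistic exceeds the critical value by a small margin, and the corresponding p-value falls just below .05, which agrees with the decision to reject H0.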
9.3
Testing Category Probabilities: Two-Way (Contingency) Table
χ² Test of Independence
• Shows if a relationship exists between two qualitative variables
– One sample is drawn
– Does not show causality
• Uses two-way contingency table
χ² Test of Independence Contingency Table
Shows number of observations from one sample classified jointly by two qualitative variables
                 House Location
House Style      Urban   Rural   Total
Split-Level        63      49     112
Ranch              15      33      48
Total              78      82     160
Rows: levels of variable 1 (House Style). Columns: levels of variable 2 (House Location).
Finding Expected Cell Counts for a Two-Way Contingency Table
The estimate of the expected number of observations falling into the cell in row i and column j is given by
$$E_{ij} = \frac{R_i C_j}{n}$$
where Ri = total for row i, Cj = total for column j, and n = sample size.
General Form of a Contingency Table Analysis: χ²-Test for Independence
H0: The two classifications are independent.
Ha: The two classifications are dependent.
Test statistic:
$$\chi^2 = \sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}}, \quad \text{where } E_{ij} = \frac{R_i C_j}{n}$$
Rejection region: $\chi^2 > \chi^2_\alpha$, where $\chi^2_\alpha$ has (r – 1)(c – 1) df.
Conditions Required for a Valid χ²-Test: Contingency Table
1. A multinomial experiment has been conducted. As in the one-way case, this is generally satisfied by taking a random sample from the population of interest; we may then consider this to be a multinomial experiment with r × c possible outcomes.
2. The sample size n is large. This is satisfied if for every cell, the expected count Eij will be equal to 5 or more.
χ² Test of Independence Expected Counts
1. Statistical independence means joint probability equals product of marginal probabilities
2. Compute marginal probabilities and multiply for joint probability
3. Expected count is sample size times joint probability
Expected Count Example
                 Location
House Style      Urban   Rural   Total
                 Obs.    Obs.
Split-Level        63      49     112
Ranch              15      33      48
Total              78      82     160
Marginal probability (Split-Level) = 112/160
Marginal probability (Urban) = 78/160
Joint probability = (112/160)(78/160)
Expected count = 160 · (112/160)(78/160) = 54.6
Expected Count Calculation
$$E_{ij} = \frac{R_i C_j}{n}$$
                     House Location
                   Urban           Rural
House Style     Obs.   Exp.     Obs.   Exp.    Total
Split-Level      63    54.6      49    57.4     112
Ranch            15    23.4      33    24.6      48
Total            78    78        82    82       160
Exp. (Split-Level, Urban) = 112·78/160 = 54.6
Exp. (Split-Level, Rural) = 112·82/160 = 57.4
Exp. (Ranch, Urban) = 48·78/160 = 23.4
Exp. (Ranch, Rural) = 48·82/160 = 24.6
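The expected-count formula applies to every cell at once, which a short function makes explicit. This is an illustrative sketch: the function name `expected_counts` and the list-of-rows layout are assumptions, while the observed counts are those of the house style by location table.

```python
def expected_counts(table):
    """Return the expected count E_ij = R_i * C_j / n for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: Split-Level, Ranch; columns: Urban, Rural (observed counts from the slide)
housing = [[63, 49], [15, 33]]
expected = expected_counts(housing)
print([[round(e, 1) for e in row] for row in expected])
```

Each row of expected counts sums to the same row total as the observed counts, and likewise for columns, which is a quick sanity check on any hand calculation.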
χ² Test of Independence Example
As a realtor you want to determine if house style and house location are related. At the .05 level of significance, is there evidence of a relationship?
                 House Location
House Style      Urban   Rural   Total
Split-Level        63      49     112
Ranch              15      33      48
Total              78      82     160
χ² Test of Independence Solution
• H0: No Relationship
• Ha: Relationship
• α = .05
• df = (2 – 1)(2 – 1) = 1
• Critical Value(s): reject H0 if χ² > 3.841
χ² Test of Independence Solution
                     House Location
                   Urban           Rural
House Style     Obs.   Exp.     Obs.   Exp.    Total
Split-Level      63    54.6      49    57.4     112
Ranch            15    23.4      33    24.6      48
Total            78    78        82    82       160
Exp. (Split-Level, Urban) = 112·78/160; Exp. (Split-Level, Rural) = 112·82/160
Exp. (Ranch, Urban) = 48·78/160; Exp. (Ranch, Rural) = 48·82/160
Eij ≥ 5 in all cells
χ² Test of Independence Solution
$$\chi^2 = \sum_{\text{all cells}} \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{[n_{11} - E_{11}]^2}{E_{11}} + \frac{[n_{12} - E_{12}]^2}{E_{12}} + \cdots + \frac{[n_{22} - E_{22}]^2}{E_{22}}$$
$$= \frac{[63 - 54.6]^2}{54.6} + \frac{[49 - 57.4]^2}{57.4} + \cdots + \frac{[33 - 24.6]^2}{24.6} = 8.41$$
χ² Test of Independence Solution
• H0: No Relationship
• Ha: Relationship
• α = .05
• df = (2 – 1)(2 – 1) = 1
• Critical Value(s): reject H0 if χ² > 3.841
Test Statistic: χ² = 8.41
Decision: Reject at α = .05
Conclusion: There is evidence of a relationship
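The whole test of independence for the housing data can be sketched as one function. This is a minimal illustration, not the textbook's implementation: the name `chi_square_independence` is hypothetical, and the critical value 3.841 is the table entry for df = 1 at α = .05 quoted in the solution.

```python
def chi_square_independence(table):
    """Return (statistic, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij = R_i C_j / n
            if expected < 5:
                raise ValueError("large-sample condition fails: some E_ij < 5")
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

# Rows: Split-Level, Ranch; columns: Urban, Rural
housing = [[63, 49], [15, 33]]
statistic, df = chi_square_independence(housing)
reject_h0 = statistic > 3.841  # critical value for df = 1, alpha = .05
print(round(statistic, 2), df, reject_h0)
```

The function sums over all four cells, including the Ranch/Urban cell that the slide abbreviates with an ellipsis.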
χ² Test of Independence Thinking Challenge
You’re a marketing research analyst. You ask a random sample of 286 consumers if they purchase Diet Pepsi or Diet Coke. At the .05 level of significance, is there evidence of a relationship?
                Diet Pepsi
Diet Coke       No     Yes    Total
No              84      32     116
Yes             48     122     170
Total          132     154     286
χ² Test of Independence Solution
• H0: No Relationship
• Ha: Relationship
• α = .05
• df = (2 – 1)(2 – 1) = 1
• Critical Value(s): reject H0 if χ² > 3.841
χ² Test of Independence Solution*
                   Diet Pepsi
                 No              Yes
Diet Coke     Obs.   Exp.     Obs.   Exp.    Total
No             84    53.5      32    62.5     116
Yes            48    78.5     122    91.5     170
Total         132   132       154   154       286
Exp. (No, No) = 116·132/286; Exp. (No, Yes) = 116·154/286
Exp. (Yes, No) = 170·132/286; Exp. (Yes, Yes) = 170·154/286
Eij ≥ 5 in all cells
χ² Test of Independence Solution
$$\chi^2 = \sum_{\text{all cells}} \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{[n_{11} - E_{11}]^2}{E_{11}} + \frac{[n_{12} - E_{12}]^2}{E_{12}} + \cdots + \frac{[n_{22} - E_{22}]^2}{E_{22}}$$
$$= \frac{[84 - 53.5]^2}{53.5} + \frac{[32 - 62.5]^2}{62.5} + \cdots + \frac{[122 - 91.5]^2}{91.5} = 54.29$$
χ² Test of Independence Solution
• H0: No Relationship
• Ha: Relationship
• α = .05
• df = (2 – 1)(2 – 1) = 1
• Critical Value(s): reject H0 if χ² > 3.841
Test Statistic: χ² = 54.29
Decision: Reject at α = .05
Conclusion: There is evidence of a relationship
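The value 54.29 comes from expected counts rounded to one decimal place. The sketch below, with hypothetical names, recomputes the statistic both ways; with unrounded expected counts the result is slightly smaller (roughly 54.15), and the decision to reject H0 is the same either way.

```python
def chi_square_from_counts(observed, expected):
    """Sum [n_ij - E_ij]^2 / E_ij over matching cells of two flat lists."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Cells in row order (Diet Coke, Diet Pepsi): No/No, No/Yes, Yes/No, Yes/Yes
observed = [84, 32, 48, 122]
row_totals, col_totals, n = [116, 170], [132, 154], 286
exact_expected = [r * c / n for r in row_totals for c in col_totals]
rounded_expected = [round(e, 1) for e in exact_expected]  # 53.5, 62.5, 78.5, 91.5

chi_square_exact = chi_square_from_counts(observed, exact_expected)
chi_square_rounded = chi_square_from_counts(observed, rounded_expected)
print(round(chi_square_exact, 2), round(chi_square_rounded, 2))
```

Rounding intermediate values is harmless here because the statistic is far above 3.841, but it can matter when the statistic is close to the critical value.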
χ² Test of Independence Thinking Challenge 2
There is a statistically significant relationship between purchasing Diet Coke and Diet Pepsi. So what do you think the relationship is? Aren’t they competitors?
                Diet Pepsi
Diet Coke       No     Yes    Total
No              84      32     116
Yes             48     122     170
Total          132     154     286
You Re-Analyze the Data
Low Income
                Diet Pepsi
Diet Coke       No     Yes    Total
No               4      30      34
Yes             40       2      42
Total           44      32      76
High Income
                Diet Pepsi
Diet Coke       No     Yes    Total
No              80       2      82
Yes              8     120     128
Total           88     122     210
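Splitting the sample by income changes the picture, and a short script makes the contrast concrete. This is a sketch under stated assumptions: the function name `independence_summary` is hypothetical, and the assignment of the first table to low income and the second to high income follows the order in which they appear on the slide. For each table it reports the χ² statistic together with the observed and expected number of consumers who buy both products.

```python
def independence_summary(table):
    """Return (chi-square statistic, observed Yes/Yes count, expected Yes/Yes count)
    for a 2 x 2 table laid out as [[No/No, No/Yes], [Yes/No, Yes/Yes]]."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2) for j in range(2)
    )
    return statistic, table[1][1], expected[1][1]

# Rows: Diet Coke No/Yes; columns: Diet Pepsi No/Yes
low_income = [[4, 30], [40, 2]]      # assumed to be the low-income table
high_income = [[80, 2], [8, 120]]    # assumed to be the high-income table
# Adding the two strata cell by cell recovers the original 286-consumer table.
pooled = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(low_income, high_income)]

for label, table in [("low", low_income), ("high", high_income), ("pooled", pooled)]:
    statistic, both_obs, both_exp = independence_summary(table)
    print(label, round(statistic, 1), both_obs, round(both_exp, 1))
```

Under this reading, far fewer consumers in the first group buy both products than independence would predict, while far more do in the second group, so the direction of the relationship depends on income.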
True Relationships
[Diagram: Diet Coke and Diet Pepsi are joined by an "apparent relation"; a "control or intervening variable (true cause)" is joined to each of them by the "underlying causal relation".]
Moral of the Story
Numbers don’t think - People do!
9.4
A Word of Caution about Chi-Square Tests
Caution about the χ² Test
The χ² test is one of the most widely applied statistical tools and also one of the most abused.
Be certain the experiment satisfies the assumptions.
Be certain the sample is drawn from the correct population.
Avoid using the test when the expected counts are very small.
• If the χ² value does not exceed the established critical value of χ², do not accept the hypothesis of independence. You risk a Type II error. Avoid concluding that two classifications are independent, even when χ² is small.
• If a contingency table χ² value does exceed the critical value, we must be careful to avoid inferring that a causal relationship exists between the classifications. The existence of a causal relationship cannot be established by a contingency table analysis.
Key Ideas
Multinomial Data
Qualitative data that fall into more than two categories (or classes)
Properties of a Multinomial Experiment
1. n identical trials
2. k possible outcomes
3. probabilities of the k outcomes (p1, p2, …, pk) remain the same from trial to trial, where p1 + p2 + … + pk = 1
4. trials are independent
5. variables of interest: cell counts (i.e., number of observations falling into each outcome category), denoted n1, n2, …, nk
One-Way Table
Summary table for a single qualitative variable
Two-Way (Contingency) Table
Summary table for two qualitative variables
Chi-Square (χ²) Statistic
used to test category probabilities in one-way and two-way tables
Chi-Square tests for independence
should not be used to infer a causal relationship between two qualitative variables
Conditions Required for Valid χ²-Tests
1. multinomial experiment
2. sample size n is large (expected cell counts are all greater than or equal to 5)