Chapter 14
Preprocessing the Data,
And Cross-Tabs
© 2005 Thomson/South-Western
2
Figure 1: Histogram and Frequency Polygon of Incomes of Families in Car Ownership Study
[Chart not reproduced. Vertical axis ticks: 0 to 25 in steps of 5. Horizontal axis: income, 0k to 105k.]
3
Figure 2: Cumulative Distribution of Incomes of Families in Car Ownership Study
[Chart not reproduced. Vertical axis ticks: 0 to 120 in steps of 20. Horizontal axis: income, 0k to 105k.]
4
Family Income and Number of Cars Family Owns

                        Number of Cars
Income                  1 or None    2 or More    Total
Less than $37,500           48            6          54
More than $37,500           27           19          46
Total                       75           25         100
5
Number of Cars by Family Income

                        Number of Cars
Income                  1 or None    2 or More    Total    # of Cases
Less than $37,500          89%          11%        100%        54
More than $37,500          59%          41%        100%        46
6
Family Income by Number of Cars

                        Number of Cars
Income                  1 or None    2 or More
Less than $37,500          64%          24%
More than $37,500          36%          76%
Total                     100%         100%
(Number of Cases)         (75)         (25)
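The two percentage tables above are built from the same counts but use different bases: one percentages across each income row, the other down each number-of-cars column. A minimal sketch of both calculations follows; the helper names `row_percentages` and `column_percentages` are illustrative and do not come from the slides.

```python
# Counts from the income-by-cars table: each row is an income group,
# the two columns are "1 or None" and "2 or More" cars.
counts = {
    "Less than $37,500": [48, 6],
    "More than $37,500": [27, 19],
}

def row_percentages(table):
    """Percent of each income group in each car category (income is the base)."""
    return {
        label: [round(100 * cell / sum(cells)) for cell in cells]
        for label, cells in table.items()
    }

def column_percentages(table):
    """Percent of each car category in each income group (cars are the base)."""
    col_totals = [sum(col) for col in zip(*table.values())]
    return {
        label: [round(100 * cell / total) for cell, total in zip(cells, col_totals)]
        for label, cells in table.items()
    }

row_pct = row_percentages(counts)     # matches "Number of Cars by Family Income"
col_pct = column_percentages(counts)  # matches "Family Income by Number of Cars"
print(row_pct)
print(col_pct)
```

Which base to use depends on which variable is treated as the presumed cause; here income is the natural base if the question is how income affects car ownership.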
7
Number of Cars and Size of Family

                        Number of Cars
Size of Family          1 or None    2 or More    Total
4 or Less                   70            8          78
5 or More                    5           17          22
Total                       75           25         100
8
Number of Cars by Size of Family

                        Number of Cars
Size of Family          1 or None    2 or More    Total    # of Cases
4 or Less                  90%          10%        100%       (78)
5 or More                  23%          77%        100%       (22)
9
Number of Cars by Income and Size of Family

Four Members or Less:
                        Number of Cars
Income                  1 or None    2 or More    Total
Less than $37,500           44            2          46
More than $37,500           26            6          32
Total                       70            8          78

Five Members or More:
                        Number of Cars
Income                  1 or None    2 or More    Total
Less than $37,500            4            4           8
More than $37,500            1           13          14
Total                        5           17          22

Total:
                        Number of Cars
Income                  1 or None    2 or More    Total
Less than $37,500           48            6          54
More than $37,500           27           19          46
Total                       75           25         100
10
Number of Cars by Income and Size of Family

Four Members or Less:
                        Number of Cars
Income                  1 or None    2 or More    Total
Less than $37,500          96%           4%       100% (46)
More than $37,500          81%          19%       100% (32)

Five Members or More:
                        Number of Cars
Income                  1 or None    2 or More    Total
Less than $37,500          50%          50%       100% (8)
More than $37,500           7%          93%       100% (14)

Total:
                        Number of Cars
Income                  1 or None    2 or More    Total
Less than $37,500          89%          11%       100% (54)
More than $37,500          59%          41%       100% (46)
11
Car Ownership for Small, Below Average Income Families

                        Number of Cars
Income                  1 or None    2 or More    Total
Less than $37,500          96%           4%       100% (46)
12
Percentage of Families Owning Two or More Cars by Income

                        Size of Family
Income                  4 or Less    5 or More    Total
Less than $37,500           4%          50%       11% (6)
More than $37,500          19%          93%       41% (19)
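The summary table above condenses the three-way cross tabulation into a single figure per cell: the percentage of families owning two or more cars. A rough sketch of that condensation, using the counts from the income by family size by cars table, is given here; the dictionary layout and the function name `pct_two_or_more` are assumptions made for illustration.

```python
# (income, family size) -> (families with 1 or no car, families with 2 or more cars)
counts = {
    ("Less than $37,500", "4 or Less"): (44, 2),
    ("Less than $37,500", "5 or More"): (4, 4),
    ("More than $37,500", "4 or Less"): (26, 6),
    ("More than $37,500", "5 or More"): (1, 13),
}

def pct_two_or_more(cells):
    """Percent of families owning 2+ cars, pooled over the given cells."""
    few = sum(cell[0] for cell in cells)
    many = sum(cell[1] for cell in cells)
    return round(100 * many / (few + many))

incomes = ["Less than $37,500", "More than $37,500"]
sizes = ["4 or Less", "5 or More"]

summary = {}
for income in incomes:
    # One percentage per family-size column, holding income fixed.
    row = {size: pct_two_or_more([counts[(income, size)]]) for size in sizes}
    # The "Total" column pools both family sizes for this income group.
    row["Total"] = pct_two_or_more([counts[(income, size)] for size in sizes])
    summary[income] = row

for income, row in summary.items():
    print(income, row)
```

Reading across a row holds income constant and shows the effect of family size; reading down a column holds family size constant and shows the effect of income.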
13
Conditions That Can Arise with the Introduction of an Additional Variable into a Cross Tabulation

                        With the Additional Variable
Initial Situation       Change Conclusion                        Retain Conclusion
Some Relationship       I    A. Refine Explanation               II
                             B. Reveal Spurious Explanation
                             C. Provide Limiting Conditions
No Relationship         III                                      IV
14
The Researcher’s Dilemma

                            True Situation
Researcher’s Conclusion     No Relationship          Some Relationship
No Relationship             Correct Decision         Spurious Noncorrelation
Some Relationship           Spurious Correlation     Correct Decision if Concluded Relationship is of Proper Form
15
Source:
Appendix 14A
Chi-Square Tests
16
Measures of Association for Nominal Data
Measures Appropriate for Nominal Data
* Contingency Table (Chi-Square)
* Contingency Coefficient
* Index of Predictive Association
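The slides list these measures without formulas. As a hedged illustration, the sketch here applies the usual textbook definitions to the family-size-by-cars counts: the contingency coefficient C = sqrt(X² / (X² + n)), and the Goodman-Kruskal lambda as the index of predictive association. Both definitions and all function names are assumptions, not taken from the slides.

```python
import math

# Observed counts from the family-size-by-cars table.
# Rows: family size (4 or less, 5 or more); columns: cars (0 or 1, 2+).
observed = [[70, 8], [5, 17]]

def chi_square(obs):
    """Pearson chi-square statistic for a two-way table of counts."""
    n = sum(sum(row) for row in obs)
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    stat = 0.0
    for i, row in enumerate(obs):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (o - e) ** 2 / e
    return stat

def contingency_coefficient(obs):
    """C = sqrt(X^2 / (X^2 + n)); assumed textbook definition."""
    n = sum(sum(row) for row in obs)
    x2 = chi_square(obs)
    return math.sqrt(x2 / (x2 + n))

def predictive_association(obs):
    """Goodman-Kruskal lambda for predicting the column category from the row."""
    n = sum(sum(row) for row in obs)
    col_totals = [sum(col) for col in zip(*obs)]
    errors_without = n - max(col_totals)            # always guess the modal column
    errors_with = n - sum(max(row) for row in obs)  # guess the modal column within each row
    return (errors_without - errors_with) / errors_without

c_coef = contingency_coefficient(observed)
lam = predictive_association(observed)
print(f"contingency coefficient = {c_coef:.2f}, lambda = {lam:.2f}")
```

Under these definitions, lambda reads as the proportional reduction in prediction errors for number of cars once family size is known.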
17
Cross Tabulations
Frequencies of Combinations of Row (i) and Column (j)

                    #Cars:
Family Size:        0 or 1      2+      Total
4 or less              70        8        78
5 or more               5       17        22
Total                  75       25       100
18
Cross-Tabs & Chi-Squares
H0: Row variable independent of column variable; no association between family size & #cars
analogous to: “no correlation”

                    #Cars:
Family Size:        0 or 1        2+          Total
4 or less                                     78 (78%)
5 or more                                     22 (22%)
Total               75 (75%)      25 (25%)    100
19
If Family Size & #Cars are Independent:
We’d EXPECT frequencies to be distributed “randomly”; i.e., in proportion to the margins

                    #Cars:
Family Size:        0 or 1        2+          Total
4 or less             58.5        19.5        78 (78%)
5 or more             16.5         5.5        22 (22%)
Total               75 (75%)      25 (25%)    100
20
Using the Statistical Definition of “Independence” to Calculate the Expected Frequencies
•If A & B are independent: $P(A_1 B_1) = P(A_1)\,P(B_1)$
•$e_{11} = n\,P(A_1 B_1) = 100 \times (78/100) \times (75/100) = (78 \times 75)/100 = 58.5$
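The same rule (row total times column total, divided by n) gives every expected cell, not only e11. A small sketch, assuming the observed family-size-by-cars counts as input; the function name `expected_frequencies` is illustrative.

```python
# Observed counts: rows are family size (4 or less, 5 or more),
# columns are number of cars (0 or 1, 2+).
observed = [[70, 8], [5, 17]]

def expected_frequencies(obs):
    """Expected counts under independence: e_ij = (row total i * column total j) / n."""
    n = sum(sum(row) for row in obs)
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_frequencies(observed)
print(expected)  # [[58.5, 19.5], [16.5, 5.5]]
```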
21
Chi-square measures how much our data differ from what we’d expect (given the hypothesis of independence)
Are the row and column variables associated?
$$X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$
Chi-Square Formula
22
Chi-Square for Our Data
$$X^2 = \frac{(70-58.5)^2}{58.5} + \frac{(8-19.5)^2}{19.5} + \frac{(5-16.5)^2}{16.5} + \frac{(17-5.5)^2}{5.5}$$
$$= 2.261 + 6.782 + 8.015 + 24.045 = 41.103$$
Is this large?
df = degrees of freedom = (r-1)(c-1)
For our 2x2 table, df = 1
Critical value for $X^2$ with 1 df = 3.84 (.05)
$X^2 = 41.103$ exceeds 3.84, so we reject H0 of independence.
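The calculation can be checked with a few lines of code. The sketch here sums the four cell contributions, compares the result (about 41.1) with the 3.84 critical value quoted on the slide, and adds a p-value. The p-value shortcut via `math.erfc` is an addition not found in the slides, and it holds only for 1 degree of freedom.

```python
import math

observed = [[70, 8], [5, 17]]
expected = [[58.5, 19.5], [16.5, 5.5]]  # expected counts under independence

# Sum (o - e)^2 / e over every cell of the table.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r-1)(c-1) = 1 for a 2x2 table
CRITICAL_05 = 3.84  # critical value at the .05 level for 1 df, as quoted on the slide
reject_h0 = chi2 > CRITICAL_05

# For 1 df only, a chi-square variate is a squared standard normal,
# so the upper-tail probability is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"X^2 = {chi2:.3f}, df = {df}, reject H0: {reject_h0}, p = {p_value:.2e}")
```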
23
Three-way table:
Example: Family size x #Cars x household income
Log Linear Models
Extension Beyond 2-Way Tables
24
One-Way Chi-Square
Equation:
$$X^2 = \sum_{i=1}^{r} \frac{(o_i - e_i)^2}{e_i}$$
Degrees of Freedom: (r-1)
When would you use this statistic? E.g., to compare sample characteristics to population characteristics or to a previous study’s benchmark, or to investigate the great M&M caper:
25
The Case of the Blue M&M’s:
PLAIN: ei’s, oi’s    PEANUT: ei’s, oi’s
Colors: blue, brown, green, orange, red, yellow
Critical chi-square on 5 df = 11.07
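The transcript does not preserve the M&M counts, so the sketch here uses clearly hypothetical observed counts and assumes equal expected shares for the six colors, purely to show how a one-way chi-square is compared with the 11.07 critical value. Neither the counts nor the equal-share assumption comes from the slides.

```python
colors = ["blue", "brown", "green", "orange", "red", "yellow"]

# HYPOTHETICAL observed counts for one bag of 60 candies (not data from the slides).
observed = [18, 8, 9, 10, 7, 8]

# ASSUMPTION: every color is expected in equal proportion.
n = sum(observed)
expected = [n / len(colors)] * len(colors)

def one_way_chi_square(obs, exp):
    """Goodness-of-fit statistic: sum of (o_i - e_i)^2 / e_i over the r categories."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

chi2 = one_way_chi_square(observed, expected)
df = len(colors) - 1   # r - 1 = 5
CRITICAL_05 = 11.07    # critical chi-square on 5 df, as quoted on the slide
reject_h0 = chi2 > CRITICAL_05

print(f"X^2 = {chi2:.2f} on {df} df; reject H0: {reject_h0}")
```

With these made-up counts the statistic falls short of 11.07, so the apparent surplus of blue candies would not be judged significant at the .05 level.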