Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Page 1: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Copyright (c) Bani K. Mallick 1

STAT 651

Lecture #17

Page 2: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Topics in Lecture #17 Chi-squared tests for independence

Page 3: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Book Sections Covered in Lecture #17

Chapter 10.6

Page 4: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Lecture 17 Review: Comparison of Two Population Proportions

In some cases, we may want to compare two populations with proportions $\pi_1$ and $\pi_2$

The null hypothesis is $H_0: \pi_1 = \pi_2$

This is the same as $H_0: \pi_1 - \pi_2 = 0$

Page 5: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Lecture 17 Review: Comparison of Two Population Proportions

The null hypothesis is $H_0: \pi_1 - \pi_2 = 0$

Form a CI for the difference in population proportions $\pi_1 - \pi_2$

The estimate of this difference is simply the difference in the sample fractions:

$\hat{\pi}_1 - \hat{\pi}_2$

Page 6: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Lecture 17 Review: Comparison of Two Population Proportions

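The estimated standard error of the difference in the sample fractions:

$\hat{\sigma}_{\hat{\pi}_1 - \hat{\pi}_2} = \sqrt{\frac{\hat{\pi}_1 (1 - \hat{\pi}_1)}{n_1} + \frac{\hat{\pi}_2 (1 - \hat{\pi}_2)}{n_2}}$

The $(1-\alpha)100\%$ CI then is

$\hat{\pi}_1 - \hat{\pi}_2 \pm z_{\alpha/2} \, \hat{\sigma}_{\hat{\pi}_1 - \hat{\pi}_2}$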
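A minimal Python sketch of this large-sample interval, for those who want to check the arithmetic by hand. The function name `diff_proportion_ci` is made up for illustration, and the counts are the beer counts that appear later in this lecture (6 of 11 very good beers and 6 of 24 fair-or-good beers are nationally available).

```python
import math
from statistics import NormalDist


def diff_proportion_ci(x1, n1, x2, n2, level=0.95):
    """Large-sample CI for pi_1 - pi_2 from x1 successes of n1 and x2 of n2."""
    p1, p2 = x1 / n1, x2 / n2
    # Estimated standard error of the difference in sample fractions
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # z_{alpha/2}: upper alpha/2 point of the standard normal
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Illustrative use with the beer counts from this lecture
lo, hi = diff_proportion_ci(6, 11, 6, 24)
print(f"95% CI for pi_1 - pi_2: ({lo:.3f}, {hi:.3f})")
```

The interval includes 0, so at the 5% level these counts do not rule out equal proportions.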

Page 7: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Lecture 17 Review: Comparison of Two Population Proportions:

Remarkably, but perhaps not surprisingly, you do not have to compute these confidence intervals by hand!

The idea: simply pretend, and I do mean pretend, that the binary outcomes are real numbers, run your ordinary t-test, and read the CI from the unequal variance line

Page 8: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Chisquared Tests for Independence and Homogeneity

In the previous lecture, we asked whether two populations have the same fraction (proportion)

Thus, the populations were (1) very good beers (2) good or fair beers

We looked at the proportion of beers that were widely available in the U.S.

Page 9: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Chisquared Tests for Independence and Homogeneity

Thus, the populations were (1) very good beers (2) good or fair beers

We looked at the proportion of beers that were widely available in the U.S.

If the proportions of the populations that are widely available are the same, then whether a beer is widely available is independent of whether the beer is very good or just good/fair.

Page 10: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Chisquared Tests for Independence and Homogeneity

If the proportions of the populations that are widely available are the same, then whether a beer is widely available is independent of whether the beer is very good or just good/fair.

Thus, we can think about testing whether the outcomes (widely available or not) are independent of the populations (very good or fair/good)

Page 11: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Chisquared Tests for Independence and Homogeneity

We can test whether two categorical factors (availability and beer rating) are independent or not

The null hypothesis $H_0$ is that they are independent

The alternative hypothesis $H_a$ is that they are not independent

This can be tested using a chisquared test

Page 12: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Chisquared Tests for Independence and Homogeneity

The chisquared test for independence of two categorical factors:

The factors can have more than 2 levels: this is the main advantage of the chisquared test

Thus, for example, you could define three populations of beers (fair, good, very good), three levels of availability (special, regional, national) and ask whether quality is independent of availability.

Page 13: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Chisquared Tests for Independence and Homogeneity

In SPSS, you get a table with counts in each “cell” and along the rows and columns

# of observations in row i, column j = $n_{ij}$

# of observations in row i = $n_{i\cdot}$

# of observations in column j = $n_{\cdot j}$

# of observations = $n_{\cdot\cdot}$

Page 14: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

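This describes the population table in its most general form in terms of counts

                        Factor A
                    Level 1   Level 2   Total
Factor B  Level 1    n_11      n_12      n_1.
          Level 2    n_21      n_22      n_2.
          Total      n_.1      n_.2      n_..

Row totals: $n_{1\cdot} = n_{11} + n_{12}$ and $n_{2\cdot} = n_{21} + n_{22}$

Column totals: $n_{\cdot 1} = n_{11} + n_{21}$ and $n_{\cdot 2} = n_{12} + n_{22}$

Total sample size: $n_{\cdot\cdot}$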

Page 15: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

US Availability and Rating: Are Better Beers More Widely Available?

Availability in the U.S. * Very Good versus Other Crosstabulation

Count
                              Very Good versus Other
Availability in the U.S.    Very Good    Fair or Good    Total
National                        6              6           12
Regional                        5             18           23
Total                          11             24           35

There are 6 Very Good, National beers in this sample.

There are 11 Very Good beers in the sample

The total sample is of size 35

Page 16: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Chisquared Tests for Independence and Homogeneity

For purposes of explanation, let’s make up a fake example.

Consider two categorical factors for males: height (short, tall) and favorite sport (golf, baseball)

We probably would not expect these to be related.

Here is the data table (next slide!)

Page 17: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

              Sport Preference
Height      Golf    Baseball    Total
Short                             400
Tall                              600
Total        200       800       1000

This describes the population table in its most general form in terms of row and column counts only.

There are 400 short men, 200 golfers, etc.

Page 18: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

              Sport Preference
Height      Golf    Baseball    Total
Short                             400
Tall                              600
Total        200       800       1000

Under the null hypothesis that height and sport preference are independent, how many short men who play golf would you expect? Think hard about this!

Page 19: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Expected Cell Counts under the null hypothesis

              Sport Preference
Height      Golf    Baseball    Total
Short         80                  400
Tall                              600
Total        200       800       1000

Note that 40% of men are short. Under the null hypothesis that height and sport preference are independent, of the 200 men who prefer golf, you would expect 40% = 80 to be short.

Page 20: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Expected Cell Counts under the null hypothesis

              Sport Preference
Height      Golf    Baseball    Total
Short         80       320        400
Tall                              600
Total        200       800       1000

Note that 40% of men are short. Under the null hypothesis that height and sport preference are independent, of the 800 men who prefer baseball, you would also expect 40% = 320 to be short.

Page 21: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Expected Cell Counts under the null hypothesis

              Sport Preference
Height      Golf    Baseball    Total
Short         80       320        400
Tall         120       480        600
Total        200       800       1000

Under the null hypothesis that height and sport preference are independent, you can fill out the rest of the table of expected counts

Page 22: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Expected Cell Counts under the null hypothesis and the Observed Counts (each cell shows expected & observed)

              Sport Preference
Height      Golf         Baseball     Total
Short       80 & 100     320 & 300     400
Tall        120 & 100    480 & 500     600
Total       200          800          1000

Now you have to ask yourself, are the observed counts and the expected counts (under independence) sufficiently different as to make the null hypothesis very unlikely?

Page 23: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Chisquared Tests for Independence and Homogeneity

The expected number of observations in any cell of the table is $n_{i\cdot} n_{\cdot j} / n_{\cdot\cdot}$

You simply multiply the row and column totals and divide by the total sample size.

The chisquared test for independence and homogeneity simply compares the actual table fractions/numbers (observed) to the table fractions/numbers you would expect (expected) under the null hypothesis of independence
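The row-total times column-total rule can be sketched in a few lines of Python. The helper name `expected_counts` is hypothetical; the totals are those of the made-up height and sport preference example.

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts under independence: row total x column total / n."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must add up to the same n")
    return [[r * c / n for c in col_totals] for r in row_totals]


# Height (short, tall) by sport preference (golf, baseball)
expected = expected_counts([400, 600], [200, 800])
print(expected)
```

Only the margins are needed here: the expected table does not depend on the observed cell counts once the row and column totals are fixed.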

Page 24: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Expected Cell Counts under the null hypothesis and the Observed Counts (each cell shows expected & observed)

              Sport Preference
Height      Golf         Baseball     Total
Short       80 & 100     320 & 300     400
Tall        120 & 100    480 & 500     600
Total       200          800          1000

Note how the expected count for the number of men who are tall and prefer baseball is 600 x 800 / 1000 = 480

Page 25: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Chisquared Tests for Independence and Homogeneity

The test statistic is computed as follows

First get the expected counts under independence

Compute, summing over all cells in the table,

$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$

Page 26: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Chisquared Tests for Independence and Homogeneity

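The test statistic is

$\chi^2 = \sum_{\text{rows } i, \text{ columns } j} \frac{\left( n_{ij} - n_{i\cdot} n_{\cdot j} / n_{\cdot\cdot} \right)^2}{n_{i\cdot} n_{\cdot j} / n_{\cdot\cdot}}$

This has the counts in the cells and their expected values under the null hypothesis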
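As a hedged illustration, here is the same sum written out in Python for the made-up height and sport preference table (observed counts 100, 300, 100, 500). The function name `chi_squared_statistic` is an assumption for this sketch, not anything from SPSS.

```python
def chi_squared_statistic(observed):
    """Pearson chi-squared: sum over cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, n_ij in enumerate(row):
            # Expected count under independence for cell (i, j)
            e_ij = row_totals[i] * col_totals[j] / n
            stat += (n_ij - e_ij) ** 2 / e_ij
    return stat


# Rows: short, tall. Columns: golf, baseball.
observed = [[100, 300], [100, 500]]
stat = chi_squared_statistic(observed)
print(round(stat, 3))
```

The four cell contributions are 5, 1.25, 3.33 and 0.83, so the cell with the smallest expected count contributes the most.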

Page 27: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Chisquared Tests for Independence and Homogeneity

If the table has r rows and c columns:

The test statistic is $\chi^2$

You reject the null hypothesis at Type I error (level) $\alpha$ if $\chi^2 > \chi^2_{\alpha,(r-1)(c-1)}$

Here $\chi^2_{\alpha,(r-1)(c-1)}$ “cuts off” area $\alpha$ in the right tail of the chisquared distribution with (r-1)x(c-1) degrees of freedom (Table 7)

Page 28: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Chisquared Tests for Independence and Homogeneity

The chisquared statistic can be computed in SPSS by going to “analyze”, “descriptives”, “crosstabs”

Then click on “statistics” and ask for “chisquared”

The p-value is slightly different from the p-value using the t-test method, but is generally pretty close. If in dispute, use the Pearson chisquared reading, with one exception

Page 29: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Chisquared Tests for Independence and Homogeneity

SPSS will print out a message if the expected count is < 5 in any cell, i.e., if $n_{i\cdot} n_{\cdot j} / n_{\cdot\cdot} < 5$

In this case, use the Fisher exact value

If Fisher’s exact value does not exist in a package, use the likelihood ratio test p-value
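If you are not using SPSS, the same small-cell check is easy to sketch. The helper `small_expected_cells` below is hypothetical and only mimics the kind of warning described above; it is applied to the beer table from this lecture.

```python
def small_expected_cells(observed, threshold=5.0):
    """Return (# cells with expected count < threshold, # cells, minimum expected count)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected counts under independence, flattened over all cells
    expected = [r * c / n for r in row_totals for c in col_totals]
    n_small = sum(1 for e in expected if e < threshold)
    return n_small, len(expected), min(expected)


# Rows: national, regional. Columns: very good, fair or good.
beer = [[6, 6], [5, 18]]
n_small, n_cells, min_exp = small_expected_cells(beer)
print(f"{n_small} of {n_cells} cells have expected count < 5; minimum = {min_exp:.2f}")
```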

Page 30: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

US Availability and Rating: Are Better Beers More Widely Available?

Availability in the U.S. * Very Good versus Other Crosstabulation

Count
                              Very Good versus Other
Availability in the U.S.    Very Good    Fair or Good    Total
National                        6              6           12
Regional                        5             18           23
Total                          11             24           35

This is the table of observed counts. Under the null hypothesis that availability and beer quality are independent, how many very good, national beers would you expect?

Page 31: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

US Availability and Rating: Are Better Beers More Widely Available?

Availability in the U.S. * Very Good versus Other Crosstabulation

Count
                              Very Good versus Other
Availability in the U.S.    Very Good    Fair or Good    Total
National                        6              6           12
Regional                        5             18           23
Total                          11             24           35

This is the table of observed counts. Under the null hypothesis that availability and beer quality are independent, how many very good, national beers would you expect?

12 x 11 / 35 = 3.77

Page 32: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

US Availability and Rating: Are Better Beers More Widely Available?

Availability in the U.S. * Very Good versus Other Crosstabulation

Count (each cell shows observed & expected)
                              Very Good versus Other
Availability in the U.S.    Very Good    Fair or Good    Total
National                    6 & 3.77     6 & 8.23          12
Regional                    5 & 7.23     18 & 15.77        23
Total                       11           24                35

This is the table of observed counts, with the expected counts under the null hypothesis that availability and beer quality are independent

Page 33: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

US Availability and Rating: Are Better Beers More Widely Available?

Availability in the U.S. * Very Good versus Other Crosstabulation

Count (each cell shows observed & expected)
                              Very Good versus Other
Availability in the U.S.    Very Good    Fair or Good    Total
National                    6 & 3.77     6 & 8.23          12
Regional                    5 & 7.23     18 & 15.77        23
Total                       11           24                35

The chisquared statistic is

$\chi^2 = \frac{(6-3.77)^2}{3.77} + \frac{(6-8.23)^2}{8.23} + \frac{(5-7.23)^2}{7.23} + \frac{(18-15.77)^2}{15.77} = 2.9$

Page 34: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

US Availability and Rating: Are Better Beers More Widely Available?

Availability in the U.S. * Very Good versus Other Crosstabulation

Count (each cell shows observed & expected)
                              Very Good versus Other
Availability in the U.S.    Very Good    Fair or Good    Total
National                    6 & 3.77     6 & 8.23          12
Regional                    5 & 7.23     18 & 15.77        23
Total                       11           24                35

The chisquared statistic is $\chi^2$ = 2.9.

Here r = 2, c = 2, (r-1) x (c-1) = 1, and the critical value from Table 7 is 3.8416.

Note though that an expected cell count is < 5, so you have to use Fisher’s exact value
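The comparison with the Table 7 critical value can be sketched as below. The function names are made up for illustration. The p-value line relies on the fact that a chisquared variable with 1 degree of freedom is the square of a standard normal, so its upper-tail area is erfc(sqrt(x/2)); this shortcut is an addition here and only works for 1 degree of freedom.

```python
import math


def pearson_chi_squared(observed):
    """Pearson chi-squared statistic for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (n_ij - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(observed)
        for j, n_ij in enumerate(row)
    )


def chi2_upper_tail_1df(x):
    """Upper-tail area of the chi-squared distribution with 1 df (1 df only)."""
    return math.erfc(math.sqrt(x / 2))


beer = [[6, 6], [5, 18]]       # rows: national, regional
stat = pearson_chi_squared(beer)
critical = 3.8416              # Table 7 value quoted above (1 df)
reject = stat > critical
p_value = chi2_upper_tail_1df(stat)
print(f"chi-squared = {stat:.3f}, reject = {reject}, p = {p_value:.3f}")
```

The statistic falls below the critical value, so the Pearson test would not reject; because of the small expected count, the Fisher exact value is still the one to report.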

Page 35: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

US Availability and Rating: Are Better Beers More Widely Available?

p = 0.130. Note the warning message

Chi-Square Tests
                                Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square              2.922(b)    1      .087
Continuity Correction(a)        1.758       1      .185
Likelihood Ratio                2.854       1      .091
Fisher's Exact Test                                              .130         .094
Linear-by-Linear Association    2.839       1      .092
N of Valid Cases                35

a. Computed only for a 2x2 table

b. 1 cells (25.0%) have expected count less than 5. The minimum expected count is 3.77.

Note the warning message (footnote b), indicating the need to use Fisher’s exact test
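For a 2x2 table, Fisher's exact p-values can be computed directly from the hypergeometric distribution. The sketch below is an illustration, not the SPSS implementation: the function name is hypothetical, and the two-sided rule used (add up all tables, with the same margins, that are no more probable than the observed one) is one common convention and an assumption here.

```python
from math import comb


def fisher_exact_2x2(table):
    """Return (two-sided, one-sided upper) Fisher exact p-values for a 2x2 table."""
    (a, b), (c, d) = table
    r1, r2 = a + b, c + d          # row totals
    c1 = a + c                     # first column total
    denom = comb(r1 + r2, c1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k
        return comb(r1, k) * comb(r2, c1 - k) / denom

    k_min, k_max = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    # Two-sided: all tables no more likely than the observed one
    two_sided = sum(
        prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_obs * (1 + 1e-9)
    )
    # One-sided: top-left cell at least as large as observed
    upper = sum(prob(k) for k in range(a, k_max + 1))
    return two_sided, upper


beer = [[6, 6], [5, 18]]           # rows: national, regional
two_sided, upper = fisher_exact_2x2(beer)
print(f"two-sided p = {two_sided:.3f}, one-sided p = {upper:.3f}")
```

With the beer counts this agrees with the exact significance values in the output above to three decimals.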

Page 36: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Chisquared Tests for Independence and Homogeneity

SPSS also gives you expected counts and percentages in the table.

You ask for “Cells” and then click on what you want

SPSS demo

Page 37: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Chisquared Tests for Independence

Availability in the U.S. * Very Good versus Other Crosstabulation

                                                    Very Good versus Other
Availability in the U.S.                            Very Good    Other     Total
National   Count                                        6           6        12
           Expected Count                              3.8         8.2      12.0
           % within Availability in the U.S.          50.0%       50.0%    100.0%
           % within Very Good versus Other            54.5%       25.0%     34.3%
Regional   Count                                        5          18        23
           Expected Count                              7.2        15.8      23.0
           % within Availability in the U.S.          21.7%       78.3%    100.0%
           % within Very Good versus Other            45.5%       75.0%     65.7%
Total      Count                                       11          24        35
           Expected Count                             11.0        24.0      35.0
           % within Availability in the U.S.          31.4%       68.6%    100.0%
           % within Very Good versus Other           100.0%      100.0%    100.0%

Page 38: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Raw Counts

Availability in the U.S. * Very Good versus Other Crosstabulation

The "Count" rows of the crosstabulation:

Availability in the U.S.    Very Good    Other    Total
National                        6           6       12
Regional                        5          18       23
Total                          11          24       35

Page 39: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Expected Under Independence

Availability in the U.S. * Very Good versus Other Crosstabulation

The "Expected Count" rows of the crosstabulation:

Availability in the U.S.    Very Good    Other    Total
National                       3.8         8.2     12.0
Regional                       7.2        15.8     23.0
Total                         11.0        24.0     35.0

Page 40: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

% Within Rows

Availability in the U.S. * Very Good versus Other Crosstabulation

The "% within Availability in the U.S." rows of the crosstabulation (each row adds to 100%):

Availability in the U.S.    Very Good    Other     Total
National                      50.0%      50.0%    100.0%
Regional                      21.7%      78.3%    100.0%
Total                         31.4%      68.6%    100.0%

Page 41: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

% Within Columns

Availability in the U.S. * Very Good versus Other Crosstabulation

The "% within Very Good versus Other" rows of the crosstabulation (each column adds to 100%):

Availability in the U.S.    Very Good    Other     Total
National                      54.5%      25.0%     34.3%
Regional                      45.5%      75.0%     65.7%
Total                        100.0%     100.0%    100.0%

Page 42: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Chisquared Tests for Independence and Homogeneity

SPSS also allows you to have categorical factors with more than two levels

Rated Quality of Beer * Availability in the U.S. Crosstabulation

Count
                          Availability in the U.S.
Rated Quality of Beer     National    Regional    Total
Very Good                     6           5         11
Good                          4          10         14
Fair                          2           8         10
Total                        12          23         35

Page 43: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Chisquared Tests for Independence and Homogeneity

Note warning message

Here you use Likelihood Ratio since there is no Fisher

Chi-Square Tests
                                Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square              3.113(a)    2      .211
Likelihood Ratio                3.086       2      .214
Linear-by-Linear Association    2.750       1      .097
N of Valid Cases                35

a. 3 cells (50.0%) have expected count less than 5. The minimum expected count is 3.43.
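Both statistics in this output can be reproduced from the 3x2 table of counts. The sketch below assumes the usual likelihood ratio form, 2 x sum of Observed x ln(Observed / Expected), and uses the fact that with 2 degrees of freedom the upper-tail area is exp(-x/2); the function name is hypothetical.

```python
import math


def pearson_and_likelihood_ratio(observed):
    """Return (Pearson chi-squared, likelihood ratio statistic) for a count table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    pearson, g2 = 0.0, 0.0
    for i, row in enumerate(observed):
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n
            pearson += (n_ij - e_ij) ** 2 / e_ij
            if n_ij > 0:  # a zero cell contributes nothing to the likelihood ratio
                g2 += 2 * n_ij * math.log(n_ij / e_ij)
    return pearson, g2


# Rows: very good, good, fair. Columns: national, regional.
quality = [[6, 5], [4, 10], [2, 8]]
pearson, g2 = pearson_and_likelihood_ratio(quality)
# Upper-tail area for 2 degrees of freedom only: exp(-x / 2)
p_pearson = math.exp(-pearson / 2)
p_lr = math.exp(-g2 / 2)
print(f"Pearson = {pearson:.3f} (p = {p_pearson:.3f}), LR = {g2:.3f} (p = {p_lr:.3f})")
```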

Page 44: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Education and Raises in Construction: do you see any structure?

Number of Promotions: 0, 1, 2, 3+ * Education level Crosstabulation

                                          Education level
Number of Promotions                Less than     Bachelor's
                                    Bachelor's    or higher     Total
0 Promotions     Count                  71           153         224
                 Expected Count        77.8         146.2       224.0
1 Promotion      Count                  39            71         110
                 Expected Count        38.2          71.8       110.0
2 Promotions     Count                  26            29          55
                 Expected Count        19.1          35.9        55.0
3 Promotions     Count                  11            17          28
                 Expected Count         9.7          18.3        28.0
4+ Promotions    Count                   8            21          29
                 Expected Count        10.1          18.9        29.0
Total            Count                 155           291         446
                 Expected Count       155.0         291.0       446.0

Page 45: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Education and Raises in Construction

Chi-Square Tests
                                Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square              5.659(a)    4      .226
Likelihood Ratio                5.533       4      .237
Linear-by-Linear Association    .681        1      .409
N of Valid Cases                446

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 9.73.
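The Pearson line of this output can be checked from the promotions by education counts. In this sketch the p-value uses a closed form for the chisquared upper tail that holds only when the degrees of freedom are even, which is an addition here; both function names are hypothetical.

```python
import math


def pearson_chi_squared(observed):
    """Return (Pearson chi-squared statistic, degrees of freedom) for a count table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n
            stat += (n_ij - e_ij) ** 2 / e_ij
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def chi2_upper_tail_even_df(x, df):
    """Upper-tail area for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Rows: 0, 1, 2, 3, 4+ promotions. Columns: less than Bachelor's, Bachelor's or higher.
education = [[71, 153], [39, 71], [26, 29], [11, 17], [8, 21]]
stat, df = pearson_chi_squared(education)
p_value = chi2_upper_tail_even_df(stat, df)
print(f"chi-squared = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```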

Page 46: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

# Companies & Raises in Construction

Number of Promotions: 0, 1, 2, 3+ * Number of Companies Worked For (Categorical) Crosstabulation

                                    Number of Companies Worked For (Categorical)
Number of Promotions                <= 5         6-10         11+
                                    companies    companies    companies    Total
0 Promotions     Count                170           40           14         224
                 Expected Count      179.8         34.7          9.5       224.0
1 Promotion      Count                 93           16            1         110
                 Expected Count       88.3         17.0          4.7       110.0
2 Promotions     Count                 44            8            3          55
                 Expected Count       44.1          8.5          2.3        55.0
3 Promotions     Count                 25            2            1          28
                 Expected Count       22.5          4.3          1.2        28.0
4+ Promotions    Count                 26            3            0          29
                 Expected Count       23.3          4.5          1.2        29.0
Total            Count                358           69           19         446
                 Expected Count      358.0         69.0         19.0       446.0

Notice how those who have worked for a lot of companies have a small number of promotions.

Page 47: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

Number of Promotions: 0, 1, 2, 3+ * Number of Companies Worked For (Categorical) Crosstabulation

Percentages are % within Number of Companies Worked For (Categorical), so each column adds to 100%.

                                    Number of Companies Worked For (Categorical)
Number of Promotions                <= 5         6-10         11+
                                    companies    companies    companies    Total
0 Promotions     Count                170           40           14         224
                 Expected Count      179.8         34.7          9.5       224.0
                 % within             47.5%        58.0%        73.7%       50.2%
1 Promotion      Count                 93           16            1         110
                 Expected Count       88.3         17.0          4.7       110.0
                 % within             26.0%        23.2%         5.3%       24.7%
2 Promotions     Count                 44            8            3          55
                 Expected Count       44.1          8.5          2.3        55.0
                 % within             12.3%        11.6%        15.8%       12.3%
3 Promotions     Count                 25            2            1          28
                 Expected Count       22.5          4.3          1.2        28.0
                 % within              7.0%         2.9%         5.3%        6.3%
4+ Promotions    Count                 26            3            0          29
                 Expected Count       23.3          4.5          1.2        29.0
                 % within              7.3%         4.3%          .0%        6.5%
Total            Count                358           69           19         446
                 Expected Count      358.0         69.0         19.0       446.0
                 % within            100.0%       100.0%       100.0%      100.0%

47.5% of those with <= 5 companies have zero promotions, 14.3% have 3 or more

73.7% of those with 11+ companies have zero promotions, 5.3% have 3 or more

May suggest a trend

Page 48: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

# Companies & Raises in Construction

Chi-Square Tests
                                Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square              11.515(a)   16      .777
Likelihood Ratio                14.989      16      .525
Linear-by-Linear Association    5.460        1      .019
N of Valid Cases                446

a. 16 cells (64.0%) have expected count less than 5. The minimum expected count is .06.

General chisquared test is not significant

Note the significant “Linear-by-Linear Association”. What is this?

Page 49: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

# Companies & Raises in Construction

Chi-Square Tests
                                Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square              11.515(a)   16      .777
Likelihood Ratio                14.989      16      .525
Linear-by-Linear Association    5.460        1      .019
N of Valid Cases                446

a. 16 cells (64.0%) have expected count less than 5. The minimum expected count is .06.

Linear-by-Linear Association (Crosstabs): A measure of linear association between the row and column variables. This statistic should not be used for nominal (unordered) data.

Also known as the Mantel-Haenszel chi-square test. This makes sense: there appears to be some ordered inverse relationship between # of promotions and # of companies

Page 50: Copyright (c) Bani K. Mallick1 STAT 651 Lecture #17.

# Companies & Raises in Construction

[Figure: Construction Data. Horizontal axis: Number of companies worked for (0 to 20); vertical axis ticks run from 0 to 50.]

Note the negative trend