1 Inference for Categorical Data William P. Wattles, Ph. D. Francis Marion University.

Inference for Categorical Data

William P. Wattles, Ph. D.

Francis Marion University

Continuous vs. Categorical

• Continuous (measurement) variables have many values

• Categorical variables have only certain values representing different categories

• Ordinal: a type of categorical variable with a natural order (e.g., year of college)

• Nominal: a type of categorical variable with no order (e.g., brand of cola)

Categorical Data

• Tells which category an individual is in rather than telling how much.

• Sex, race, and occupation are naturally categorical.

• A quantitative variable can be grouped to form a categorical variable.

• Analyze with counts or percents.
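As a small illustration of grouping a quantitative variable into categories and then summarizing with counts and percents, the sketch below bins a list of ages. The ages, the cut points, and the function name age_group are made up for this example and are not from the course data.

```python
from collections import Counter

# Hypothetical ages (a quantitative variable); not taken from the slides.
ages = [18, 19, 22, 25, 31, 34, 40, 47, 52, 68]

def age_group(age):
    """Map a numeric age to a category label (cut points are arbitrary)."""
    if age < 25:
        return "under 25"
    if age < 45:
        return "25-44"
    return "45 and over"

# Count how many individuals fall in each category.
counts = Counter(age_group(a) for a in ages)
total = sum(counts.values())

# Convert the counts to percents of the whole sample.
percents = {group: 100 * n / total for group, n in counts.items()}

for group in ("under 25", "25-44", "45 and over"):
    print(f"{group:12s} {counts[group]:3d} {percents[group]:5.1f}%")
```

Once the variable is grouped, each individual is described only by a category, so counts and percents are the natural summaries.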

Describing relationships in categorical data

• No single graph portrays the relationship

• No single number summarizes the relationship either

• Convert counts to proportions or percents

Prediction

Moving from descriptive to Inferential

• Chi Square Inference involves a test of independence.

• If the variables are independent, knowledge of one variable tells you nothing about the other.

Moving from descriptive to Inferential

• Inference involves expected counts.

– Expected count = the count that would occur if the variables are independent

Inference for two-way tables

• Chi Square test of independence.

• For more than two groups.

• Cannot compare multiple groups one at a time.

To Analyze Categorical Data

• First obtain counts.

• In Excel you can do this with a pivot table.

• Put the data in a matrix or two-way table.
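The counting step can also be sketched in Python as a rough equivalent of the pivot table. The six records below are hypothetical and exist only to show the mechanics; they are not the survey data used on the following slides.

```python
from collections import Counter

# Hypothetical raw records, one (sex, party) pair per respondent.
records = [
    ("Male", "Democrat"), ("Female", "Republican"), ("Male", "Democrat"),
    ("Female", "Independent"), ("Male", "Republican"), ("Female", "Republican"),
]

# Count how often each (row, column) combination occurs.
cell_counts = Counter(records)

rows = ["Male", "Female"]
cols = ["Republican", "Democrat", "Independent"]

# Arrange the counts as a two-way table: one list per row category.
table = [[cell_counts[(r, c)] for c in cols] for r in rows]

print(f"{'':8s}" + "".join(f"{c:>12s}" for c in cols))
for r, row_counts in zip(rows, table):
    print(f"{r:8s}" + "".join(f"{n:12d}" for n in row_counts))
```

A Counter returns 0 for combinations that never occur, so empty cells appear in the table as zeros rather than causing an error.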

Matrix or two-way table

          Republican  Democrat  Independent
Male              18        43           14
Female            39        23           18

Inference for two-way tables

• Expected count

• The count that would occur if the variables are independent

Matrix or two-way table

• Rows

• Columns

• Distribution: how often each outcome occurred

• Marginal distribution: count for all entries in a row or column

Row and column totals

          Republican  Democrat  Independent  Total
Male              18        43           14     75
Female            39        23           18     80
Total             57        66           32    155

          Republican  Democrat  Independent  Total  Percent
Male                                            75      48%
Female                                          80      52%
Total             57        66           32    155
Percent          37%       43%          21%
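The marginal totals and percents can be reproduced from the observed counts with a few lines of Python. This is only a sketch; the variable names are illustrative and not part of the slides.

```python
# Observed counts from the party-by-sex table.
observed = [[18, 43, 14],   # Male:   Republican, Democrat, Independent
            [39, 23, 18]]   # Female: Republican, Democrat, Independent

# Marginal distributions: totals for each row and each column.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Express each marginal total as a whole-number percent of all subjects.
row_pct = [round(100 * t / n) for t in row_totals]
col_pct = [round(100 * t / n) for t in col_totals]

print("Row totals:", row_totals, "percents:", row_pct)
print("Column totals:", col_totals, "percents:", col_pct)
print("Grand total:", n)
```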

Expected counts

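• 37% of all subjects are Republicans.

• If the variables are independent, 37% of females should be Republican (the expected value).

• 37% of 80 ≈ 29 expected female Republicans (using the unrounded proportion 57/155)

• 37% of 75 ≈ 28 expected male Republicans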
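The same reasoning applies to every cell: under independence, the expected count is the row total times the column total divided by the grand total. The sketch below applies that rule to the party-by-sex counts; the variable names are illustrative.

```python
# Observed counts from the party-by-sex table.
observed = [[18, 43, 14],   # Male
            [39, 23, 18]]   # Female

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell if sex and party were independent:
# (row total * column total) / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Round to whole counts for display, as the slides do.
expected_rounded = [[round(e) for e in row] for row in expected]

for label, row in zip(("Male", "Female"), expected):
    print(label, [f"{e:.2f}" for e in row])
```

Each row of expected counts still adds up to the observed row total, because the rule only redistributes the same subjects across the columns.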

Expected counts rounded

          Republican  Democrat  Independent  Total
Male              28        32           15     75
Female            29        34           17     80
Total             57        66           32    155

Observed vs. Expected

Observed counts

          Republican  Democrat  Independent  Total
Male              18        43           14     75
Female            39        23           18     80
Total             57        66           32    155

Expected counts (rounded)

          Republican  Democrat  Independent  Total
Male              28        32           15     75
Female            29        34           17     80
Total             57        66           32    155

Chi-Square

• Chi-square: a measure of how far the observed counts are from the expected counts

Chi-square test of independence

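$$X^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where f_o is the observed count and f_e is the expected count in each cell.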
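To make the statistic concrete, the sketch below sums (observed − expected)² / expected over every cell of the party-by-sex table. The function name chi_square_statistic is a label chosen for this example, not something defined in the course materials.

```python
def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            # Expected count for this cell if the variables are independent.
            f_e = row_totals[i] * col_totals[j] / n
            statistic += (f_o - f_e) ** 2 / f_e
    return statistic

# Observed counts: rows are Male, Female; columns are the three parties.
party_by_sex = [[18, 43, 14], [39, 23, 18]]
x2 = chi_square_statistic(party_by_sex)
print(f"Chi-square statistic: {x2:.2f}")
```

Cells where the observed count is far from the expected count contribute the most, so a large total signals that the table departs from what independence would produce.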

Chi Square test of independence with SPSS

Chi Square

Chi-square test of independence

• Degrees of freedom

• df = (number of rows − 1) × (number of columns − 1)

• Compare the observed and expected counts.

• The P-value comes from comparing the Chi-square statistic with critical values for a chi-square distribution.
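The degrees of freedom and the P-value for the party-by-sex table can be sketched as follows. Software such as SPSS or Excel normally reports the P-value directly; the helper chi2_sf below is a hand-written stand-in, written for this example, that uses the standard finite-series formulas for the chi-square upper tail with whole-number degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Assumes x > 0 and integer df >= 1.
    """
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        half, term, total = x / 2, 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail

observed = [[18, 43, 14], [39, 23, 18]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square statistic: sum of (observed - expected)^2 / expected.
x2 = sum((o - r * c / n) ** 2 / (r * c / n)
         for row, r in zip(observed, row_totals)
         for o, c in zip(row, col_totals))

# df = (number of rows - 1) * (number of columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = chi2_sf(x2, df)
print(f"df = {df}, P-value = {p_value:.5f}")
```

A P-value below the chosen alpha (.05 in this course) leads to rejecting the hypothesis that the two variables are independent.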

Example

• Has the percentage of majors changed by school?

Data collection

http://www.fmarion.edu/about/FactBook

2004/2005 Fall 2004 Graduates by Major

Chi Square

Marital Status, page 543

Job grade  Single  Married  Divorced  Widowed
1              58      874        15        8
2             222     3927        70       20
3              50     2396        34       10
4               7      533         7        4

Marital Status, page 543

Test Statistics       Value  df  p-value
Pearson Chi-Square   67.491   9   0.0000

Olive Oil, page 578

 

               Low  Medium  High
Colon cancer   398     397   430
Rectal         250     241   217
Controls      1368    1377  1409

Olive Oil, page 578

Test Statistics                  Value  df  p-value
Pearson Chi-Square               1.552   4    0.817
Continuity Adjusted Chi-Square   1.396   4    0.845
Likelihood Ratio Chi-Square      1.549   4    0.818

Business Majors, page 563

                Female  Male
Accounting          68    56
Administration      91    40
Economics            5     6
Finance             61    59

Business Majors, page 563

Test Statistics       Value  df  p-value
Pearson Chi-Square   10.827   3    0.013
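The business majors result can be checked by hand from the four-by-two table of counts. The sketch below recomputes the statistic and the degrees of freedom; the P-value line uses the closed-form upper tail that holds specifically for 3 degrees of freedom, which is a shortcut chosen for this example rather than the method the software uses.

```python
import math

# Observed counts: columns are Female, Male.
majors = {
    "Accounting":     [68, 56],
    "Administration": [91, 40],
    "Economics":      [5, 6],
    "Finance":        [61, 59],
}

observed = list(majors.values())
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square statistic: sum of (observed - expected)^2 / expected.
x2 = 0.0
for row, r in zip(observed, row_totals):
    for f_o, c in zip(row, col_totals):
        f_e = r * c / n              # expected count under independence
        x2 += (f_o - f_e) ** 2 / f_e

df = (len(observed) - 1) * (len(col_totals) - 1)   # (4 - 1) * (2 - 1)

# Upper-tail probability for a chi-square variable with exactly 3 df.
p_value = (math.erfc(math.sqrt(x2 / 2))
           + math.sqrt(2 * x2 / math.pi) * math.exp(-x2 / 2))

print(f"Chi-square = {x2:.3f}, df = {df}, P-value = {p_value:.3f}")
```

Because the P-value is below .05, the data suggest that choice of major and sex are not independent among these students.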

Exam Three

• 37 multiple choice questions, 4 short answer

• T-tests and chi square on Excel

• General questions about analyzing categorical data and t-tests

• Review from earlier this term

Inference as a decision

• We must decide if the null hypothesis is true.

• We cannot know for sure.

• We choose an arbitrary standard that is conservative and set alpha at .05.

• Our decision will be either correct or incorrect.

Type I and Type II errors

               Ho is really true            Ho is really false
We reject Ho   Type I error (false alarm)   Correct decision
We accept Ho   Correct decision             Type II error (miss)

Type I error

• If we reject Ho when in fact Ho is true, this is a Type I error

• Statistical procedures are designed to minimize the probability of a Type I error, because such errors are considered more serious for science.

• With a Type I error we erroneously conclude that an independent variable works.

Type II error

• If we accept Ho when in fact Ho is false, this is a Type II error.

• A Type II error is serious to the researcher.

• The power of a test is the probability that Ho will be rejected when it is, in fact, false.

Probability

               Ho is really true   Ho is really false
We reject Ho   p = α               p = 1 − β
We accept Ho   p = 1 − α           p = β

Power

• The goal of any scientific research is to reject Ho when Ho is false.

• To increase power:

– a. increase sample size

– b. increase alpha

– c. decrease sample variability

– d. increase the difference between the means
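The four ways of increasing power can be seen numerically in a simple setting. The sketch below assumes a one-sided, one-sample z test with known standard deviation, which is a choice made for this illustration and is not specified in the slides; the function name z_test_power and all of the numbers are hypothetical.

```python
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a one-sided one-sample z test.

    delta: true difference between the means
    sigma: standard deviation of the individual scores
    n:     sample size
    alpha: probability of a Type I error
    """
    z = NormalDist()                      # standard normal distribution
    critical = z.inv_cdf(1 - alpha)       # cutoff for rejecting Ho
    shift = delta * n ** 0.5 / sigma      # true effect in standard-error units
    return 1 - z.cdf(critical - shift)

baseline = z_test_power(delta=2.0, sigma=10.0, n=50)

# Change one factor at a time, holding the others at their baseline values.
larger_n = z_test_power(delta=2.0, sigma=10.0, n=100)
larger_alpha = z_test_power(delta=2.0, sigma=10.0, n=50, alpha=0.10)
smaller_sigma = z_test_power(delta=2.0, sigma=5.0, n=50)
larger_delta = z_test_power(delta=4.0, sigma=10.0, n=50)

print(f"baseline                {baseline:.2f}")
print(f"a. larger sample        {larger_n:.2f}")
print(f"b. larger alpha         {larger_alpha:.2f}")
print(f"c. less variability     {smaller_sigma:.2f}")
print(f"d. larger difference    {larger_delta:.2f}")
```

Each change raises power above the baseline, but raising alpha does so at the cost of a higher Type I error rate.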

Categorical data example

• African-American students more likely to register via the web.

Table

Variable                      White            African-American
Students University-Wide      n     Percent    n     Percent
Register on the Web           447   34%        284   44%
Register with other method    876   66%        356   56%
Total                         1323             640

Web Registration by Race

[Bar chart: percent of students registering via the web, by year (2000, 2001), for White and African-American students; data labels 34%, 25%, 44%, 29%]

Categorical Data Example

• African-American students university-wide (44%) were more likely than white students (34%) to use web registration, X²(1, N = 1963) = 20.7, p < .001.
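The reported statistic can be reproduced from the counts in the registration table. The sketch below treats the data as a two-by-two table; with 1 degree of freedom the upper-tail probability reduces to a single call to the complementary error function, which is the shortcut used here.

```python
import math

# Observed counts: columns are White, African-American.
observed = [[447, 284],   # Register on the Web
            [876, 356]]   # Register with other method

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square statistic: sum of (observed - expected)^2 / expected.
x2 = 0.0
for row, r in zip(observed, row_totals):
    for f_o, c in zip(row, col_totals):
        f_e = r * c / n              # expected count under independence
        x2 += (f_o - f_e) ** 2 / f_e

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (2 - 1) * (2 - 1)

# Upper-tail probability for a chi-square variable with exactly 1 df.
p_value = math.erfc(math.sqrt(x2 / 2))

print(f"X2({df}, N = {n}) = {x2:.1f}, p = {p_value:.6f}")
```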

Smoking among French Men

• Do these data show a relationship between education and smoking in French men?

The End

Benford’s Law, page 550

• Faking data?

Problem 20.14

Digit  Ratio  Observed
1      0.301         6
2      0.176         4
3      0.125         6
4      0.097         7
5      0.079         3
6      0.067         5
7      0.058         6
8      0.051         4
9      0.046         4

Digit  Ratio  Expected  Observed
1      0.301    13.545         6
2      0.176     7.920         4
3      0.125     5.625         6
4      0.097     4.365         7
5      0.079     3.555         3
6      0.067     3.015         5
7      0.058     2.610         6
8      0.051     2.295         4
9      0.046     2.070         4

Expected  Observed  (O − E)²/E
13.545           6  4.20280731
 7.920           4  1.94020202
 5.625           6  0.025
 4.365           7  1.59065865
 3.555           3  0.08664557
 3.015           5  1.30687396
 2.610           6  4.40310345
 2.295           4  1.26667756
 2.070           4  1.7994686
Total               16.6214371

Significance test

chitest p = 0.03430
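The Benford's Law calculation can be followed step by step in Python. The sketch below rebuilds the expected counts, the chi-square total, and the P-value from the digit ratios and observed counts; it assumes 8 degrees of freedom (nine digit categories minus one) and uses the closed-form upper tail that holds for an even number of degrees of freedom.

```python
import math

# Benford's Law ratios for leading digits 1 through 9, and the observed counts.
benford = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]
observed = [6, 4, 6, 7, 3, 5, 6, 4, 4]

n = sum(observed)

# Expected count for each digit if the data followed Benford's Law.
expected = [ratio * n for ratio in benford]

# Chi-square statistic: sum of (observed - expected)^2 / expected.
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1   # categories minus one for a goodness-of-fit test

# Upper tail for even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
half = x2 / 2
p_value = math.exp(-half) * sum(half ** k / math.factorial(k)
                                for k in range(df // 2))

print(f"n = {n}, chi-square = {x2:.4f}, df = {df}, p = {p_value:.4f}")
```

With p below .05, the observed leading digits depart from Benford's Law more than chance alone would readily explain.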

Example

• Survey2 Berk & Carey page 261