What is chi-square CHIDIST Nonparametric statistics

Post on 18-Dec-2015

Social Statistics: Chi-square

What is chi-square

CHIDIST

Nonparametric statistics

This week

2

A main branch of statistics

Assumes the data follow a type of probability distribution (e.g. the normal distribution)

Makes inferences about the parameters of the distribution (e.g. its mean and standard deviation)

Assumption: the sample is large enough to represent the population (e.g. sample size around 30).

Not distribution-free (a probability distribution is required)

Parametric statistics

3

Nonparametric statistics (distribution-free statistics)

Do not rely on assumptions that the data are drawn from a given probability distribution (data model is not specified).

They are widely used for studying populations that take on a ranked order (e.g. movie reviews from one to four stars, opinions about hotel ranking). They suit ordinal data.

They make fewer assumptions, so they can be applied in situations where less is known about the application.

They may need a larger sample size than parametric statistics to draw a conclusion with the same degree of confidence.

Nonparametric statistics

4

Nonparametric statistics (distribution-free statistics)

Data with frequencies or percentages

Number of kids in different grades

The percentage of people receiving social security

Nonparametric statistics

5

One-sample chi-square includes only one dimension:

Whether the number of respondents is equally distributed across all levels of education.

Whether the voting for the school voucher has a pattern of preference.

Two-sample chi-square includes two dimensions:

Whether preference for the school voucher is independent of political party affiliation and gender.

One-sample/Two-sample chi-square

6

Compute chi-square

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

O: the observed frequency

E: the expected frequency

One-sample chi-square test

7
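
The formula on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original deck: the function name `chi_square_statistic` and the sample counts are assumptions chosen for the example.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up counts for illustration only (they do not come from the slides).
observed = [10, 20, 30]
expected = [20, 20, 20]

# (10-20)^2/20 + (20-20)^2/20 + (30-20)^2/20 = 5 + 0 + 5
print(chi_square_statistic(observed, expected))  # 10.0
```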

Preference for School Voucher

| for | maybe | against | total |
|-----|-------|---------|-------|
| 23  | 17    | 50      | 90    |

Example

Question: Is the number of respondents equally distributed across all opinions?

One-sample chi-square

8

Step 1: A statement of the null and research hypotheses

Chi-square steps

$H_0: P_1 = P_2 = P_3$

There is no difference in the frequency or proportion in each category.

$H_1: P_1 \neq P_2 \neq P_3$

There is a difference in the frequency or proportion in each category.

9

Step 2: Setting the level of risk (the level of significance, or Type I error) associated with the null hypothesis: 0.05

Chi-square steps

10

Step 3: Selection of the proper test statistic

Frequency → nonparametric procedures → chi-square

Chi-square steps

11

Step 4: Computation of the test statistic value (called the obtained value)

Chi-square steps

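| category | observed frequency (O) | expected frequency (E) | D (difference) | (O-E)² | (O-E)²/E |
|----------|------------------------|------------------------|----------------|--------|----------|
| for      | 23                     | 30                     | 7              | 49     | 1.63     |
| maybe    | 17                     | 30                     | 13             | 169    | 5.63     |
| against  | 50                     | 30                     | 20             | 400    | 13.33    |
| Total    | 90                     | 90                     |                |        | 20.60    |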

12
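
The Step 4 table can be reproduced with a short script. The sketch below assumes the null hypothesis of an equal split, so each expected frequency is the total divided by the number of categories (90 / 3 = 30). The variable names are illustrative only.

```python
# Observed counts from the school voucher example.
observed = {"for": 23, "maybe": 17, "against": 50}

total = sum(observed.values())      # 90 respondents
expected = total / len(observed)    # 30 per category under the null

rows = []
for category, o in observed.items():
    diff = abs(o - expected)        # D column
    squared = diff ** 2             # (O-E)^2 column
    rows.append((category, o, expected, diff, squared, squared / expected))

# The obtained value is the sum of the last column.
chi_square = sum(row[-1] for row in rows)

for category, o, e, d, sq, contrib in rows:
    print(f"{category:<8}{o:>4}{e:>6.0f}{d:>5.0f}{sq:>6.0f}{contrib:>8.2f}")
print(f"chi-square = {chi_square:.2f}")  # chi-square = 20.60
```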

Step 5: Determination of the value needed for rejection of the null hypothesis, using the appropriate table of critical values for the particular statistic

Distribution of Chi-Square

df = r - 1 (r = number of categories)

If the obtained value > the critical value, reject the null hypothesis

If the obtained value < the critical value, retain (fail to reject) the null hypothesis

Chi-square steps

13
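
Instead of reading the critical value from a printed table, it can be approximated numerically. The sketch below is an illustration under a stated restriction: it uses the closed-form upper-tail probability that holds only for even degrees of freedom, then finds the critical value by bisection. The function names `chi_square_sf_even_df` and `critical_value` are assumptions made for this example.

```python
import math


def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def critical_value(alpha, df, tol=1e-10):
    """Find x such that P(X > x) = alpha, using bisection."""
    lo, hi = 0.0, 1.0
    # Widen the bracket until the tail probability at hi drops below alpha.
    while chi_square_sf_even_df(hi, df) > alpha:
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi_square_sf_even_df(mid, df) > alpha:
            lo = mid    # tail still too large, so move right
        else:
            hi = mid
    return (lo + hi) / 2.0


# df = r - 1 = 2 for three categories, alpha = 0.05
print(round(critical_value(0.05, 2), 3))  # 5.991
```

For odd degrees of freedom a general routine is needed, which is what statistical tables and spreadsheet functions provide.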

Chi-square steps

14

Step 6: A comparison of the obtained value and the critical value is made: 20.6 versus 5.991

Chi-square steps

15

Steps 7 and 8: Decision time

What is your conclusion, why, and how do you interpret it?

Chi-square steps

16

We’ll settle the age-old debate of whether people can actually detect their favorite cola based solely on taste. I blindfold 30 Coke lovers and have them sample 3 colas. Is there a true difference, or are these preference differences explainable by chance?

Another example

17

Null: There are no preferences: The population is divided evenly among the brands

Alternate: There are preferences: The population is not divided evenly among the brands

Hypothesis

18

df = C - 1 = 3 - 1 = 2, set α = .05

For df = 2, $\chi^2_{crit} = 5.99$

Chance Model

19

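| category | observed frequency (O) | expected frequency (E) | D (difference) | (O-E)² | (O-E)²/E |
|----------|------------------------|------------------------|----------------|--------|----------|
| Coke     | 13                     | 10                     | 3              | 9      | 0.9      |
| Pepsi    | 9                      | 10                     | 1              | 1      | 0.1      |
| RC Cola  | 8                      | 10                     | 2              | 4      | 0.4      |
| Total    | 30                     | 30                     |                |        | 1.4      |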

Calculate Chi-Square

20

Since the obtained value (1.40) is smaller than the critical value (5.99), retain the null hypothesis: there is no evidence that preferences differ among the colas when the logos are removed.

Decision and Conclusion

$\chi^2_{obt} = 1.40$

$\chi^2_{crit} = 5.99$

$\chi^2_{obt} < \chi^2_{crit}$

21
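
Both worked examples follow the same decision rule, so they can be run through one small helper. This is a sketch under the equal-split null hypothesis used in the slides. The function name `one_sample_chi_square` is an assumption, and the critical value 5.99 is the table value quoted for df = 2 and α = .05.

```python
def one_sample_chi_square(observed, critical_value):
    """Test whether counts are evenly spread across categories."""
    total = sum(observed)
    expected = total / len(observed)    # equal split under the null
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    df = len(observed) - 1
    reject_null = statistic > critical_value
    return statistic, df, reject_null


# Cola counts: Coke 13, Pepsi 9, RC Cola 8.
cola = one_sample_chi_square([13, 9, 8], critical_value=5.99)
# Voucher counts: for 23, maybe 17, against 50.
voucher = one_sample_chi_square([23, 17, 50], critical_value=5.99)

print(f"cola:    chi-square = {cola[0]:.2f}, df = {cola[1]}, reject null: {cola[2]}")
print(f"voucher: chi-square = {voucher[0]:.2f}, df = {voucher[1]}, reject null: {voucher[2]}")
```

The cola data give 1.40, which is below 5.99, so the null hypothesis is retained. The voucher data give 20.60, which exceeds 5.99, so the null hypothesis is rejected.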

CHIDIST(x, degrees_freedom)

CHIDIST(20.6, 2) = 0.000034 < 0.05

CHIDIST(1.40, 2) = 0.496585304 > 0.05

Excel functions

22
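
For readers without Excel, the right-tail probability that CHIDIST returns can be approximated with the Python standard library. The sketch below is an assumption-laden illustration, not Excel's own algorithm: it evaluates a series for the regularized lower incomplete gamma function and subtracts it from 1. The names `chidist`, `max_terms`, and `eps` are chosen here for the example, and the series is only intended for moderate values of x.

```python
import math


def chidist(x, degrees_freedom, max_terms=1000, eps=1e-15):
    """Right-tail probability of the chi-square distribution, analogous to CHIDIST."""
    if x < 0 or degrees_freedom <= 0:
        raise ValueError("x must be >= 0 and degrees_freedom must be > 0")
    if x == 0:
        return 1.0
    a = degrees_freedom / 2.0
    z = x / 2.0
    # Series: sum_{n>=0} z^n / (a * (a+1) * ... * (a+n))
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= z / (a + n)
        total += term
        if term < total * eps:
            break
    # Regularized lower incomplete gamma P(a, z); the right tail is 1 - P.
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


print(f"{chidist(20.6, 2):.6f}")  # 0.000034
print(f"{chidist(1.40, 2):.6f}")  # 0.496585
```

For df = 2 the right tail reduces to exp(-x/2), which gives a quick check on both values: exp(-10.3) is about 0.000034 and exp(-0.7) is about 0.496585.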