Chi-Square X 2. Review: the “null” hypothesis Inferential statistics are used to test hypotheses...

Chi-Square (χ²)

Review: the “null” hypothesis

• Inferential statistics are used to test hypotheses
• Whenever we use inferential statistics the “null hypothesis” applies
– Null hypothesis: There is no relationship between variables. Any apparent effect was produced by chance
– To reject the null, the test statistic (e.g., R², t, b, χ²) must be so large that a value of that size would occur by chance fewer than five times in one hundred (p < .05) if the null were true
• How do we decide whether to reject the null?
– Compare the test statistic to a table
– “Probability” or p means the chance of getting a result at least this large if the null hypothesis is true
– In a study, look for asterisks in the statistic’s column. If there is no asterisk, the null for that relationship cannot be rejected.
– Usually one asterisk (*) means p < .05 (fewer than 5 chances in 100 of a result this large under the null). Two asterisks (**) is better (p < .01, fewer than one in 100). Three (***) is great (p < .001, fewer than one in 1,000).
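The asterisk convention described above can be sketched in a few lines of Python. The function name `significance_stars` is a hypothetical helper chosen for illustration, and the cut-offs are simply the three levels named in the slide.

```python
def significance_stars(p: float) -> str:
    """Return the asterisk label conventionally attached to a p-value."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # no asterisk: the null hypothesis cannot be rejected


for p in (0.20, 0.03, 0.004, 0.0002):
    label = significance_stars(p) or "not significant"
    print(f"p = {p}: {label}")
```

Individual studies may use other cut-offs, so the table note of the study in question is the authority on what its asterisks mean.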

(Figure: a scale running from “Null hypothesis is true” to “Reject null hypothesis”.)

• A test statistic, used to test hypotheses
• Tests for a relationship between two categorical variables (nominal or ordinal)
• Yields a coefficient that can be looked up in a table
– The larger the coefficient, the less likely it is that chance alone produced the result, and the weaker the case for the null hypothesis
• Evaluates the difference between Observed and Expected cell frequencies:
– “Observed” means the actual data
– “Expected” means what we would expect if there were no relationship between the variables
– If there is no difference between observed and expected frequencies, χ² is zero and the data are exactly what the null hypothesis predicts
– The greater the difference, the larger the value of χ², and the less plausible the null hypothesis
• We will always place the values of the IV in rows and of the DV in columns. It can be done the other way, which does not affect the computation of χ².

Chi-Square (χ²)

Hypothesis: Gender → Court disposition

Court disposition (observed)

Gender    Jail   Released   Total
Male        84         16     100
Female      30         20      50
Total      114         36   n = 150

Court disposition (expected)

Gender    Jail   Released   Total
Male        76         24     100
Female      38         12      50
Total      114         36   n = 150

Building the “expected” table

Hypothesis: Gender → Court disposition

“Observed” table - the actual data

Court disposition

Gender    Jail   Released   Total
Male        84         16     100
Female      30         20      50
Total      114         36   n = 150

“Expected” table - what you expect if the null hypothesis of no relationship is true

Create a new table from scratch

1. Bring over the “marginals” - all the totals

Court disposition

Gender    Jail   Released   Total
Male                          100
Female                         50
Total      114         36   n = 150

2. Fill in each cell, one at a time: divide its row total by the grand total, then multiply by its column total

Male/Jail: (100 / 150) × 114 = 76

Male/Released: (100 / 150) × 36 = 24

Female/Jail: (50 / 150) × 114 = 38

Female/Released: (50 / 150) × 36 = 12
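The fill-in rule (row total divided by grand total, times column total) can be sketched in Python as follows. The function name `expected_table` and the list-of-rows layout are assumptions made for this illustration; the data are the gender and court disposition counts from the observed table.

```python
def expected_table(observed):
    """Build the expected-frequency table from the marginals of `observed`.

    `observed` is a list of rows (IV values) of cell counts (DV values).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # row total / grand total * column total, written as one division
    # so that whole-number results stay exact in floating point
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


observed = [
    [84, 16],  # Male:   Jail, Released
    [30, 20],  # Female: Jail, Released
]
for label, row in zip(("Male", "Female"), expected_table(observed)):
    print(label, row)
```

The same function works for tables with more rows or columns, since it only relies on the marginal totals.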

Checking the expected frequencies table by converting it into percentages

In an expected table, as the value of the independent variable changes, the distribution across the dependent variable should remain the same

In this example, as we switch the value of independent variable gender, the distribution across dependent variable court disposition doesn’t change

A properly done expected table will always show no relationship -- it’s the null hypothesis!

Demonstrating the meaning of “expected”

Court disposition (expected freqs.)

Gender    Jail   Released   Total
Male        76         24     100
Female      38         12      50
Total      114         36   n = 150

Court disposition (expected pcts.)

Gender    Jail   Released   Total
Male       76%        24%    100%
Female     76%        24%    100%
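The percentage check can be sketched in Python. The helper name `row_percentages` is hypothetical; it converts each row of a frequency table into percentages of that row's total, so an expected table should give identical rows while the observed table need not.

```python
def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    return [[100 * cell / sum(row) for cell in row] for row in table]


expected = [[76, 24], [38, 12]]  # Male, Female by Jail, Released
observed = [[84, 16], [30, 20]]

print("expected:", row_percentages(expected))
print("observed:", row_percentages(observed))
```

In the expected table both genders show the same split between Jail and Released; in the observed table the splits differ, which is the difference chi-square goes on to measure.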

Comparing the observed and expected tables: the meaning of Chi-Square (χ²)

• The observed table is the data, as we find it
• The expected table is purposely built to demonstrate no relationship between variables. It “is” the null hypothesis.
• To determine whether the observed table demonstrates a relationship between variables, we compare its cell frequencies to those in the “expected” table
– The less similar the tables, the stronger the support for the working hypothesis, and the less plausible the null hypothesis
• χ² is a ratio that reflects the dissimilarity in cell frequencies. The more dissimilar, the larger the χ². It is summed over all cells:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

O = observed (actual) frequency
E = expected frequency (if null hypothesis is true)

• More formally, χ² is the ratio of systematic variation to chance variation. The larger the ratio, the more likely that we can reject the null hypothesis.
• Chi-square is not always a good measure because its accuracy is closely tied to sample size.
– It tends to over-estimate significance with large samples and under-estimate it with small samples
– Ideal sample size is around 150, with no cells less than 5

Computing χ²

Observed frequencies

Court disposition

Gender    Jail   Released   Total
Male        84         16     100
Female      30         20      50
Total      114         36   n = 150

Expected frequencies

Court disposition

Gender    Jail   Released   Total
Male        76         24     100
Female      38         12      50
Total      114         36   n = 150

Always pair up the corresponding cells and divide by the expected frequency

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(84-76)^2}{76} + \frac{(16-24)^2}{24} + \frac{(30-38)^2}{38} + \frac{(20-12)^2}{12} = 10.5$$
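The cell-by-cell pairing can be sketched in Python. The function name `chi_square` is a hypothetical helper; it assumes the observed and expected cells are supplied in the same order.

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E over corresponding cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Cells in the order Male/Jail, Male/Released, Female/Jail, Female/Released
observed = [84, 16, 30, 20]
expected = [76, 24, 38, 12]

stat = chi_square(observed, expected)
print(round(stat, 1))  # 10.5
```

Every difference here is 8 in absolute value, but the cells contribute unequally because each squared difference is divided by its own expected frequency.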

Assessing the significance of χ²

• To reject the null hypothesis a test statistic, such as χ², must be of sufficient magnitude. The larger the better!
• df = (rows minus 1) × (columns minus 1) = (r − 1) × (c − 1) = (2 − 1) × (2 − 1) = 1
• In social science research we reject the null hypothesis when a result this large would occur by chance fewer than five times in 100 (p < .05) if the null were true. Our chi-square is larger than what we need: with df = 1 it exceeds 6.635, the value required at the .01 level, so there is less than one chance in a hundred (p < .01). It falls just short of the 10.828 required at the .001 level.
• Our observed data has proven so different from what would be expected if there was no relationship between variables that we can reject the null hypothesis of no relationship. This supports the working hypothesis that gender affects disposition. There is less than one chance in a hundred that a difference this large is due to chance alone!
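As a cross-check on the printed table, the probability for a chi-square with one degree of freedom can be computed directly. This is a minimal sketch that is not part of the slides: it relies on the fact that, for df = 1 only, the upper-tail probability equals `erfc(sqrt(χ²/2))`. The function name `p_value_df1` is hypothetical.

```python
import math


def p_value_df1(chi_sq: float) -> float:
    """Upper-tail probability of a chi-square statistic with df = 1.

    Valid for one degree of freedom only; other df need a table or a
    statistics library.
    """
    return math.erfc(math.sqrt(chi_sq / 2))


p = p_value_df1(10.5)
print(f"p = {p:.4f}")
print("below .01:", p < 0.01)
print("below .001:", p < 0.001)
```

For the gender and court disposition example the result lands between .001 and .01, which agrees with reading the df = 1 row of the table.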

(Figure: a scale running from “Null hypothesis is true” to “Reject null hypothesis”, with χ² = 10.5 marked on it.)

Class exercise

Hypothesis: More building alarms → Less crime

• Randomly sampled 120 businesses with alarms
– 50 had crimes, 70 didn’t
• Randomly sampled 90 businesses without alarms
– 50 had crimes, 40 didn’t
• Build the observed and expected tables
– Remember, they’re tables, so place the values of the independent variable in rows
• Compute χ²

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

• Use the table to assess whether the null hypothesis can be rejected; df = (r − 1) × (c − 1)
• Convey your findings using simple words. What does the data show about building alarms and crime? How certain are you of your conclusions?

Parking lot exercise

1. Graph the distribution of car values for each parking lot

2. Fill in the frequency and percentage tables

3. Use the frequency (not percentage!) table to create a “frequencies expected” table (meaning, expected if the null hypothesis of no relationship is correct)

Computing expected frequencies

$$E = \frac{\text{row marginal}}{\text{total cases}} \times \text{column marginal}, \qquad \text{e.g. } \frac{10}{20} \times 6 = 3$$

4. Compute χ²: Cell by corresponding cell, subtract EXPECTED from OBSERVED. Square each difference. Divide each result by EXPECTED. Then total them up.

5. Check the table. Begin with the largest probability level that allows you to reject the null hypothesis, .05. Is the Chi-square at least that large? If not, the null hypothesis cannot be rejected.

• The greatest risk we can take of wrongly rejecting the null hypothesis is five in one-hundred (.05)
• Our Chi-square, 8.66, is greater than 7.815, the required minimum
• We can thus reject the NULL hypothesis and accept the WORKING hypothesis that higher income persons drive more expensive cars, with only five chances in 100 of being wrong.
• Larger Chi-squares could have reduced that risk to two in one-hundred (.02), one in one-hundred (.01), or even one in one-thousand (.001)
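The table check in step 5 can be sketched as a lookup in Python. The required minimum of 7.815 at the .05 level is the standard critical value for df = 3, so this sketch assumes df = 3 and fills in the remaining df = 3 entries from a standard chi-square table. The names `CRITICAL_DF3` and `strictest_level` are hypothetical.

```python
# Critical values of chi-square for df = 3 (assumed from the 7.815 cut-off),
# keyed by probability level, as listed in standard printed tables.
CRITICAL_DF3 = {0.05: 7.815, 0.02: 9.837, 0.01: 11.345, 0.001: 16.266}


def strictest_level(chi_sq, critical):
    """Return the smallest probability level whose cut-off chi_sq reaches.

    Returns None when chi_sq is below every cut-off, in which case the
    null hypothesis cannot be rejected.
    """
    passed = [level for level, cutoff in critical.items() if chi_sq >= cutoff]
    return min(passed) if passed else None


print(strictest_level(8.66, CRITICAL_DF3))  # 0.05
```

A chi-square of 8.66 clears the .05 cut-off but not the .02 cut-off of 9.837, which matches the conclusion drawn in the exercise.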

Homework

Homework exercise

Hypothesis: Sergeants have more stress than patrol officers

1. Calculate expected cell frequencies (null hypothesis of no relationship is true)

2. Compute Chi-square

3. Use table in Appendix E to determine your chi-square’s probability level

4. Can we reject the null hypothesis?

Homework answer

Observed cell frequencies: 30, 60, 86, 24

Expected cell frequencies (same cell order): 52, 38, 64, 46

$$\chi^2 = \frac{(30-52)^2}{52} + \frac{(60-38)^2}{38} + \frac{(86-64)^2}{64} + \frac{(24-46)^2}{46} = 40.1$$

χ² = 40.1
df = (r − 1) × (c − 1) = (2 − 1) × (2 − 1) = 1
To reject at the .05 level we need χ² = 3.841 or greater

Reject null hypothesis: less than 1 chance in 1,000 that the relationship is due to chance
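The homework steps can be chained together in one short Python sketch. Two things here are assumptions: the 2 × 2 layout of the observed cells (the row and column labels are not given in the answer) and the rounding of expected frequencies to whole numbers, which is what the worked answer appears to do. The function name `chi_square_2way` is hypothetical.

```python
def chi_square_2way(observed, round_expected=False):
    """Return (chi-square, df) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            if round_expected:
                e = round(e)  # whole-number expected cells, as in the answer
            stat += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


observed = [[30, 60], [86, 24]]  # assumed layout of the four observed cells
stat, df = chi_square_2way(observed, round_expected=True)
print(round(stat, 1), df)  # 40.1 1
```

Leaving `round_expected` at its default keeps the fractional expected frequencies and gives a slightly larger statistic, with the same conclusion.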

Practice for the final

• You will test a hypothesis using two categorical variables and determine whether the independent variable has a statistically significant effect.

• You will be asked to state the null hypothesis.

• You will use supplied data to create an Observed frequencies table. You will use it to create an Expected frequencies table. You will be given a formula but should know the procedure.

• You will compute the Chi-Square statistic and degrees of freedom. You will be given formulas but should know the procedures by heart.

• You will use the Chi-Square table to determine whether the results support the working hypothesis.

– Print and bring to class: http://www.sagepub.com/fitzgerald/study/materials/appendices/app_e.pdf

• Sample question: Hypothesis is that alarm systems prevent burglary. Random sample of 120 businesses with an alarm system and 90 without. Fifty businesses of each kind were burglarized.

– Null hypothesis: No significant difference in crime between businesses with and without alarms

Observed frequencies

Alarm    Burglarized   Not burglarized   Total
Yes               50                70     120
No                50                40      90
Total            100               110   n = 210

Expected frequencies

Alarm    Burglarized   Not burglarized   Total
Yes               57                63     120
No                43                47      90
Total            100               110   n = 210

$$\chi^2 = \frac{(50-57)^2}{57} + \frac{(70-63)^2}{63} + \frac{(50-43)^2}{43} + \frac{(40-47)^2}{47} = .86 + .78 + 1.14 + 1.04 = 3.82$$

– Chi-Square = 3.82

– df = (r − 1) × (c − 1) = 1

– Check the table. Do the results support the working hypothesis? No. Chi-Square must be at least 3.84 to reject the null hypothesis of no relationship between alarm systems and crime at the .05 level (no more than five chances in 100 of wrongly rejecting it)
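The sample answer uses expected frequencies rounded to whole numbers (57, 63, 43, 47). The Python sketch below reproduces that figure and, for comparison, computes the statistic without rounding using the standard 2 × 2 shortcut χ² = n(ad − bc)² / ((a + b)(c + d)(a + c)(b + d)). The shortcut and the function name `chi_square_2x2` are not part of the slides.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square for the 2 x 2 table [[a, b], [c, d]], no rounding."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Cells: alarm/burglarized, alarm/not, no alarm/burglarized, no alarm/not
observed = [50, 70, 50, 40]
rounded_expected = [57, 63, 43, 47]  # whole-number expected cells

rounded = sum((o - e) ** 2 / e for o, e in zip(observed, rounded_expected))
unrounded = chi_square_2x2(*observed)

print(round(rounded, 2), round(unrounded, 2))  # 3.82 3.98
```

With rounded expected frequencies the statistic falls just under the 3.84 needed at the .05 level; without rounding it falls just over. The result is borderline, so the conclusion is sensitive to how the expected cells are rounded. For the exam, follow the procedure taught in class.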
