Chi Square Analyses: Comparing Frequency Distributions.


Transcript of Chi Square Analyses: Comparing Frequency Distributions.


Chi-Square Tests

• Test probability distributions of nominal, ordinal, or discrete data

• Compare data to a theoretical distribution

• Compare two sets of data


Chi Square Tests for Goodness of Fit

• Two types: extrinsic and intrinsic

• Assumptions of both tests
  – Measurement on at least a nominal scale
  – Observations are independent
  – The expected frequencies for each category must be specified
  – The sample size must be sufficiently large so that no category has an expected frequency of < 5


Chi Square Tests for Goodness of Fit

• Hypotheses
  – Null: the observed frequency distribution is the same as the hypothesized frequency distribution
  – Alternative: the observed and hypothesized distributions are different


Chi Square Tests for Goodness of Fit

• Test statistic
  – The test statistic is based on the difference between the observed (O) and expected (E) frequencies. It is calculated by summing over all categories:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
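A minimal Python sketch of the test statistic above. The function name `chi_square_statistic` is hypothetical, and the sketch assumes the observed and expected counts are listed in the same category order; the example call uses the pea-flower counts worked through in this transcript.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over categories of (O - E)**2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# 80 purple and 20 white offspring, against 75 and 25 expected under a 3:1 ratio
print(round(chi_square_statistic([80, 20], [75, 25]), 2))  # 1.33
```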


Chi Square Test for Goodness of Fit

• In an extrinsic test, no population parameters need to be estimated from the data.

• An intrinsic test requires an estimate of a population parameter from the data collected.
  – Technically, the degrees of freedom should be reduced by 1 for each parameter estimated.
  – However, this is a minor effect and not always considered (we won’t worry about it).
  – An intrinsic test is commonly used when comparing a sample to a derived distribution such as the Poisson or binomial distribution.


Chi Square Test for Goodness of Fit (Extrinsic)

• Example: a cross of two pea plants with purple flowers.
  – When you do the cross, you get 80 plants with purple flowers and 20 with white flowers.
  – Your biological hypotheses are that:
    • the parents were heterozygous (since some white-flowered offspring were produced)
    • P is completely dominant to p
    • genes segregate correctly
    • fertilization is random
    • zygotes have the same probability of survival with respect to this gene.


Gametes of parents (produced in equal frequency):

        P      p
P       PP     Pp
p       Pp     pp

Expected ratio under THESE hypotheses:
¾ purple offspring
¼ white offspring
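The Punnett-square reasoning can be checked by brute-force enumeration. This Python sketch assumes, as the hypotheses above do, that each Pp parent passes on P or p with equal probability and that P is completely dominant; the variable and function names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each heterozygous parent (Pp) produces P and p gametes in equal frequency.
parent1_gametes = ["P", "p"]
parent2_gametes = ["P", "p"]

# Each pairing of gametes is one equally likely cell of the Punnett square.
# sorted() puts the uppercase allele first, so "pP" and "Pp" both become "Pp".
genotypes = ["".join(sorted(pair)) for pair in product(parent1_gametes, parent2_gametes)]


def phenotype(genotype):
    # Complete dominance: a single P allele gives purple flowers.
    return "purple" if "P" in genotype else "white"


counts = Counter(phenotype(g) for g in genotypes)
total = sum(counts.values())
ratios = {ph: Fraction(n, total) for ph, n in counts.items()}
print(ratios)  # {'purple': Fraction(3, 4), 'white': Fraction(1, 4)}

# Expected counts for the 100 offspring in the example
expected = {ph: float(r * 100) for ph, r in ratios.items()}
print(expected)  # {'purple': 75.0, 'white': 25.0}
```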


Chi Square Test for Goodness of Fit (Extrinsic)

Phenotype   Observed   Expected by hypothesis   O − E   (O − E)²
Purple         80       75 (3/4)                   5       25
White          20       25 (1/4)                  −5       25
Total         100      100                         0       50

The O − E values sum to 0 (bummer: positive and negative deviations always cancel). The squared deviations sum to 50. Hmmm…

So, we want to see how close our observed results are to what we expect under our hypothesis. Maybe the “total difference” would be a good measure…

But sample size matters…


Chi Square Test for Goodness of Fit (Extrinsic)

Phenotype   Observed   Expected by hypothesis   O − E   (O − E)²
Purple        7505      7500 (3/4)                 5       25
White         2495      2500 (1/4)                −5       25
Total        10000     10000                       0       50

Again the O − E values sum to 0, and the squared deviations sum to 50: the same as before.

But sample size matters. These results are much closer to the expected values, yet they give the same total. So we need to evaluate the “sum of squares” in relation to sample size, by dividing each squared deviation by its expected frequency (a kind of “mean square”).
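A short Python sketch reproducing the comparison in the two tables above: the sum of squared deviations is 50 in both cases, while dividing each squared deviation by its expected count (the chi-square statistic) shrinks as the sample gets larger. The helper names are hypothetical.

```python
def sum_of_squares(observed, expected):
    """Total of (O - E)**2, ignoring how large E is."""
    return sum((o - e) ** 2 for o, e in zip(observed, expected))


def chi_square_statistic(observed, expected):
    """Total of (O - E)**2 / E, so each deviation is scaled by its expected count."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


small = ([80, 20], [75, 25])          # 100 offspring
large = ([7505, 2495], [7500, 2500])  # 10000 offspring

for label, (obs_counts, exp_counts) in [("N = 100", small), ("N = 10000", large)]:
    print(label, sum_of_squares(obs_counts, exp_counts),
          round(chi_square_statistic(obs_counts, exp_counts), 4))
# N = 100 50 1.3333
# N = 10000 50 0.0133
```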


Chi Square Test for Goodness of Fit (Extrinsic)

Phenotype   Observed   Expected by hypothesis   O − E   (O − E)²   (O − E)²/E
Purple         80       75 (3/4)                   5       25         0.33
White          20       25 (1/4)                  −5       25         1.00
Total         100      100                                            1.33

This sum, 1.33, is your calculated chi-square value. Compare it to a chi-square table with df = number of categories − 1 = 2 − 1 = 1 (the two categories are purple and white).
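A hedged Python sketch of the lookup step. The critical values are the standard 5% entries of a chi-square table for df = 1 to 4; the `n_estimated_params` argument (hypothetical, like the function name `goodness_of_fit`) subtracts one degree of freedom per parameter estimated from the data, as described for intrinsic tests.

```python
# Upper-tail critical values of the chi-square distribution at alpha = 0.05,
# taken from a standard chi-square table for df = 1 to 4.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def goodness_of_fit(observed, expected, n_estimated_params=0):
    """Return (chi-square, df, critical value, reject H0?) at alpha = 0.05."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - n_estimated_params
    critical = CRITICAL_05[df]
    return chi2, df, critical, chi2 > critical


chi2, df, critical, reject = goodness_of_fit([80, 20], [75, 25])
print(round(chi2, 2), df, critical, reject)  # 1.33 1 3.841 False
```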

The critical value is associated with a probability; in this case p = 0.05. If the null hypothesis were true, a chi-square value as large as the critical value or larger would occur by chance only 5% of the time. (The probability that results as deviant as yours could have occurred by chance if the null hypothesis were true is your p-value.) You reject the null hypothesis only if you observe a more deviant pattern, which makes your calculated value greater than the critical value. For df = 1, the critical value is 3.84; the pea result (1.33) is smaller, so we fail to reject the null hypothesis.
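To make the meaning of that probability concrete, the following Python sketch uses Monte Carlo simulation (a technique not used in the original slides): it simulates many crosses of 100 offspring under the 3:1 null hypothesis and records how often chance alone produces a chi-square value at least as large as 1.33, and how often it exceeds the tabled critical value of 3.84 for df = 1. The function names, number of repetitions, and random seed are arbitrary choices, so the printed proportions are estimates.

```python
import random


def chi_square_statistic(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate_null(n_offspring=100, p_purple=0.75, reps=10_000, seed=1):
    """Chi-square values for crosses simulated under the 3:1 null hypothesis."""
    rng = random.Random(seed)
    expected = [n_offspring * p_purple, n_offspring * (1 - p_purple)]
    stats = []
    for _ in range(reps):
        # Each offspring is purple with probability 3/4, independently.
        purple = sum(rng.random() < p_purple for _ in range(n_offspring))
        stats.append(chi_square_statistic([purple, n_offspring - purple], expected))
    return stats


stats = simulate_null()
observed_chi2 = chi_square_statistic([80, 20], [75, 25])  # the pea result, 1.33
critical_05 = 3.841                                        # tabled value, df = 1

# Share of chance-only crosses at least as deviant as ours (an estimated p-value)
p_estimate = sum(s >= observed_chi2 for s in stats) / len(stats)
# Share of chance-only crosses beyond the critical value
alpha_estimate = sum(s > critical_05 for s in stats) / len(stats)
print(p_estimate, alpha_estimate)
```

The first proportion estimates the p-value for the pea data and should be well above 0.05; the second should land near 0.05 (not exactly, because the counts are discrete and the simulation is finite).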


Chi Square Test for Goodness of Fit (Intrinsic)

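• Example
  – In the 98-year period from 1900 to 1997, there were 159 U.S. landfalling hurricanes. Does the number of landfalling hurricanes per year follow a Poisson distribution?
  – Estimate the mean number of hurricanes per year from the data: 159/98 ≈ 1.62 (this estimated parameter is what makes the test intrinsic)
  – Calculate the expected frequencies (probabilities) from the Poisson formula
  – Calculate the expected number of years in each category by multiplying each expected frequency by the number of years observed (here, 98)

Formula:

$$p(x) = \frac{\bar{X}^{x} e^{-\bar{X}}}{x!}$$

where x is the number of hurricanes in a year and \bar{X} is the mean number of hurricanes per year.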
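A Python sketch of these steps, using the observed counts from the table that follows. As an assumption about how that table was built, the last category is treated as “6 or more” (its probability is 1 minus the others) so the expected counts add up to 98 years; because the slides round the mean and the probabilities, the printed values can differ slightly from the table. The names `observed_years` and `poisson_pmf` are hypothetical.

```python
import math

# Number of years (1900-1997) with 0, 1, 2, ..., 6 U.S. landfalling hurricanes
observed_years = [18, 34, 24, 16, 3, 1, 2]

n_years = sum(observed_years)                                    # 98
n_hurricanes = sum(k * n for k, n in enumerate(observed_years))  # 159
mean = n_hurricanes / n_years  # the parameter estimated from the data (about 1.62)


def poisson_pmf(x, mu):
    """p(x) = mu**x * e**(-mu) / x!"""
    return mu ** x * math.exp(-mu) / math.factorial(x)


# Probabilities for 0-5 hurricanes; the last category takes the remaining upper tail.
expected_freq = [poisson_pmf(k, mean) for k in range(len(observed_years) - 1)]
expected_freq.append(1 - sum(expected_freq))

# Expected number of years = expected frequency x number of years observed
expected_years = [p * n_years for p in expected_freq]

for k, (obs, p, e) in enumerate(zip(observed_years, expected_freq, expected_years)):
    print(f"{k}: observed {obs:2d}   expected freq {p:.3f}   expected # {e:.2f}")
```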


Chi Square Test for Goodness of Fit (Intrinsic)

Hurricanes per year   Observed # (years)   Expected freq   Expected # (years)
0                     18                   0.198           19.40
1                     34                   0.320           31.36
2                     24                   0.260           25.48
3                     16                   0.140           13.72
4                      3                   0.057            5.59
5                      1                   0.018            1.76
6                      2                   0.007            0.69

Total: 98 years, 159 hurricanes


Chi Square Test for Goodness of Fit (Intrinsic)

Hurricanes per year   Observed # (years)   Expected freq   Expected # (years)   (O − E)²/E
0                     18                   0.198           19.40                0.101
1                     34                   0.320           31.36                0.222
2                     24                   0.260           25.48                0.086
3                     16                   0.140           13.72                0.379
≥4                     6                   0.082            8.04                0.518

Since some expected values were < 5 (for 4, 5, and 6 hurricanes per year), we combined those categories into a single “4 or more” category to fix this problem.
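One way to automate this pooling step, sketched in Python under a simplifying assumption: the helper `pool_upper_tail` (a hypothetical name) only merges categories at the upper tail, which is where the small expected counts fall in this example. It uses the observed and expected counts from the uncombined table.

```python
def pool_upper_tail(observed, expected, minimum=5.0):
    """Merge the last category into its neighbour until its expected count >= minimum."""
    obs, exp_counts = list(observed), list(expected)
    while len(exp_counts) > 1 and exp_counts[-1] < minimum:
        # Remove the last category first, then add it to the new last category.
        tail_obs = obs.pop()
        tail_exp = exp_counts.pop()
        obs[-1] += tail_obs
        exp_counts[-1] += tail_exp
    return obs, exp_counts


# Table values for 0, 1, ..., 6 hurricanes per year
observed = [18, 34, 24, 16, 3, 1, 2]
expected = [19.4, 31.36, 25.48, 13.72, 5.59, 1.76, 0.69]

pooled_obs, pooled_exp = pool_upper_tail(observed, expected)
print(pooled_obs, [round(e, 2) for e in pooled_exp])
# [18, 34, 24, 16, 6] [19.4, 31.36, 25.48, 13.72, 8.04]
```

A category in the middle of the distribution with a small expected count would need a different merging rule; this sketch does not handle that case.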


Chi Square Test for Goodness of Fit (Intrinsic)

• Calculate the chi-square statistic in the same way as before, and look up the critical value in a chi-square table.

• Here:
  – χ² = 1.306
  – Tabled value for α = 0.05 = 7.81 (the entry for df = 3: 5 categories − 1, minus 1 for the estimated mean)
  – Thus, we fail to reject the null hypothesis; the data are consistent with the claim that the annual number of landfalling U.S. hurricanes follows a Poisson distribution (rare, independent, random events).
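A Python sketch reproducing this calculation from the pooled table. The degrees of freedom subtract one extra for the Poisson mean estimated from the data, which is why 7.815 (the 5% table entry for df = 3) is the comparison value.

```python
# Pooled categories 0, 1, 2, 3, and 4 or more (values from the pooled table)
observed = [18, 34, 24, 16, 6]
expected = [19.4, 31.36, 25.48, 13.72, 8.04]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

n_estimated_params = 1  # the Poisson mean was estimated from the data
df = len(observed) - 1 - n_estimated_params  # 5 - 1 - 1 = 3
critical_05 = 7.815  # chi-square table, df = 3, alpha = 0.05

print(round(chi2, 3), df, chi2 > critical_05)  # 1.306 3 False
```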


Chi Square Test of Independence

• Also called the Chi Square Test for Contingency Tables

• This test is performed to see if two variables, both measured on a nominal scale, are related in some way.

• The question asked here is whether there is a relationship between the variables; the null hypothesis is that no relationship exists (they are “independent”).


Chi Square Test of Independence

• Steps in doing the test
  – 1. Form a table, or matrix, from the data collected
  – 2. Calculate row, column, and grand totals for the matrix
  – 3. Use these totals to calculate expected values (frequencies) for each cell in the matrix
    • Calculated by: [(row total) × (column total)] / grand total
    • Based on the product rule: the probability of two independent events occurring together is the product of their independent probabilities.
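Step 3 can be written as a small Python function (the name `expected_counts` is hypothetical). It is applied here to the PpTt x pptt offspring counts used in the linkage example, with rows P and p and columns T and t.

```python
def expected_counts(table):
    """Expected cell counts under independence: (row total x column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: P, p (flower color); columns: T, t (height)
observed = [[32, 22], [23, 36]]
for row in expected_counts(observed):
    print([round(x, 2) for x in row])
# [26.28, 27.72]
# [28.72, 30.28]
```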


Chi Square Test of Independence

Classic Example: Testing for Linkage or Independent Assortment between Two Loci

Suppose we cross two pea plants: PpTt x pptt

- Purple is completely dominant to white
- Tall is completely dominant to short

The cross produces the following offspring:

PT = 32, Pt = 22, pT = 23, pt = 36 (total = 113)

ARE THE GENES ASSORTING INDEPENDENTLY, OR ARE THEY LINKED?

Chi Square Test of Independence

          T      t      Total
P        32     22       54
p        23     36       59
Total    55     58      113

CONTINGENCY TABLE

IF these events (flower color and plant height) are inherited independently, THEN the frequency of any combined outcome should equal the product of their independent probabilities:

If IA (independent assortment), then the expected number of PT = f(P) × f(T) × N = (54/113) × (55/113) × 113 = 26.28
This reduces to: expected number of PT = (54 × 55)/113 = 26.28 = (row total × column total)/grand total


          T     exp       t     exp      Total
P        32     26.28     22    27.72     54
p        23     28.72     36    30.28     59
Total    55               58             113

CONTINGENCY TABLE (observed counts, each followed by its expected count under independence)


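        Obs    Exp      O − E    (O − E)²/E
PT      32     26.28     5.72     1.24
Pt      22     27.72    −5.72     1.18
pT      23     28.72    −5.72     1.14
pt      36     30.28     5.72     1.08
Total                            χ² = 4.64

For a contingency table, df = (R − 1)(C − 1) = (2 − 1)(2 − 1) = 1. At p = 0.05 the critical value is 3.84. Since 4.64 > 3.84, reject H₀: the data are not consistent with independent assortment, which suggests the genes are linked.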
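The whole contingency-table calculation can be sketched in Python as follows. The critical value 3.841 is the 5% table entry for df = 1. The p-value line is an addition not shown in the slides; it relies on the fact that, for df = 1 only, the upper tail of the chi-square distribution equals erfc(√(χ²/2)).

```python
import math

# Offspring counts: rows are P and p (flower color), columns are T and t (height)
observed = [[32, 22], [23, 36]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total x column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_05 = 3.841  # chi-square table, df = 1, alpha = 0.05

print(round(chi2, 2), df, chi2 > critical_05)  # 4.64 1 True

# Upper-tail probability (p-value), valid for df = 1 only
p_value = math.erfc(math.sqrt(chi2 / 2))
print(round(p_value, 3))
```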
