Goodness of Fit Tests: QSCI 381 – Lecture 40 (Larson and Farber, Sect 10.1)

Multinomial Experiments

A multinomial experiment is a probability experiment consisting of a fixed number of trials in which there are more than two possible outcomes for each independent trial. The probability for each outcome is fixed, and each outcome is classified into categories.

Examples of multinomial experiments include:
- You sample 100 animals from a population. The categories could be age, length or maturity state.
- You sample 1000 poppies in a field. The categories could be colour.
- You sample 20 animals and calculate the frequency with which each genetic haplotype occurs.
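A multinomial experiment can be simulated to see how category counts vary from sample to sample. This is a minimal Python sketch using only the standard library; the function name `simulate_multinomial`, the category labels and the probabilities are illustrative assumptions, not part of the lecture.

```python
import random
from collections import Counter

def simulate_multinomial(n, categories, probs, seed=None):
    """Simulate one multinomial experiment: n independent trials, each
    classified into exactly one category with a fixed probability."""
    rng = random.Random(seed)
    outcomes = rng.choices(categories, weights=probs, k=n)
    counts = Counter(outcomes)
    # Return a count for every category, including any that were never drawn.
    return [counts[c] for c in categories]

# Illustrative (hypothetical) run: 150 animals in four sex/maturity classes.
labels = ["mature female", "mature male", "immature female", "immature male"]
print(simulate_multinomial(150, labels, [0.3, 0.3, 0.2, 0.2], seed=1))
```

Repeating the call with different seeds gives different counts that always sum to n; this sampling variation is what a goodness-of-fit test has to allow for.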

Goodness-of-fit Tests

A chi-square goodness-of-fit test is used to test whether an observed frequency distribution fits an expected distribution.

We need to specify a null and an alternative hypothesis. Generally, the null hypothesis is that the observed frequency distribution (the data) fits the expected distribution; the alternative hypothesis is that it does not.

Example-I

We expect that a “healthy” marine mammal population should consist of an equal number of males and females, and that 60% of the population should be mature. We sample 150 animals and count the number in each of four categories:

Mature female   Mature male   Immature female   Immature male
30              40            32                48

Observed and Expected Frequencies

The observed frequency O of a category is the frequency for the category observed in the data.

The expected frequency E of a category is the calculated frequency for the category. Expected frequencies are obtained by assuming the specified (or hypothesized) distribution is correct. The expected frequency for the ith category is:

$$E_i = n\,p_i$$

where n is the number of trials and p_i is the assumed probability for the ith category.
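The formula translates directly into code. This is a minimal Python sketch; the function name `expected_frequencies` and the check that the assumed probabilities sum to 1 are additions for illustration.

```python
import math

def expected_frequencies(n, probs):
    """Expected frequency E_i = n * p_i for each category i."""
    # The assumed probabilities describe a complete distribution, so they should sum to 1.
    if not math.isclose(sum(probs), 1.0, rel_tol=1e-9):
        raise ValueError("assumed probabilities must sum to 1")
    return [n * p for p in probs]

# Marine mammal example: n = 150 animals, four sex/maturity classes.
print(expected_frequencies(150, [0.3, 0.3, 0.2, 0.2]))
```

This gives the expected frequencies of 45, 45, 30 and 30 used in the marine mammal example (up to floating-point rounding).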

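Observed and Expected Frequencies (Example)

                       Mature female    Mature male      Immature female   Immature male
Observed frequency     30               40               32                48
Assumed probability    0.3              0.3              0.2               0.2
Expected frequency     45 (150 x 0.3)   45 (150 x 0.3)   30 (150 x 0.2)    30 (150 x 0.2)

The assumed probabilities follow from the expectations for a healthy population: each mature class has probability 0.6 x 0.5 = 0.3 and each immature class has probability 0.4 x 0.5 = 0.2.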

The Chi-square goodness-of-fit Test-I

If:
1. the observed frequencies are obtained from a random sample, and
2. the expected frequencies are greater than or equal to 5 (pool categories if this is not the case),

then the sampling distribution for the goodness-of-fit test is approximately a chi-square distribution with k-1 degrees of freedom, where k is the number of categories. The test statistic is:

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
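The test statistic is a single sum over categories. This is a minimal Python sketch; the function name `chi_square_statistic` is hypothetical, and only condition 2 can be checked in code (whether the sample is random is a matter of study design).

```python
def chi_square_statistic(observed, expected, min_expected=5.0):
    """Chi-square goodness-of-fit statistic: sum of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Condition 2: every expected frequency should be at least 5.
    small = [e for e in expected if e < min_expected]
    if small:
        raise ValueError(f"expected frequencies below {min_expected}: {small}; pool categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

stat = chi_square_statistic([30, 40, 32, 48], [45, 45, 30, 30])
print(round(stat, 2))  # 16.49
```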

The Chi-square goodness-of-fit Test-II

1. Identify the claim and state the null and alternative hypotheses.
2. Specify the level of significance, α.
3. Determine the degrees of freedom, d.f. = k-1.
4. Find the critical value of the chi-square distribution and hence define the rejection region for the test.
5. Calculate the test statistic.
6. Check whether or not the value of the test statistic is in the rejection region.
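The six steps can be sketched as a small Python function. This is a minimal sketch, assuming steps 1, 2 and 4 (stating the hypotheses, choosing α and looking up the critical value) are done by hand, so the critical value is passed in; the function name `chi_square_gof_test` is hypothetical.

```python
def chi_square_gof_test(observed, probs, critical_value):
    """Steps 3, 5 and 6 of the procedure. critical_value is assumed to have
    been looked up for the chosen alpha and d.f. = k - 1 (step 4)."""
    k = len(observed)
    n = sum(observed)
    df = k - 1                                   # step 3: degrees of freedom
    expected = [n * p for p in probs]            # E_i = n * p_i
    if min(expected) < 5:
        raise ValueError("expected frequency below 5: pool categories first")
    # Step 5: the chi-square test statistic.
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Step 6: the rejection region is chi2 > critical value (right tail).
    reject = stat > critical_value
    return stat, df, reject

# Marine mammal example, alpha = 0.01, critical value 11.34 for d.f. = 3.
stat, df, reject = chi_square_gof_test([30, 40, 32, 48], [0.3, 0.3, 0.2, 0.2], 11.34)
print(f"chi2 = {stat:.2f}, d.f. = {df}, reject H0: {reject}")
# chi2 = 16.49, d.f. = 3, reject H0: True
```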

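Example (test using α = 0.01)

H0: the distribution of animals between sex and maturity classes equals that expected for a healthy population.
Ha: the distribution of animals between sex and maturity classes differs from that expected for a healthy population.

The degrees of freedom = k-1 = 3. The critical value of the chi-square distribution is 11.34 (CHIINV(0.01,3)), so the rejection region is χ² > 11.34.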
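The critical value can also be computed rather than looked up. This is a minimal standard-library Python sketch, assuming an integer number of degrees of freedom: the hypothetical helper `chi2_sf` uses the closed-form recursion for the chi-square right-tail probability, and `chi2_critical_value` inverts it by bisection, playing the role of CHIINV. Where SciPy is available, `scipy.stats.chi2.ppf(1 - alpha, df)` is the usual library call.

```python
import math

def chi2_sf(x, df):
    """Right-tail probability P(X > x) for a chi-square variable with an
    integer number of degrees of freedom df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2              # exact value for d.f. = 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1   # exact value for d.f. = 1
    while k < df:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q

def chi2_critical_value(alpha, df, tol=1e-10):
    """Value c with P(X > c) = alpha, found by bisection (like CHIINV)."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:   # widen the bracket until it contains c
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(chi2_critical_value(0.01, 3), 2))  # 11.34
```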

Example (test using α = 0.01)

                       Mature female   Mature male   Immature female   Immature male
Observed frequency     30              40            32                48
Expected frequency     45              45            30                30
(O_i - E_i)^2/E_i      5               0.56          0.13              10.80

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = 16.49$$

Since 16.49 > 11.34, the test statistic lies in the rejection region: we reject the null hypothesis at the 1% level of significance.

Example-A-1 (α = 0.05)

The probability of a particular bird species utilizing each of five habitats is known. We collect data for a different species (n = 137) and wish to assess whether the two species differ in their habitat requirements.

Habitat type   1     2     3     4     5
Expected p     0.2   0.1   0.05  0.5   0.15
Observed       30    17    0     72    18

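Example-A-2 (α = 0.05)

Habitat type                   1      2      3      4      5
Observed frequency             30     17     0      72     18
Expected frequency (137 x p)   27.4   13.7   6.85   68.5   20.55
(O_i - E_i)^2/E_i              0.25   0.79   6.85   0.18   0.32

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = 8.39$$

With k-1 = 4 degrees of freedom, the critical value is 9.49 (CHIINV(0.05,4)). Since 8.39 < 9.49, we fail to reject the null hypothesis: the data give no evidence, at the 5% level, that the two species differ in their habitat requirements.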
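The calculation for this example can be checked with a short script. This is a minimal Python sketch; the critical value 9.49 is hard-coded from the lecture rather than computed, and the variable names are illustrative.

```python
# Bird habitat example: is the new species' habitat use consistent with
# the known habitat probabilities of the first species?
observed = [30, 17, 0, 72, 18]
probs = [0.2, 0.1, 0.05, 0.5, 0.15]
n = sum(observed)                      # 137 birds
expected = [n * p for p in probs]      # 27.4, 13.7, 6.85, 68.5, 20.55
assert min(expected) >= 5              # condition for the chi-square approximation
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
stat = sum(contributions)
critical = 9.49                        # CHIINV(0.05, 4), as given in the lecture
print([round(c, 2) for c in contributions], round(stat, 2), stat > critical)
```

The total is about 8.39, below the critical value, so the comparison prints `False` (do not reject H0).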

Testing for Normality

We can use the chi-square test in some cases to assess whether a variable is normally distributed. The null and alternative hypotheses are:

H0: The variable has a normal distribution.
Ha: The variable does not have a normal distribution.

Example

Class boundaries   Frequency
5-15               6
15-25              23
25-35              53
35-45              45
45-55              22

Can we assume that these data are normal (assume α = 0.05)?

Calculating the Test Statistic

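Class        Observed   Cumulative normal   Expected p     Expected      (O_i - E_i)^2/E_i
boundaries   freq. O    Lower     Upper     (difference)   frequency E
5-15         6          0.0030    0.0368    0.0338         5.0           0.1822
15-25        23         0.0368    0.2037    0.1669         24.9          0.1407
25-35        53         0.2037    0.5526    0.3488         52.0          0.0202
35-45        45         0.5526    0.8627    0.3102         46.2          0.0318
45-55        22         0.8627    0.9800    0.1172         17.5          1.1746
Total        149                            0.977          145.57        1.5497

The cumulative normal probabilities use the mean and standard deviation estimated from the grouped data, where x_i is the mid-point of each class and f_i its frequency:

$$\bar{x} = \frac{\sum_i f_i x_i}{n} = 33.62 \qquad s = \sqrt{\frac{1}{n-1}\sum_i f_i (x_i - \bar{x})^2} = 10.42$$

Each expected probability p_i is the difference between the cumulative normal probabilities at the upper and lower class boundaries, and E_i = p_i x 149. The test statistic is the total of the last column:

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = 1.5497$$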
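The whole table can be reproduced from the grouped data. This is a minimal Python sketch using `statistics.NormalDist` (Python 3.8+); the variable names are illustrative. It follows the lecture in using class mid-points for the mean and standard deviation and in not rescaling the class probabilities to sum to 1.

```python
import math
from statistics import NormalDist

# Grouped data from the example: class boundaries and observed frequencies.
bounds = [5, 15, 25, 35, 45, 55]
observed = [6, 23, 53, 45, 22]
n = sum(observed)                                          # 149
mids = [(a + b) / 2 for a, b in zip(bounds, bounds[1:])]   # class mid-points x_i

# Mean and standard deviation (n - 1 divisor) from the grouped data.
mean = sum(f * x for f, x in zip(observed, mids)) / n
sd = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(observed, mids)) / (n - 1))

dist = NormalDist(mean, sd)
cdf = [dist.cdf(b) for b in bounds]                        # cumulative normal at each boundary
probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]          # expected p for each class
expected = [p * n for p in probs]                          # E_i = p_i x 149 (not rescaled)
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(f"mean = {mean:.2f}, sd = {sd:.2f}, chi2 = {stat:.3f}")
```

Working with unrounded estimates gives a standard deviation of about 10.41 and a statistic of about 1.55, which agree with the table to within rounding.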