Goodness of Fit Tests
QSCI 381 – Lecture 40 (Larson and Farber, Sect 10.1)
Multinomial Experiments
A multinomial experiment is a probability experiment consisting of a fixed number of independent trials, in which each trial has more than two possible outcomes. The probability of each outcome is fixed, and each outcome is classified into categories.
Examples of multinomial experiments include:
- You sample 100 animals from a population. The categories could be age, length, or maturity state.
- You sample 1000 poppies in a field. The categories could be colour.
- You sample 20 animals and record which genetic haplotype each one carries. The categories are the haplotypes.
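A multinomial experiment can be mimicked in a few lines of Python. This is a sketch only: the function name `simulate_multinomial`, the poppy colours, and their probabilities are hypothetical, chosen to show the three defining features (a fixed number of trials, fixed category probabilities, and exactly one category per trial). It uses the standard library's `random.choices`.

```python
import random
from collections import Counter

def simulate_multinomial(n, categories, probs, seed=None):
    """Run n independent trials; each trial falls into exactly one category,
    chosen with the fixed probabilities in `probs` (hypothetical helper)."""
    rng = random.Random(seed)
    outcomes = rng.choices(categories, weights=probs, k=n)
    tally = Counter(outcomes)
    # Report every category, including any that were never observed
    return {c: tally.get(c, 0) for c in categories}

# Hypothetical poppy survey: the colours and probabilities are invented
counts = simulate_multinomial(1000, ["red", "orange", "white"], [0.6, 0.3, 0.1], seed=1)
print(counts)
```

The counts always add up to the number of trials, which is what makes the category frequencies a multinomial outcome rather than independent counts.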
Goodness-of-fit Tests
A goodness-of-fit test is used to test whether an observed frequency distribution fits an expected distribution.
We need to specify a null and an alternative hypothesis. Generally, the null hypothesis is that the observed frequency distribution (the data) fits the expected distribution; the alternative hypothesis is that it does not.
Example-I
We expect that a “healthy” marine mammal population should consist of equal numbers of males and females, and that 60% of the population should be mature. We sample 150 animals and classify each into one of four categories, with these counts:

Mature female | Mature male | Immature female | Immature male
30            | 40          | 32              | 48
Observed and Expected Frequencies
The observed frequency $O_i$ of a category is the frequency for the category observed in the data.
The expected frequency $E_i$ of a category is the calculated frequency for the category. Expected frequencies are obtained by assuming the specified (or hypothesized) distribution is correct. The expected frequency for the ith category is:
$$E_i = n\,p_i$$
where $n$ is the number of trials and $p_i$ is the assumed probability for the ith category.
Observed and Expected Frequencies (Example)

Category        | Observed frequency | Assumed probability | Expected frequency
Mature female   | 30                 | 0.3                 | 45 (150 × 0.3)
Mature male     | 40                 | 0.3                 | 45 (150 × 0.3)
Immature female | 32                 | 0.2                 | 30 (150 × 0.2)
Immature male   | 48                 | 0.2                 | 30 (150 × 0.2)
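The expected-frequency column can be reproduced with a short Python sketch of $E_i = n p_i$. It assumes, as the slide's probabilities imply, that sex and maturity are independent, so each cell probability is P(maturity class) × P(sex) with P(mature) = 0.6 and P(female) = 0.5. The helper name `expected_frequencies` is hypothetical.

```python
def expected_frequencies(n, probs):
    """Expected frequency for each category: E_i = n * p_i (hypothetical helper)."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    return [n * p for p in probs]

# Assumption: sex and maturity are independent, so each cell probability is
# P(maturity class) * P(sex) with P(mature) = 0.6 and P(female) = 0.5.
p_mature, p_female = 0.6, 0.5
probs = [
    p_mature * p_female,              # mature female   (0.3)
    p_mature * (1 - p_female),        # mature male     (0.3)
    (1 - p_mature) * p_female,        # immature female (0.2)
    (1 - p_mature) * (1 - p_female),  # immature male   (0.2)
]
expected = expected_frequencies(150, probs)
print([round(e, 1) for e in expected])  # [45.0, 45.0, 30.0, 30.0]
```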
The Chi-square goodness-of-fit Test-I
If:
1. the observed frequencies are obtained from a random sample, and
2. every expected frequency is greater than or equal to 5 (pool categories if this is not the case),
then the sampling distribution for the goodness-of-fit test is approximately a chi-square distribution with k-1 degrees of freedom, where k is the number of categories. The test statistic is:
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
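A minimal Python sketch of the test statistic, assuming the observed and expected frequencies are listed in the same category order. The name `chi_square_statistic` and its `min_expected` parameter are illustrative. The sketch checks condition 2 but cannot check condition 1 (random sampling), which depends on how the data were collected.

```python
def chi_square_statistic(observed, expected, min_expected=5.0):
    """Chi-square goodness-of-fit statistic: sum over categories of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    # Condition 2: each expected frequency should be at least 5
    too_small = [e for e in expected if e < min_expected]
    if too_small:
        raise ValueError(f"expected frequencies below {min_expected}: {too_small}; pool categories first")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Marine mammal example: observed counts and expected frequencies
print(round(chi_square_statistic([30, 40, 32, 48], [45, 45, 30, 30]), 2))  # 16.49
```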
The Chi-square goodness-of-fit Test-II
1. Identify the claim and state the null and alternative hypotheses.
2. Specify the level of significance, α.
3. Determine the degrees of freedom, d.f. = k-1.
4. Find the critical value of the chi-square distribution and hence define the rejection region for the test.
5. Calculate the test statistic.
6. Check whether the value of the test statistic is in the rejection region; if it is, reject the null hypothesis.
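Steps 3 to 6 can be sketched in Python without third-party libraries. This is an illustrative implementation under stated assumptions, not a reference one: the chi-square upper-tail probability is computed from the regularized incomplete gamma function (a power series for small arguments, a continued fraction for large ones), and the critical value is found by bisection in place of a printed table or Excel's CHIINV. All function names are hypothetical, and the expected-frequency check is left out for brevity.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees
    of freedom, i.e. the regularized upper incomplete gamma Q(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = a * math.log(z) - z - math.lgamma(a)
    if z < a + 1:
        # Power series for the lower regularized gamma P(a, z); Q = 1 - P
        term = total = 1.0 / a
        k = a
        for _ in range(1000):
            k += 1
            term *= z / k
            total += term
            if term < total * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz method) for Q(a, z)
    tiny = 1e-300
    b = z + 1 - a
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return h * math.exp(log_prefactor)

def chi2_critical(alpha, df):
    """Critical value c with P(X > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains c
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def goodness_of_fit_test(observed, probs, alpha):
    """Steps 3-6; steps 1-2 (hypotheses and alpha) are supplied by the caller."""
    n = sum(observed)
    expected = [n * p for p in probs]
    df = len(observed) - 1                                            # step 3
    critical = chi2_critical(alpha, df)                               # step 4
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))  # step 5
    return stat, critical, stat > critical                            # step 6

stat, critical, reject = goodness_of_fit_test([30, 40, 32, 48], [0.3, 0.3, 0.2, 0.2], 0.01)
print(f"chi-square = {stat:.2f}, critical value = {critical:.2f}, reject H0: {reject}")
```

If SciPy is available, `scipy.stats.chi2.ppf(1 - alpha, df)` and `scipy.stats.chisquare` cover the same ground with less code.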
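Example (Test using α = 0.01)
H0: the distribution of animals among sex and maturity classes equals that expected for a healthy population.
Ha: the distribution differs from that expected for a healthy population.
The degrees of freedom = k-1 = 3. The critical value of the chi-square distribution is 11.34 (Excel: CHIINV(0.01,3)), so the rejection region is χ² > 11.34.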
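For 3 degrees of freedom the chi-square upper-tail probability has a closed form, $P(\chi^2_3 > x) = \operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\,e^{-x/2}$, so the 11.34 above can be checked with Python's standard library alone. A sketch; the function names are hypothetical and the bisection bracket [0, 100] is an assumption that comfortably contains the answer.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 d.f. (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def critical_value_df3(alpha):
    """Solve chi2_sf_df3(c) = alpha by bisection (what CHIINV(alpha, 3) returns)."""
    lo, hi = 0.0, 100.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf_df3(mid) > alpha:
            lo = mid  # tail still too heavy: the critical value is further right
        else:
            hi = mid
    return (lo + hi) / 2

print(round(critical_value_df3(0.01), 2))  # 11.34
```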
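Example (Test using α = 0.01)

Category        | Observed frequency | Expected frequency | (O − E)²/E
Mature female   | 30                 | 45                 | 5.00
Mature male     | 40                 | 45                 | 0.56
Immature female | 32                 | 30                 | 0.13
Immature male   | 48                 | 30                 | 10.80

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = 16.49$$
Because 16.49 > 11.34, the test statistic is in the rejection region: we reject the null hypothesis at the 1% level of significance.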
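A short Python sketch that recomputes the table and prints each category's contribution to the statistic, which makes it easy to see which categories drive the result. The critical value 11.34 is taken from the slide rather than computed.

```python
# Marine mammal example: per-category contributions to the chi-square statistic
observed = {"mature female": 30, "mature male": 40, "immature female": 32, "immature male": 48}
expected = {"mature female": 45, "mature male": 45, "immature female": 30, "immature male": 30}
CRITICAL = 11.34  # alpha = 0.01, d.f. = 3 (value taken from the slide)

contributions = {k: (observed[k] - expected[k]) ** 2 / expected[k] for k in observed}
stat = sum(contributions.values())
for category, value in contributions.items():
    print(f"{category:<16} {value:6.2f}")
print(f"chi-square = {stat:.2f}; reject H0: {stat > CRITICAL}")
```

Most of the 16.49 comes from two categories: there are fewer mature females (30 vs 45) and more immature males (48 vs 30) than expected.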
Example-A-1 (α = 0.05)
The probability of a particular bird species using each of five habitats is known. We collect data for a different species (n = 137) and wish to assess whether the two species differ in their habitat requirements.

Habitat type | 1   | 2   | 3    | 4   | 5
Expected p   | 0.2 | 0.1 | 0.05 | 0.5 | 0.15
Observed     | 30  | 17  | 0    | 72  | 18
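Before testing, the expected frequencies can be computed and checked against the "at least 5" condition, as in this Python sketch (variable names are illustrative). Habitat 3 has an observed count of 0, but that does not violate the condition, which concerns expected frequencies (137 × 0.05 = 6.85 here).

```python
# Habitat example: expected frequencies and the expected >= 5 check
probs = [0.2, 0.1, 0.05, 0.5, 0.15]   # known habitat-use probabilities, habitats 1-5
observed = [30, 17, 0, 72, 18]        # counts for the second species
n = sum(observed)
expected = [n * p for p in probs]
conditions_met = all(e >= 5 for e in expected)
print(n, [round(e, 2) for e in expected], conditions_met)
```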
Example-A-2 (α = 0.05)

Habitat type       | 1    | 2    | 3    | 4    | 5
Observed frequency | 30   | 17   | 0    | 72   | 18
Expected frequency | 27.4 | 13.7 | 6.85 | 68.5 | 20.55
(O − E)²/E         | 0.25 | 0.79 | 6.85 | 0.18 | 0.32

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = 8.39$$
With d.f. = k-1 = 4, the critical value is 9.49. Because 8.39 < 9.49, we fail to reject the null hypothesis.
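The same decision can be reached through a p-value. For an even number of degrees of freedom the chi-square tail has a closed form; with d.f. = 4 it is $P(\chi^2_4 > x) = e^{-x/2}(1 + x/2)$. The Python sketch below relies on that form (the helper name is hypothetical). A p-value above α = 0.05 is equivalent to the statistic falling below the 9.49 critical value.

```python
import math

observed = [30, 17, 0, 72, 18]
expected = [137 * p for p in (0.2, 0.1, 0.05, 0.5, 0.15)]
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf_df4(x):
    """Upper-tail probability for a chi-square variable with 4 d.f.
    (closed form for even d.f.): exp(-x/2) * (1 + x/2)."""
    return math.exp(-x / 2) * (1 + x / 2)

p_value = chi2_sf_df4(stat)
print(f"chi-square = {stat:.2f}, p-value = {p_value:.3f}, reject H0 at 0.05: {p_value < 0.05}")
```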
Testing for Normality
We can use the chi-square test in some cases to assess whether a variable is normally distributed. The null and alternative hypotheses are:
H0: the variable has a normal distribution.
Ha: the variable does not have a normal distribution.
Example

Class boundaries | Frequency
5-15             | 6
15-25            | 23
25-35            | 53
35-45            | 45
45-55            | 22

Can we assume that these data are normal (assume α = 0.05)?
Calculating the Test Statistic

Class boundaries | Observed frequency O | Cumulative normal (lower) | Cumulative normal (upper) | Expected p (upper − lower) | Expected frequency E | (O − E)²/E
5-15             | 6                    | 0.0030                    | 0.0368                    | 0.0338                     | 5.0                  | 0.1822
15-25            | 23                   | 0.0368                    | 0.2037                    | 0.1669                     | 24.9                 | 0.1407
25-35            | 53                   | 0.2037                    | 0.5526                    | 0.3488                     | 52.0                 | 0.0202
35-45            | 45                   | 0.5526                    | 0.8627                    | 0.3102                     | 46.2                 | 0.0318
45-55            | 22                   | 0.8627                    | 0.9800                    | 0.1172                     | 17.5                 | 1.1746
Total            | 149                  |                           |                           | 0.977                      | 145.57               | 1.5497
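$$\bar{x} = \frac{\sum_i f_i x_i}{n} = 33.62, \qquad s = \sqrt{\frac{\sum_i f_i (x_i - \bar{x})^2}{n - 1}} = 10.41$$
where $f_i$ is the frequency and $x_i$ is the mid-point of the ith class.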
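The grouped-data formulas can be sketched in Python as follows, using each class mid-point as $x_i$ and the $n-1$ divisor for $s$; the variable names are illustrative.

```python
import math

# Grouped data from the slide: class boundaries and observed frequencies
classes = [(5, 15), (15, 25), (25, 35), (35, 45), (45, 55)]
freqs = [6, 23, 53, 45, 22]

midpoints = [(lo + hi) / 2 for lo, hi in classes]   # x_i
n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
# Sample standard deviation of the grouped data, n - 1 divisor
sd = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / (n - 1))
print(f"n = {n}, mean = {mean:.2f}, sd = {sd:.2f}")
```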
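In the table, the last column is each class's contribution $(O_i - E_i)^2/E_i$ to $\chi^2$, and the expected frequencies are $E_i = p_i \times 149$.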
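The whole table can be reproduced with a Python sketch built on the standard library's `statistics.NormalDist` (Python 3.8+). It assumes, like the table, that the normal distribution is fitted with the grouped-data mean and standard deviation and that the tails below 5 and above 55 are left out, which is why the $p_i$ sum to about 0.977 rather than 1. The sketch computes only the test statistic.

```python
import math
from statistics import NormalDist

classes = [(5, 15), (15, 25), (25, 35), (35, 45), (45, 55)]
observed = [6, 23, 53, 45, 22]
n = sum(observed)

# Fit the normal distribution with the grouped-data mean and sample SD
mids = [(lo + hi) / 2 for lo, hi in classes]
mean = sum(f * x for f, x in zip(observed, mids)) / n
sd = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(observed, mids)) / (n - 1))
dist = NormalDist(mu=mean, sigma=sd)

# p_i = P(lower < X < upper); E_i = p_i * n, as in the table
probs = [dist.cdf(hi) - dist.cdf(lo) for lo, hi in classes]
expected = [p * n for p in probs]
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

for (lo, hi), p, e in zip(classes, probs, expected):
    print(f"{lo}-{hi}: p = {p:.4f}, E = {e:.1f}")
print(f"sum p = {sum(probs):.3f}, chi-square = {stat:.2f}")
```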