The Chi Square Test.ppt
Uploaded by lavlesh-upadhyay
-
7/28/2019 The Chi Square Test.ppt
1/57
The Chi Square Test (χ²)
By SDK, AIM
-
Chi sq: the test of goodness of fit
-
The Chi Square Test
The Chi Square Test (χ²) is used to determine how well theoretical distributions (Normal, Binomial, Poisson, etc.) fit empirical distributions (those obtained from samples).
Karl Pearson developed this test in 1900 to check the goodness of fit of distributions.
-
Consider a Particular Sample
A set of possible events E1, E2, …, Ek is observed to occur with frequencies o1, o2, …, ok (called observed frequencies). According to the rules of probability, these events are expected to occur with frequencies e1, e2, …, ek (called expected frequencies).
Event                E1   E2   …   Ek
Observed frequency   o1   o2   …   ok
Expected frequency   e1   e2   …   ek
-
Example
If we toss a fair coin 100 times, we may expect 50 heads and 50 tails. However, these exact results may not be obtained in practice.
-
The χ² Variable
The χ² variable gives a measure of the disparity between the observed and the expected frequencies:
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$
where N, the total frequency, satisfies
$$N = \sum_{i=1}^{k} o_i = \sum_{i=1}^{k} e_i$$
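A minimal Python sketch of the formula above, assuming the observed and expected frequencies are given as two equal-length lists. The function name `chi_square` is illustrative, not from the slides. It checks that both lists share the same total N before summing, and uses the frequencies of the marks example in this section as data.

```python
def chi_square(observed, expected):
    """Return the chi-square statistic: sum of (o_i - e_i)^2 / e_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # The formula assumes both sets of frequencies add up to the same N.
    if abs(sum(observed) - sum(expected)) > 1e-9:
        raise ValueError("observed and expected totals must both equal N")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Frequencies from the marks example (5 classes, N = 60)
observed = [2, 5, 18, 23, 12]
expected = [5, 10, 30, 10, 5]
print(round(chi_square(observed, expected), 1))  # 35.8
```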
-
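Thus, since $\sum o_i = \sum e_i = N$,
$$\chi^2 = \sum_{i=1}^{k} \frac{o_i^2 - 2o_ie_i + e_i^2}{e_i} = \sum_{i=1}^{k} \frac{o_i^2}{e_i} - 2N + N = \sum_{i=1}^{k} \frac{o_i^2}{e_i} - N$$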
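As a check on the algebra, a small sketch (the helper names are hypothetical) that evaluates both forms on the marks example's frequencies. They agree because the shortcut relies only on the condition Σo_i = Σe_i = N.

```python
def chi_square_definition(observed, expected):
    # Direct form: sum of (o_i - e_i)^2 / e_i
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_shortcut(observed, expected):
    # Expanded form: sum of o_i^2 / e_i, minus N (valid when sum(o) == sum(e) == N)
    n_total = sum(observed)
    return sum(o * o / e for o, e in zip(observed, expected)) - n_total


observed = [2, 5, 18, 23, 12]
expected = [5, 10, 30, 10, 5]
print(round(chi_square_definition(observed, expected), 6))  # 35.8
print(round(chi_square_shortcut(observed, expected), 6))    # 35.8
```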
-
Example
I assume that the marks of a class are distributed normally. However, when the examination takes place, I realize that the class has performed better than expected.
Marks     Observed frequency   Expected frequency
0-20      2                    5
21-40     5                    10
41-60     18                   30
61-80     23                   10
81-100    12                   5
Total     60                   60
-
Marks     Observed frequency   Expected frequency   (oi - ei)²/ei
0-20      2                    5                    1.8
21-40     5                    10                   2.5
41-60     18                   30                   4.8
61-80     23                   10                   16.9
81-100    12                   5                    9.8
Total     60                   60                   35.8 = χ²
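The last column can be reproduced class by class. This illustrative sketch prints each class's contribution to χ², which makes it easy to see that the 61-80 and 81-100 classes account for most of the total.

```python
# Marks example: (class interval, observed frequency, expected frequency)
rows = [
    ("0-20", 2, 5),
    ("21-40", 5, 10),
    ("41-60", 18, 30),
    ("61-80", 23, 10),
    ("81-100", 12, 5),
]

contributions = []
for marks, o, e in rows:
    term = (o - e) ** 2 / e  # this class's share of the total chi-square
    contributions.append(term)
    print(f"{marks:<7} {o:>3} {e:>3} {term:5.1f}")

chi_sq = sum(contributions)
print(f"{'Total':<7} {sum(o for _, o, _ in rows):>3} {sum(e for _, _, e in rows):>3} {chi_sq:5.1f}")
```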
-
Also, χ² ≥ 0, since it is a sum of squared differences divided by positive expected frequencies; the larger the value of χ², the greater the difference between the two distributions.
-
The probability function of χ²
-
Calculation of ν (degrees of freedom)
If the population parameters are not known and have to be estimated from sample statistics:
$$\nu = k - 1 - m$$
where m = number of population parameters estimated.
If the population parameters are known, m = 0, so:
$$\nu = k - 1$$
-
The χ² Curve
[Figure: the χ² curve, with the table value of χ² on the X axis separating the acceptance area from the rejection area]
-
Example at α = 0.05 and df = 5 - 1 = 4
Table value of χ² = 9.488
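The table value can also be computed instead of read from a printed table. A minimal sketch, assuming integer degrees of freedom: the hypothetical helper `chi2_sf` evaluates the upper-tail probability P(X > x) with the standard closed-form series for even and odd df, and `chi2_critical` (also hypothetical) inverts it by bisection. With SciPy installed, `scipy.stats.chi2.ppf(1 - alpha, df)` gives the same value.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum of sqrt(x)^(2j-1) / (1*3*...*(2j-1))
    root = math.sqrt(x)
    term, series = root, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * j + 1)
    return math.erfc(root / math.sqrt(2.0)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * series


def chi2_critical(alpha, df):
    """Table value: the x with P(X > x) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# k = 5 classes, no parameters estimated, so df = 5 - 1 = 4
print(round(chi2_critical(0.05, 4), 3))  # 9.488
```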
-
Steps of the Chi Square Test
1. Define H0 and H1.
2. List the observed frequencies.
3. Calculate the expected frequencies, assuming the data follow the theoretical distribution.
4. Compute χ².
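The steps above, followed by the usual comparison with the table value, can be sketched as one function. This is an illustration under stated assumptions: the name `chi_square_test` is hypothetical, the expected frequencies under H0 are supplied by the caller, and `table_value` is the critical χ² for the chosen α and df (9.488 is the standard 5% point for df = 4, used here with the marks example).

```python
def chi_square_test(observed, expected, table_value):
    """Return (computed chi-square, decision) for expected frequencies derived from H0."""
    # Steps 2-3: one expected frequency is needed for every observed class
    if len(observed) != len(expected):
        raise ValueError("one expected frequency is needed per class")
    # Step 4: compute chi-square
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Decision: a computed value beyond the table value falls in the rejection area
    decision = "reject H0" if chi_sq > table_value else "accept H0"
    return chi_sq, decision


# Marks example: H0 = marks follow the assumed distribution; alpha = 0.05, df = 4
chi_sq, decision = chi_square_test([2, 5, 18, 23, 12], [5, 10, 30, 10, 5], table_value=9.488)
print(round(chi_sq, 1), decision)  # 35.8 reject H0
```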
-
Accept H0 if the computed χ² is less than the table value of χ²; otherwise, reject H0.
H0: The data follow the Binomial distribution
-
Chi sq as a test of independence
-
Note:
Parametric tests of significance (such as the z- and t-tests) are based on the assumption that the population is normally distributed. However, it is not always possible to assume the underlying distribution pattern for the sampling done.
-
If we classify a population into several categories with respect to two attributes (e.g., age and job preference), we can use the chi-square test to determine whether the two attributes are independent of each other.
-
Example
In 4 regions, the National Health Company samples its employees' attitudes towards job performance reviews. Respondents are given a choice between the present method of 2 reviews a year and the proposed method of quarterly reviews.
-
Also,
1. pN is the proportion of employees from the north who prefer the present plan
2. pE is the proportion of employees from the east who prefer the present plan
3. pS is the proportion of employees from the south who prefer the present plan
4. pW is the proportion of employees from the west who prefer the present plan
H0: pN = pE = pS = pW
-
Contingency table (observed values)
                                   North   South   East   West   Total
Number who prefer present method   68      75      57     79     279
Number who prefer new method       32      45      33     31     141
Total                              100     120     90     110    420
-
Combined proportion of employees preferring the present method = 279/420 = 0.6643
Thus, the combined proportion of employees preferring the new method = 1 - 0.6643 = 0.3357
Thus,
1. 0.6643 = estimate of the population proportion who prefer the present method
2. 0.3357 = estimate of the population proportion who prefer the new method
Multiply each estimate by the total number of employees sampled in each region to get the expected number (e.g., North: 0.6643 × 100 ≈ 66).
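The pooled-proportion calculation above can be sketched in a few lines of Python (variable names are illustrative). It keeps the proportions unrounded and rounds only the final expected counts, which reproduces the whole-number expected values used in the slides.

```python
# Employee survey: observed counts per region (North, South, East, West)
regions = ["North", "South", "East", "West"]
present = [68, 75, 57, 79]
new = [32, 45, 33, 31]

region_totals = [p + q for p, q in zip(present, new)]  # 100, 120, 90, 110
grand_total = sum(region_totals)                       # 420

p_present = sum(present) / grand_total  # 279 / 420, about 0.6643
p_new = 1 - p_present                   # about 0.3357

# Expected count = pooled proportion x number of employees sampled in the region
expected_present = [p_present * t for t in region_totals]
expected_new = [p_new * t for t in region_totals]

for name, ep, en in zip(regions, expected_present, expected_new):
    print(f"{name:<6} {ep:6.2f} {en:6.2f}")
```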
-
Contingency table (observed values)
                                   North   South   East   West   Total
Number who prefer present method   68      75      57     79     279
Number who prefer new method       32      45      33     31     141
Total                              100     120     90     110    420
Contingency table (expected values)
                                   North   South   East   West   Total
Number who prefer present method   66      80      60     73     279
Number who prefer new method       34      40      30     37     141
Total                              100     120     90     110    420
-
Thus H0 is accepted.
Degrees of freedom = (4 - 1)(2 - 1) = 3
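The decision rests on comparing the computed χ² with the table value for 3 degrees of freedom. A sketch that computes χ² from the observed counts, using unrounded expected frequencies (pooled proportion × region total, i.e. row total × column total / 420): it gives about 2.76, below the 5% table value of 7.815 for df = 3, which is consistent with accepting H0.

```python
observed = [
    [68, 75, 57, 79],  # prefer present method (North, South, East, West)
    [32, 45, 33, 31],  # prefer new method
]
row_totals = [sum(row) for row in observed]        # 279, 141
col_totals = [sum(col) for col in zip(*observed)]  # 100, 120, 90, 110
n = sum(row_totals)                                # 420

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # (row_total / n) is the pooled proportion; times the region total gives e
        e = row_totals[i] * col_totals[j] / n
        chi_sq += (o - e) ** 2 / e

df = (len(col_totals) - 1) * (len(row_totals) - 1)  # (4 - 1)(2 - 1) = 3
print(round(chi_sq, 2), df)  # 2.76 3
```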
-
Consider a contingency table:
Criteria   C1    C2    C3    Total
R1         O11   O12   O13   R1
R2         O21   O22   O23   R2
Total      C1    C2    C3    n
-
H0: Ri is independent of Cj,
or $P(R_i \cap C_j) = P(R_i)\,P(C_j)$
But $P(R_i \cap C_j) = E_{ij}/n$
Also $P(R_i) = R_i/n$ and $P(C_j) = C_j/n$
Thus
$$\frac{E_{ij}}{n} = \frac{R_i}{n}\cdot\frac{C_j}{n} \quad\Longrightarrow\quad E_{ij} = \frac{R_i C_j}{n}$$
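The result E_ij = R_i C_j / n translates directly into code. A minimal sketch (the function name is hypothetical) that builds the whole expected table from the margins; `Fraction` keeps the arithmetic exact so the values can be checked against the slides. Applied to the insurance example in this section, it reproduces the expected values 45 and 55.

```python
from fractions import Fraction


def expected_frequencies(observed):
    """E_ij = R_i * C_j / n for every cell, assuming H0 (independence)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[Fraction(r * c, n) for c in col_totals] for r in row_totals]


# Insurance example: rows = claim / no claim, columns = three age groups
claims = [[40, 35, 60], [60, 65, 40]]
for row in expected_frequencies(claims):
    print([str(e) for e in row])
```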
-
Example
In order to study the profits and losses of firms by industry, a random sample of 100 firms is selected, and for each firm in the sample, we record whether the company made money or lost money, and whether the firm is a service company. The data are summarized in a 2 × 2 contingency table. Is the probability of making a profit independent of the type of industry?
-
An insurance company's data regarding claims, gathered by studying three different age groups with a sample size of 100 each, are given below.
Age group   25 and under   Over 25 and under 50   50 and over
Claim       40             35                     60
No claim    60             65                     40
H0: Claim is not related to age
-
Contingency table (observed values)
Age group   25 and under   Over 25 and under 50   50 and over   Total
Claim       40             35                     60            135
No claim    60             65                     40            165
Total       100            100                    100           300
Contingency table (expected values)
Age group   25 and under   Over 25 and under 50   50 and over   Total
Claim       45             45                     45            135
No claim    55             55                     55            165
Total       100            100                    100           300
-
fo    fe    (fo - fe)²/fe
40    45    0.56
35    45    2.22
60    45    5.00
60    55    0.45
65    55    1.82
40    55    4.09
Chi sq comp = 14.14
df = 2
Level of significance = 0.05
Chi sq tabulated = 5.99
Thus reject H0.
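A short sketch reproducing this table and the decision, taking the (fo, fe) pairs above as input. The table value 5.99 is the one quoted on the slide for α = 0.05 and df = 2.

```python
# (observed, expected) pairs from the insurance example, cell by cell
cells = [(40, 45), (35, 45), (60, 45), (60, 55), (65, 55), (40, 55)]

terms = [(fo - fe) ** 2 / fe for fo, fe in cells]
chi_sq = sum(terms)

rows, cols = 2, 3              # claim / no claim x three age groups
df = (rows - 1) * (cols - 1)   # = 2
table_value = 5.99             # alpha = 0.05, df = 2

print([round(t, 2) for t in terms])  # [0.56, 2.22, 5.0, 0.45, 1.82, 4.09]
print(round(chi_sq, 2), df, "reject H0" if chi_sq > table_value else "accept H0")  # 14.14 2 reject H0
```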
-
THE MEDIAN TEST
-
Example
An economist wants to test the null hypothesis that median family incomes in three rural areas are approximately equal. For simplicity, an equal sample size of 10 in each population was chosen. The family incomes are shown below.
Region A   Region B   Region C
22         31         28
29         37         42
36         26         21
40         25         47
35         20         18
50         43         23
38         27         51
25         41         16
62         57         30
16         32         48
Family incomes ($1000 per year)
-
Region A   Region B   Region C
22         31         28
29         37         42
36         26         21
40         25         47
35         20         18
50         43         23
38         27         51
25         41         16
62         57         30
16         32         48
Median of all 30 incomes = 31.5
Family incomes ($1000 per year)
                                        Region A   Region B   Region C
Number of incomes less than median      4          5          6
Number of incomes greater than median   6          5          4
Family incomes ($1000 per year)
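The median and the counts above can be checked with Python's standard `statistics` module. A minimal sketch, assuming the median test pools all 30 incomes to find one grand median; incomes exactly equal to the median would be counted in neither row (none occur here).

```python
import statistics

# Family incomes ($1000 per year), 10 families sampled per region
incomes = {
    "A": [22, 29, 36, 40, 35, 50, 38, 25, 62, 16],
    "B": [31, 37, 26, 25, 20, 43, 27, 41, 57, 32],
    "C": [28, 42, 21, 47, 18, 23, 51, 16, 30, 48],
}

# Grand median of all 30 incomes pooled together
grand_median = statistics.median([x for values in incomes.values() for x in values])
print(grand_median)  # 31.5

counts = {}
for region, values in incomes.items():
    below = sum(x < grand_median for x in values)
    above = sum(x > grand_median for x in values)
    counts[region] = (below, above)
    print(region, below, above)
```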
-
Contingency table (observed values)
                                        Region A   Region B   Region C   Total
Number of incomes less than median      4          5          6          15
Number of incomes greater than median   6          5          4          15
Total                                   10         10         10         30
Family incomes ($1000 per year)
Contingency table (expected values)
                                        Region A   Region B   Region C   Total
Number of incomes less than median      5          5          5          15
Number of incomes greater than median   5          5          5          15
Total                                   10         10         10         30
Expected value = (10 × 15)/30 = 5
-
fo   fe   (fo - fe)²/fe
4    5    0.2
5    5    0
6    5    0.2
6    5    0.2
5    5    0
4    5    0.2
Chi sq comp = 0.8
df = 2
Level of significance = 0.05
Chi sq tabulated = 5.99
Thus accept H0.
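The same contingency-table arithmetic applies to the median test. This sketch computes each expected value as row total × column total / n (5 in every cell here), sums the contributions, and compares the result with the slide's table value of 5.99 for df = 2.

```python
# Median test counts: rows = below / above the grand median, columns = regions A, B, C
observed = [
    [4, 5, 6],
    [6, 5, 4],
]
row_totals = [sum(row) for row in observed]        # 15, 15
col_totals = [sum(col) for col in zip(*observed)]  # 10, 10, 10
n = sum(row_totals)                                # 30

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # (15 x 10) / 30 = 5 for every cell
        chi_sq += (o - e) ** 2 / e

df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (2 - 1)(3 - 1) = 2
table_value = 5.99  # chi-square table value at alpha = 0.05, df = 2
print(round(chi_sq, 2), df, "reject H0" if chi_sq > table_value else "accept H0")  # 0.8 2 accept H0
```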