Statistics for Management Unit 10
Sikkim Manipal Page No. 1
Unit 10 Chi–Square Test
Structure:
10.1 Introduction
    Objectives
    Relevance
10.2 Chi-Square test
    Characteristics of Chi-Square test
    Steps in solving problems related to Chi-Square test
    Conditions for applying the Chi-Square test
    Restrictions in applying Chi-Square test
    Practical applications of Chi-Square test
    Uses of Chi-Square test
    Degrees of freedom
    Levels of significance
    Interpretation of Chi-Square values
10.3 Applications of Chi-Square Test
    Tests for independence of attributes
    Test of goodness of fit
    Test for comparing variance
10.4 Summary
10.5 Glossary
10.6 Terminal Questions
10.7 Answers
10.8 Case Study
10.1 Introduction
In the previous unit, on testing of hypothesis, we discussed how to test hypotheses concerning parameters such as the mean and the proportion, using data from either one or two samples. We used one-sample tests to determine whether a mean or a proportion was significantly different from a hypothesised value. In the two-sample tests, we examined the difference between either two means or two proportions, and we tried to learn whether this difference was significant.
Suppose, however, that we have proportions from five populations instead of only two. In such cases, the methods for comparing proportions described for two-sample hypothesis tests do not apply; we must use the Chi-Square test ($\chi^2$ test). In this unit we discuss the Chi-Square tests, which enable us to test whether more than two population proportions can be considered equal. The Chi-Square test is a non-parametric test that can be applied to categorical or qualitative data. It can be applied when we make few or no assumptions about the population parameters.
Chi-Square tests allow us to do a lot more than just test for the equality of several proportions. If we classify a population into several categories with respect to two attributes (such as age and job performance), we can use a Chi-Square test to determine whether the two attributes are independent of each other. So, Chi-Square tests can be applied to a contingency table.
Objectives:
After studying this unit, you should be able to:
• describe the non-parametric method of testing hypothesis
• describe the Chi-Square characteristics
• identify the conditions required for applying the Chi-Square test for a given population distribution
• recognise the applications of the Chi-Square test
• describe the steps in solving problems related to the Chi-Square test
10.1.1 Relevance
Case-let
Women still earn less than men
On 27 February 2006 the Women and Work Commission (WWC) published its report on the causes of the “gender pay gap”, or the difference between men’s and women’s hourly pay. According to the report, British women working full-time currently earn 17% less per hour than men. In February the European Commission also brought out its own report on the pay gap across the European Union. Its findings were similar in that, on an hourly basis, women earn 15% less than men for the same work.
In the United States, the difference in median pay between men and women is around 20%. According to the WWC report, the gender pay gap opens early. Boys and girls study different subjects in school, and boys’ subjects lead to more lucrative careers. They then work in different sorts of jobs. As a result, average hourly pay for a woman at the start of her working life is only 91% of a man’s, even though nowadays she is probably better qualified. How do we compile this type of statistical information? We can use Chi-Square testing for more than one type of population.
(Source: Derek L Waller Published by Elsevier Inc Ed 2008).
10.2 Chi-Square test
The Chi-Square test is one of the most commonly used non-parametric tests in statistical work. The Greek letter $\chi^2$ is used to denote this test. $\chi^2$ describes the magnitude of the discrepancy between the observed and the expected frequencies. The value of $\chi^2$ is calculated as:
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \frac{(O_3 - E_3)^2}{E_3} + \cdots + \frac{(O_n - E_n)^2}{E_n}$$
where $O_1, O_2, O_3, \ldots, O_n$ are the observed frequencies and $E_1, E_2, E_3, \ldots, E_n$ are the corresponding expected or theoretical frequencies.
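The formula can be sketched in a few lines of Python. This is only an illustration: the function name `chi_square_statistic` and the frequencies below are hypothetical and do not come from this unit.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical frequencies, chosen only to illustrate the arithmetic.
observed = [18, 22, 20]
expected = [20, 20, 20]

chi_sq = chi_square_statistic(observed, expected)
print(round(chi_sq, 2))  # (4 + 4 + 0) / 20
```

Each category contributes its squared deviation divided by its own expected frequency, so a category with a small expected frequency can dominate the total.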
10.2.1 Characteristics of Chi-Square test
The following are the characteristics of a Chi-Square test ($\chi^2$ test):
• The $\chi^2$ test is based on frequencies and not on parameters.
• It is a non-parametric test, in which no rigid assumptions about the parameters of the population are required.
• The additive property also holds for the $\chi^2$ test.
• The $\chi^2$ test is useful for testing hypotheses about the independence of attributes.
• The $\chi^2$ test can be used in complex contingency tables.
• The $\chi^2$ test is very widely used for research purposes in the behavioural and social sciences, including business research.
• When testing whether the observed frequencies of certain outcomes fit the expected frequencies defined by a theoretical distribution, the $\chi^2$ value defined here follows the $\chi^2$ distribution:
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ is the observed frequency and $E_i$ is the expected frequency.
Key Statistic
The observed frequencies are the frequencies obtained from observation, that is, the sample frequencies. The expected frequencies are the calculated frequencies.
10.2.2 Steps in solving problems related to Chi-Square test
Figure 10.1 depicts the steps required for solving problems related to the Chi-Square test.
Fig. 10.1: Procedural Steps in Solving Problems on Chi-Square Test
10.2.3 Conditions for applying the Chi-Square test
The following are the conditions for using the Chi-Square test:
1. The frequencies used in the Chi-Square test must be absolute and not in relative terms.
2. The total number of observations collected for this test must be large.
3. The observations which make up the sample for this test must be independent of each other.
4. As the $\chi^2$ test is based wholly on sample data, no assumption is made concerning the population distribution. In other words, it is a non-parametric test.
5. The $\chi^2$ test is wholly dependent on the degrees of freedom. As the degrees of freedom increase, the Chi-Square distribution curve becomes more symmetrical.
6. The expected frequency of any item or cell must not be less than 5. If it is, the frequencies of adjacent items or cells should be pooled together so that the pooled frequency is 5 or more.
7. The data should be expressed in original units for convenience of comparison, and the given distribution should not be replaced by relative frequencies or proportions.
8. This test is used only for drawing inferences through tests of hypothesis, so it cannot be used for estimating parameter values.
10.2.4 Restrictions in applying Chi-Square test
The sample observations should be independently and normally distributed. For this, either the parent population should be very large (for example, greater than 50), or sampling should be done with replacement.
Constraints imposed upon the observations must be of a linear character, for example:
$$\sum O_i = \sum E_i$$
The $\chi^2$ distribution is essentially a continuous distribution; however, its character of continuity is maintained only when the individual frequencies of the variate values remain greater than or equal to 5. So, in applying the $\chi^2$ test to the goodness of fit or to the dependency of variables in a contingency table, the cell frequency should not be less than 5. In practical problems we can combine a few classes with small frequencies into one so that the pooled frequency is 5 or more.
Key Statistic
The results of Chi-Square test cannot be accurate if the cell frequencies in a contingency table are less than 5.
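The pooling of small cells can be sketched as follows. This is one possible pooling rule (merging adjacent classes from left to right), assumed here for illustration; the text does not prescribe a particular order. The function name `pool_small_cells` and the frequencies are hypothetical.

```python
def pool_small_cells(observed, expected, minimum=5):
    """Merge adjacent classes until every pooled expected frequency is at least `minimum`."""
    pooled_obs, pooled_exp = [], []
    run_obs = run_exp = 0
    for o, e in zip(observed, expected):
        run_obs += o
        run_exp += e
        if run_exp >= minimum:
            pooled_obs.append(run_obs)
            pooled_exp.append(run_exp)
            run_obs = run_exp = 0
    # A leftover tail that is still too small is merged into the last pooled class.
    if run_exp > 0:
        if pooled_exp:
            pooled_obs[-1] += run_obs
            pooled_exp[-1] += run_exp
        else:
            pooled_obs.append(run_obs)
            pooled_exp.append(run_exp)
    return pooled_obs, pooled_exp


# Hypothetical frequencies: the first and last classes have expected values below 5.
observed = [2, 6, 14, 20, 5, 3]
expected = [3, 7, 15, 17, 6, 2]

pooled_observed, pooled_expected = pool_small_cells(observed, expected)
print(pooled_observed, pooled_expected)
```

After pooling, the degrees of freedom should be counted from the number of classes that remain, not from the original number.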
10.2.5 Practical applications of Chi-Square test
In inferential statistics, the Chi-Square test can also be applied to discrete distributions. In using the Chi-Square test, we need no assumptions regarding the shape of the sampling distribution. The applications of the Chi-Square test include testing:
• the significance of sample variances
• the goodness of fit of a theoretical distribution
• the independence in a contingency table
• whether the observed results are consistent with the expected segregations in breeding experiments of genetics
The first of these is a parametric test and the others are non-parametric tests.
10.2.6 Uses of Chi-Square test
The $\chi^2$ test is used broadly to:
• Test goodness of fit for a one-way classification or for one variable only
• Test independence or interaction for more than one row or column in the form of a contingency table concerning several attributes
• Test the population variance $\sigma^2$ through confidence intervals suggested by the $\chi^2$ test
10.2.7 Degrees of freedom
The number of degrees of freedom for ‘n’ observations is ‘n − k’ and is usually denoted by $\nu$, where ‘k’ is the number of independent linear constraints imposed upon them.
Example 1
Suppose we are asked to write any four numbers; all four numbers can then be of our choice. If a restriction is imposed that the sum of these numbers should be 50, then only three numbers can be chosen freely, because the fourth is fixed by the sum. The degrees of freedom would now be 3.
If $\chi^2$ is defined as the sum of the squares of ‘n’ independent standardised normal variates, and ‘k’ linear constraints are imposed upon them (such as the estimation of some population parameter), then ‘n’ is replaced by ‘n − k’. If the squared deviations are taken from the sample mean instead of the population mean, then ‘n’ is replaced by $n - 1 = \nu$. This is because one linear constraint has been imposed.
Key Statistic
The Chi-Square distribution has only one parameter, that is, the degrees of freedom.
10.2.8 Levels of significance
Tables have been prepared for the values of ‘P’, where ‘P’ is the probability of getting a value of $\chi^2 \geq \chi_0^2$, and $\chi_0^2$ is an observed value. From these tables, we can find the value of ‘P’ corresponding to an observed value of $\chi^2$ and then test whether the difference between the observed and theoretical frequencies is significant or not. The smaller the value of ‘P’, the greater the divergence between fact and theory, so small values lead us to suspect the hypothesis. Not only do small values of ‘P’ lead us to suspect the hypothesis, but a value of ‘P’ very near to unity may also lead to a similar result. Thus, if P = 1, then $\chi^2 = 0$, showing that there is perfect agreement between fact and theory, and this is a very improbable event. There are two conventional levels of significance:
• If P < 0.05, we say that the observed value of $\chi^2$ is significant at the 5% level of significance.
• Similarly, if P < 0.01, the value is significant at the 1% level.
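The value of ‘P’ read from the tables can also be computed directly. The following is a minimal Python sketch, assuming a whole number of degrees of freedom. The function name `chi_square_p_value` is hypothetical, and the closed-form series it uses is a standard result for the upper tail of the Chi-Square distribution, not something derived in this unit.

```python
import math


def chi_square_p_value(chi_sq, df):
    """Upper-tail probability P(chi-square >= chi_sq) for whole-number degrees of freedom."""
    if chi_sq < 0 or df < 1:
        raise ValueError("chi_sq must be non-negative and df a positive integer")
    half = chi_sq / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over j = 0 .. df/2 - 1 of (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum of x^(r - 1/2) / (1*3*...*(2r-1))
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(chi_sq)  # r = 1 term
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= chi_sq / (2 * r - 1)
        total += term
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


# Commonly tabulated 5% points: the computed P should be close to 0.05 in each case.
for value, df in [(3.84, 1), (7.81, 3), (9.49, 4)]:
    print(df, round(chi_square_p_value(value, df), 3))
```

An observed value is significant at the 5% level when the returned ‘P’ is below 0.05, and at the 1% level when it is below 0.01.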
10.2.9 Interpretation of Chi-Square values
After the $\chi^2$ value has been calculated, it is compared with the value given in the $\chi^2$ table. The table comprises columns headed with symbols such as 0.05 for the 5% level of significance and 0.01 for the 1% level of significance. The left-hand side indicates the degrees of freedom. If the calculated value of $\chi^2$ falls in the acceptance region, the null hypothesis $H_0$ is accepted; if it falls in the rejection region, $H_0$ is rejected. Figure 10.2 depicts the acceptance and rejection regions of the Chi-Square distribution.
Fig. 10.2: Acceptance and Rejection Regions under Chi-Square Distribution
Key Statistic
The Chi-Square curve lies on the positive side of the x-axis because Chi-Square values are never negative.
10.3 Applications of Chi-Square test
10.3.1 Tests for independence of attributes
In the test for independence, the null hypothesis is that the row and column variables are independent of each other. We have studied earlier, that the hypothesis testing is done under the assumption that the null hypothesis is true.
The following are the properties of the test for independence:
• The data are the observed frequencies.
• The data are arranged in the form of a contingency table.
• The degrees of freedom $\nu$ can be calculated as:
$$\nu = (\text{Number of rows} - 1)(\text{Number of columns} - 1)$$
• The test for independence has a Chi-Square distribution and is always a right-tail test.
• The expected value is computed by taking the row total, multiplying it by the column total and dividing by the grand total. That is:
$$E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$$
• The test statistic value does not change if the order of the rows or columns is interchanged. The value also does not change if the rows and columns are interchanged.
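These properties can be checked with a short Python sketch. The function names and the 2 x 3 table below are hypothetical, chosen only for illustration. The sketch computes the expected frequencies from the row, column and grand totals, and confirms that interchanging rows and columns leaves the statistic and the degrees of freedom unchanged.

```python
def expected_frequencies(table):
    """Expected value of each cell = (row total x column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def independence_statistic(table):
    """Return the chi-square statistic and degrees of freedom for a contingency table."""
    expected = expected_frequencies(table)
    chi_sq = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_sq, df


# Hypothetical 2 x 3 table of observed frequencies (illustration only).
table = [[20, 30, 50],
         [30, 30, 40]]
transposed = [list(col) for col in zip(*table)]  # rows and columns interchanged

chi_sq, df = independence_statistic(table)
chi_sq_t, df_t = independence_statistic(transposed)
print(round(chi_sq, 4), df, round(chi_sq_t, 4), df_t)
```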
Solved Problem 1
Calculate the degrees of freedom for a contingency table with three rows and two columns.
Solution: The degrees of freedom, denoted by $\nu$, are calculated as:
$$\nu = (\text{Number of rows} - 1)(\text{Number of columns} - 1) = (3 - 1)(2 - 1) = 2$$
Hence, a contingency table with three rows and two columns has two degrees of freedom.
Solved Problem 2
Table 10.1 depicts the production in three shifts and the number of defective goods that turned out in three weeks. Test at the 5% level of significance whether weeks and shifts are independent.
Table 10.1: Production of Defective Goods in Three Shifts
| Shift | Week 1 | Week 2 | Week 3 | Total |
|---|---|---|---|---|
| I | 15 | 5 | 20 | 40 |
| II | 20 | 10 | 20 | 50 |
| III | 25 | 15 | 20 | 60 |
| Total | 60 | 30 | 60 | 150 |
Solution: Table 10.1a depicts the observed and expected values required to calculate $\chi^2$.
Table 10.1a: Observed and Expected Values
| Observed value $O_i$ | Expected value $E_i$ = (Row Total × Column Total) / Grand Total | $(O_i - E_i)^2$ | $(O_i - E_i)^2 / E_i$ |
|---|---|---|---|
| 15 | (40 × 60) / 150 = 16 | 1 | 0.0625 |
| 20 | (50 × 60) / 150 = 20 | 0 | 0.0000 |
| 25 | (60 × 60) / 150 = 24 | 1 | 0.0417 |
| 5 | (40 × 30) / 150 = 8 | 9 | 1.1250 |
| 10 | (50 × 30) / 150 = 10 | 0 | 0.0000 |
| 15 | (60 × 30) / 150 = 12 | 9 | 0.7500 |
| 20 | (40 × 60) / 150 = 16 | 16 | 1.0000 |
| 20 | (50 × 60) / 150 = 20 | 0 | 0.0000 |
| 20 | (60 × 60) / 150 = 24 | 16 | 0.6667 |
| | | | $\chi^2_{cal}$ = 3.6459 |
The steps to calculate $\chi^2$ are described as follows:
1. Null hypothesis $H_0$: The weeks and shifts are independent.
   Alternate hypothesis $H_1$: The weeks and shifts are dependent.
2. Level of significance is 5% and degrees of freedom d.f. = (3 – 1)(3 – 1) = 4, so $\chi^2_{tab} = 9.49$.
3. Test statistic:
$$\chi^2_{cal} = \sum \frac{(O_i - E_i)^2}{E_i} = 3.6459$$
4. Conclusion: Since $\chi^2_{cal}$ (3.6459) < $\chi^2_{tab}$ (9.49), $H_0$ is accepted. Hence, the attributes ‘week’ and ‘shifts’ are independent.
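The arithmetic of Solved Problem 2 can be reproduced with a short Python sketch. The variable names are illustrative; the observed frequencies and the table value 9.49 are the ones given in the problem.

```python
# Observed defectives from Table 10.1: rows are shifts I-III, columns are weeks 1-3.
observed = [[15, 5, 20],
            [20, 10, 20],
            [25, 15, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total  # expected frequency of the cell
        chi_sq += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 9.49  # table value quoted in the problem for 4 d.f. at the 5% level
reject_h0 = chi_sq > critical_value

print(f"chi-square = {chi_sq:.4f}, d.f. = {df}, reject H0: {reject_h0}")
```

Without intermediate rounding the sum is about 3.6458; the worked table gives 3.6459 because each term is rounded to four decimal places before adding. The conclusion is the same.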
Solved Problem 3
Out of 1000 people surveyed, 600 belonged to urban areas and rest to rural areas. Among 500 who visited other states, 400 belonged to urban areas. Test at 5% level of significance whether area and visiting other states are dependent.
Solution: Table 10.2 depicts the information given in Solved Problem 3 in tabulated form.
Table 10.2: People Belonging to Urban and Rural Areas
| Other States | Urban | Rural | Total |
|---|---|---|---|
| Visited | 400 | 100 | 500 |
| Not Visited | 200 | 300 | 500 |
| Total | 600 | 400 | 1000 |
Table 10.2a depicts the observed and expected values for the calculation of $\chi^2$.
Table 10.2a: Observed and Expected Values
| Observed value $O_i$ | Expected value $E_i$ = (Row Total × Column Total) / Grand Total | $(O_i - E_i)^2$ | $(O_i - E_i)^2 / E_i$ |
|---|---|---|---|
| 400 | 300 | 10000 | 33.33 |
| 200 | 300 | 10000 | 33.33 |
| 100 | 200 | 10000 | 50.00 |
| 300 | 200 | 10000 | 50.00 |
| | | | $\chi^2_{cal}$ = 166.66 |
The steps for calculation of Chi-Square are described as follows:
1. Null hypothesis $H_0$: Area and visit are independent.
   Alternate hypothesis $H_1$: They are dependent.
2. Level of significance is 5% and degrees of freedom d.f. = (2 – 1)(2 – 1) = 1, so $\chi^2_{tab} = 3.84$.
3. Test statistic:
$$\chi^2_{cal} = \sum \frac{(O_i - E_i)^2}{E_i} = 166.66$$
4. Conclusion: Since $\chi^2_{cal}$ (166.66) > $\chi^2_{tab}$ (3.84), $H_0$ is rejected. Hence, ‘area’ and ‘visit’ are dependent.
10.3.2 Test of goodness of fit
The test of goodness of fit of a statistical model measures how well the model fits a set of observations. The test measures and summarises the differences, if any, between the observed values and the values expected under the model. The results help to decide whether the samples are drawn from identical distributions or not. The degrees of freedom are ‘n − 1’, where ‘n’ is the number of categories. When the hypothesised distribution is uniform, the expected value of each category is equal to the average of the observed values.
Solved Problem 4
A personnel manager is interested in determining whether absenteeism is greater on one day of the week than on another. The record for the past years is available. Table 10.3 depicts the absenteeism for each working day over a week. Test whether absenteeism is uniformly distributed over the week.
Table 10.3: Comparison of Data about Absenteeism
| Days of Week | Monday | Tuesday | Wednesday | Thursday | Friday |
|---|---|---|---|---|---|
| Number of absentees | 66 | 57 | 54 | 48 | 75 |
Solution: If the absenteeism is uniformly distributed over the week, then the expected number of absentees per day is given by:
$$E_i = \frac{66 + 57 + 54 + 48 + 75}{5} = 60$$
Table 10.3a depicts the expected values required for the calculation of $\chi^2$ for the data related to Solved Problem 4.
Table 10.3a: Observed and Expected Values for Calculation of $\chi^2$
| Observed value $O_i$ | Expected value $E_i$ | $(O_i - E_i)^2$ | $(O_i - E_i)^2 / E_i$ |
|---|---|---|---|
| 66 | 60 | 36 | 0.6000 |
| 57 | 60 | 9 | 0.1500 |
| 54 | 60 | 36 | 0.6000 |
| 48 | 60 | 144 | 2.4000 |
| 75 | 60 | 225 | 3.7500 |
| | | | $\chi^2_{cal}$ = 7.5000 |
The steps for calculation of Chi-Square are described as follows:
1. Null hypothesis $H_0$: The observed frequencies fit the uniform distribution.
2. Alternate hypothesis $H_1$: The observed frequencies do not fit the uniform distribution.
3. Level of significance is 5% and degrees of freedom (d.f.) = (5 – 1) = 4, so $\chi^2_{tab} = 9.49$.
4. Test statistic:
$$\chi^2_{cal} = \sum \frac{(O_i - E_i)^2}{E_i} = 7.50$$
5. Conclusion: Since $\chi^2_{cal}$ (7.5) < $\chi^2_{tab}$ (9.49), $H_0$ is accepted. In other words, we conclude at the 5% level of significance that absenteeism is uniformly distributed and is independent of the days of the week.
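Solved Problem 4 can be checked with a few lines of Python. The variable names are illustrative; the data and the table value 9.49 are those given in the problem.

```python
# Absentees per working day, from Table 10.3.
absentees = {"Monday": 66, "Tuesday": 57, "Wednesday": 54, "Thursday": 48, "Friday": 75}

# Under a uniform distribution every day has the same expected frequency.
expected_per_day = sum(absentees.values()) / len(absentees)

chi_sq = sum((o - expected_per_day) ** 2 / expected_per_day for o in absentees.values())
df = len(absentees) - 1
critical_value = 9.49  # table value quoted in the problem for 4 d.f. at the 5% level
reject_h0 = chi_sq > critical_value

print(expected_per_day, chi_sq, df, reject_h0)
```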
Solved Problem 5
According to a theory in genetics, the proportion of beans of types A, B, C and D in a generation should be 9:3:3:1. In an experiment with 1600 beans, the frequencies of beans of types A, B, C and D were observed to be 882, 313, 287 and 118 respectively. Does the result support the theory?
Solution: The steps for calculation of Chi-Square are described as follows:
1. Null hypothesis $H_0$: The result supports the theory.
   Alternate hypothesis $H_1$: The result does not support the theory.
2. Level of significance is 5% and degrees of freedom (d.f.) = (4 – 1) = 3, so $\chi^2_{tab} = 7.81$.
3. Test statistic:
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
Table 10.4 depicts the observed and expected values for the calculation of $\chi^2$ for Solved Problem 5.
Table 10.4: Observed and Expected Values for Calculation of $\chi^2$
| Observed value $O_i$ | Expected value $E_i$ | $(O_i - E_i)^2$ | $(O_i - E_i)^2 / E_i$ |
|---|---|---|---|
| 882 | (1600 × 9) / 16 = 900 | 324 | 0.36 |
| 313 | (1600 × 3) / 16 = 300 | 169 | 0.56 |
| 287 | (1600 × 3) / 16 = 300 | 169 | 0.56 |
| 118 | (1600 × 1) / 16 = 100 | 324 | 3.24 |
| | | | $\chi^2_{cal}$ = 4.72 |
4. Conclusion: Since $\chi^2_{cal}$ (4.72) < $\chi^2_{tab}$ (7.81), $H_0$ is accepted. Therefore, the result supports the theory.
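When the theory is stated as a ratio, the expected frequencies follow from sharing the total in that ratio. The sketch below repeats Solved Problem 5 in Python; the helper name `expected_from_ratio` is hypothetical, while the data and the table value 7.81 are those given in the problem.

```python
def expected_from_ratio(ratio, total):
    """Share `total` among the categories in proportion to `ratio`."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


observed = [882, 313, 287, 118]  # beans of types A, B, C and D
expected = expected_from_ratio([9, 3, 3, 1], sum(observed))

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
critical_value = 7.81  # table value quoted in the problem for 3 d.f. at the 5% level
reject_h0 = chi_sq > critical_value

print(expected, round(chi_sq, 2), df, reject_h0)
```

Without intermediate rounding the statistic is about 4.73, against 4.72 in the worked table, where each term is rounded to two decimal places first. The conclusion is unchanged.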
Solved Problem 6
The following table gives the classification of 100 workers according to gender and the nature of work. Test whether the nature of work is independent of the gender of the worker.
Table 10.5
| | Skilled | Unskilled | Total |
|---|---|---|---|
| Males | 40 | 20 | 60 |
| Females | 10 | 30 | 40 |
| Total | 50 | 50 | 100 |
The steps for calculation of Chi-Square are described as follows:
1. Null hypothesis $H_0$: The nature of work is independent of the gender of the worker, that is, there is no association between them.
2. Level of significance is 5% and degrees of freedom (d.f.) = (r – 1)(c – 1) = (2 – 1)(2 – 1) = 1, so $\chi^2_{tab} = 3.84$.
3. Test statistic:
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
Table 10.5a depicts the observed and expected values for the calculation of $\chi^2$ for Solved Problem 6.
Table 10.5a: Observed and Expected Values for Calculation of $\chi^2$
| Observed value $O_i$ | Expected value $E_i$ | $(O_i - E_i)^2$ | $(O_i - E_i)^2 / E_i$ |
|---|---|---|---|
| 40 | 30 | 100 | 3.333 |
| 10 | 20 | 100 | 5.000 |
| 20 | 30 | 100 | 3.333 |
| 30 | 20 | 100 | 5.000 |
| | | | $\chi^2_{cal}$ = 16.666 |
4. Conclusion: Since $\chi^2_{cal}$ (16.666) > $\chi^2_{tab}$ (3.84), $H_0$ is rejected. Therefore, the null hypothesis that gender and nature of work are independent is rejected.
10.3.3 Test for comparing variance
When we have to use $\chi^2$ as a test of population variance, the hypotheses are:
$$H_0: \sigma_s^2 = \sigma_p^2 \quad \text{and} \quad H_A: \sigma_s^2 > \sigma_p^2$$
and the test statistic is:
$$\chi^2 = \frac{\sigma_s^2}{\sigma_p^2}(n - 1)$$
where $\sigma_s^2$ = variance of the sample,
$\sigma_p^2$ = variance of the population,
(n − 1) = degrees of freedom, ‘n’ being the number of items in the sample.
Then, by comparing the calculated value with the table value of $\chi^2$ for (n − 1) degrees of freedom at a given level of significance, we may either accept or reject the null hypothesis. If the calculated value of $\chi^2$ is less than the table value, the null hypothesis is accepted, but if the calculated value is equal to or greater than the table value, the hypothesis is rejected.
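The variance test can be sketched as follows. All numbers here are hypothetical and serve only to show the arithmetic: a sample of 10 items with variance 12 is tested against a hypothesised population variance of 8. The critical value 16.92 is the commonly tabulated 5% point for 9 degrees of freedom; it is an assumption brought in from a standard table, not a value quoted in this unit.

```python
def variance_chi_square(sample_variance, population_variance, n):
    """Chi-square statistic for the variance test: (sample var / population var) * (n - 1)."""
    return (sample_variance / population_variance) * (n - 1)


# Hypothetical figures, for illustration only.
n = 10
sample_variance = 12.0
population_variance = 8.0

chi_sq = variance_chi_square(sample_variance, population_variance, n)
df = n - 1
critical_value = 16.92  # assumed: standard table value for 9 d.f. at the 5% level

# The rule stated above: reject when the calculated value equals or exceeds the table value.
reject_h0 = chi_sq >= critical_value
print(chi_sq, df, reject_h0)
```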
Self Assessment Questions
1. $\chi^2$-test is a ______ test.
2. A table with 4 rows and 2 columns has ______ degrees of freedom.
3. $\chi^2$-test is wholly based on ______ data.
4. If there are four rows and five columns in the classification for a $\chi^2$-test, then the number of degrees of freedom is equal to ______.
5. If the calculated $\chi^2$ value is less than the tabulated $\chi^2$ value, then the null hypothesis is ______.

Activity
Objective Questions:
1. What is the appropriate test to use if you want to determine whether there is evidence that the proportion of successes is higher in group 1 than in group 2, and we have obtained independent samples from the two groups?
   i) The Z test
   ii) The Chi-Square test
   iii) Both of the above
   iv) None of the above
2. Which of the following values cannot occur in a Chi-Square distribution?
   i) 100.0
   ii) 38.4
   iii) 0.61
   iv) -2.45
3. What test would you use to determine whether a set of observed frequencies differ from their corresponding expected frequencies?
   i) The t test for dependent samples
   ii) The Chi-Square test
   iii) The t test for independent samples
   iv) The F test
4. When using the Chi-Square test for differences in two proportions with a contingency table that has r rows and c columns, how many degrees of freedom will the test statistic have?
   i) n – 1
   ii) $n_1 + n_2 - 2$
   iii) (r – 1) × (c – 1)
   iv) (r – 1) + (c – 1)
5. When testing for independence in a contingency table with 3 rows and 4 columns, how many degrees of freedom will the test statistic have?
   i) 5
   ii) 6
   iii) 7
   iv) 12
6. Which of the following is true about the Chi-Square distribution?
   i) It is a skewed distribution
   ii) Its shape depends on the number of degrees of freedom
   iii) As the degrees of freedom increase, the Chi-Square distribution becomes more symmetrical
   iv) All of the above
7. What other name is used for a contingency table?
   i) A cross-classification table
   ii) An ANOVA table
   iii) A histogram
   iv) None of the above

Solutions to Objective Questions
1. i) The Z test
2. iv) -2.45
3. ii) The Chi-Square test
4. iii) (r – 1) × (c – 1)
5. ii) 6
6. iv) All of the above
7. i) A cross-classification table
10.4 Summary
Let us recapitulate the important concepts discussed in this unit:
• The Chi-Square test is a non-parametric test. The important applications of the Chi-Square test are the tests for independence of attributes, the test of goodness of fit and the test for specified variance.
• $\chi^2$ describes the magnitude of the discrepancy between the observed and the expected frequencies. The value of $\chi^2$ is calculated as:
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \frac{(O_3 - E_3)^2}{E_3} + \cdots + \frac{(O_n - E_n)^2}{E_n}$$
where $O_1, O_2, O_3, \ldots, O_n$ are the observed frequencies and $E_1, E_2, E_3, \ldots, E_n$ are the corresponding expected or theoretical frequencies.
• An important criterion for applying the Chi-Square test is that the sample size should be very large.
10.5 Glossary
Chi-Square test: A non-parametric test, in which no rigid assumptions about the parameters of the population are required.
Level of significance: The probability of rejecting the null hypothesis when it is in fact true (Type I error), fixed before the test, usually at a value such as 0.05 (5%). The null hypothesis is rejected in favour of the alternative when the P-value of the test is less than this level. The P-value is the chance of getting a sample like the one being analysed, or a more extreme one, if the null hypothesis were true. A small P-value implies that getting such a sample was highly unlikely, suggesting that the null hypothesis is probably not true.
10.6 Terminal Questions
1. Treatments ‘x’ and ‘y’ were each given to 400 items of a material to enhance its strength. 80 items gained strength by treatment ‘x’ and 20 gained strength by treatment ‘y’. Does the gain in strength depend on the treatment?
2. The demand for a particular spare part was found to vary from day to day. Table 10.6 depicts the information obtained in a sample study. Test the hypothesis that the number demanded depends upon the day.
Table 10.6: Spare Part Demand from Monday to Saturday
| Days | Mon | Tue | Wed | Thur | Fri | Sat |
|---|---|---|---|---|---|---|
| Quantity Demanded | 1124 | 1125 | 1110 | 1120 | 1126 | 1115 |
3. In a survey of 200 boys, 75 were intelligent, and 40 of these had skilled fathers, while 85 of the unintelligent boys had unskilled fathers. Can we say on the basis of this information that skilled fathers had intelligent boys?
4. The number of car accidents per month in a town was as follows: 6, 9, 4, 12, 8, 20, 14, 15, 2 and 10. Test the hypothesis that the number of accidents is the same every month.
5. In a particular industry the postgraduates, graduates and undergraduates are in the ratio 2:3:5. A firm belonging to the industry had 400, 550 and 1050 postgraduates, graduates and undergraduates on its pay-roll. Do they follow the earlier observation about the industry?
6. Three hundred digits were chosen at random from a set of tables. The frequencies of the digits were as follows:
| Digits | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Frequency | 28 | 29 | 33 | 31 | 26 | 35 | 32 | 30 | 31 | 25 |
Using the Chi-Square test, assess the hypothesis that the digits were distributed in equal numbers in the table.
10.7 Answers
Self Assessment Questions
1. Non-parametric
2. 3
3. Sample
4. 12
5. Not rejected
Terminal Questions
1. $\chi^2_{cal}$ = 41.142; $H_0$ rejected
2. $\chi^2_{cal}$ = 0.179; $H_0$ accepted
3. $\chi^2_{cal}$ = 8.888; $H_0$ rejected
4. $\chi^2_{cal}$ = 26.6; $H_0$ rejected
5. $\chi^2_{cal}$ = 6.6667; $H_0$ rejected
6. $\chi^2_{cal}$ = 2.864; $H_0$ accepted
10.8 Case Study
Automobile Preference
A market research firm in an Asian country made a survey to see if there was any correlation between a person’s nationality and their preference in the make of automobile they purchased. Table 10.7 depicts the sample information obtained.
Table 10.7: Types of Automobile Purchased in Various Countries
| | Pakistan | China | India | Sri Lanka | Nepal |
|---|---|---|---|---|---|
| Maruti Suzuki | 40 | 28 | 30 | 25 | 50 |
| Opel | 32 | 35 | 29 | 39 | 35 |
| Lancer | 24 | 40 | 27 | 28 | 29 |
| Ford | 40 | 20 | 40 | 26 | 40 |
| Fiat | 26 | 10 | 35 | 35 | 46 |
Discussion Questions:
i. Indicate the appropriate null and alternative hypotheses to test whether the make of automobile purchased is dependent on an individual’s nationality.
ii. Using the critical value approach of the Chi-Square test at the 1% significance level, does it appear that there is a relationship between automobile purchase and nationality?
iii. Verify the result of question ii by using the p-value approach of the Chi-Square test.
iv. What has to be the significance level in order that there appears a breakeven situation between dependency of nationality and automobile preference?
v. What is your comment about the results?