Chi-Square Tests: Chapter 15


McGraw-Hill/Irwin Copyright © 2009 by The McGraw-Hill Companies, Inc.

Chi-Square Tests

Chapter 15

• Chi-Square Test for Independence
• Chi-Square Tests for Goodness-of-Fit
• Uniform Goodness-of-Fit Test
• Poisson Goodness-of-Fit Test
• Normal Chi-Square Goodness-of-Fit Test
• ECDF Tests (Optional)

15-2

Chi-Square Test for Independence

Contingency Tables
• A contingency table is a cross-tabulation of n paired observations into categories.
• Each cell shows the count of observations that fall into the category defined by its row (r) and column (c) heading.

15-3

Chi-Square Test for Independence

Contingency Tables
• For example:

Table 15.1

15-4

Chi-Square Test for Independence

Chi-Square Test
• In a test of independence for an $r \times c$ contingency table, the hypotheses are
  $H_0$: Variable A is independent of variable B
  $H_1$: Variable A is not independent of variable B
• Use the chi-square test for independence to test these hypotheses.
• This non-parametric test is based on frequencies.
• The n data pairs are classified into c columns and r rows, and then the observed frequency $f_{jk}$ is compared with the expected frequency $e_{jk}$.


15-5

Chi-Square Test for Independence

Chi-Square Distribution
• The critical value comes from the chi-square probability distribution with $\nu$ degrees of freedom:
  $\nu$ = degrees of freedom = $(r - 1)(c - 1)$
  where r = number of rows in the table
        c = number of columns in the table
• Appendix E contains critical values for right-tail areas of the chi-square distribution.
• The mean of a chi-square distribution is $\nu$ with variance $2\nu$.

15-6

Chi-Square Test for Independence

Chi-Square Distribution
• Consider the shape of the chi-square distribution:

Figure 15.1

15-7

Chi-Square Test for Independence

Expected Frequencies
• Assuming that $H_0$ is true, the expected frequency of row j and column k is:
  $e_{jk} = R_j C_k / n$
  where $R_j$ = total for row j (j = 1, 2, …, r)
        $C_k$ = total for column k (k = 1, 2, …, c)
        n = sample size
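The expected-frequency formula can be sketched in a few lines of Python. This is only an illustration: the function name and the 2 x 2 table of counts are hypothetical and do not come from the chapter.

```python
def expected_frequencies(observed):
    """Return e_jk = R_j * C_k / n for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]          # R_j
    col_totals = [sum(col) for col in zip(*observed)]    # C_k
    n = sum(row_totals)                                  # sample size
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table of observed counts (not from the chapter).
observed = [[10, 20],
            [30, 40]]
expected = expected_frequencies(observed)
print(expected)  # [[12.0, 18.0], [28.0, 42.0]]
```

Each expected cell depends only on its row total, its column total, and n, so the expected table always reproduces the observed margins.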

15-8

Chi-Square Test for Independence

Steps in Testing the Hypotheses
• Step 1: State the Hypotheses
  $H_0$: Variable A is independent of variable B
  $H_1$: Variable A is not independent of variable B
• Step 2: Specify the Decision Rule
  Calculate $\nu = (r - 1)(c - 1)$.
  For a given $\alpha$, look up the right-tail critical value ($\chi^2_R$) from Appendix E or by using Excel.
  Reject $H_0$ if the test statistic exceeds $\chi^2_R$.


15-9

Chi-Square Test for Independence

Steps in Testing the Hypotheses
• For example, for $\nu = 6$ and $\alpha = .05$, $\chi^2_{.05} = 12.59$.

Figure 15.2
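The tabled value can be cross-checked without Appendix E. The sketch below relies on a standard closed form for the right-tail area of a chi-square distribution that holds only when the degrees of freedom are even. The function name is made up for this illustration, and a statistics package would normally do this job.

```python
import math


def chi2_right_tail_even_df(x, df):
    """Right-tail area P(X > x) for a chi-square variable with even df.

    For df = 2k the tail area equals
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    k = df // 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))


tail = chi2_right_tail_even_df(12.59, 6)
print(round(tail, 3))  # 0.05
```

A right-tail area of about .05 beyond 12.59 agrees with the critical value quoted for 6 degrees of freedom.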

15-10

Chi-Square Test for Independence

Steps in Testing the Hypotheses
• Here is the rejection region.

Figure 15.3

15-11

Chi-Square Test for Independence

Steps in Testing the Hypotheses
• Step 3: Calculate the Expected Frequencies
  $e_{jk} = R_j C_k / n$
• For example: (worked example not reproduced in this transcript)

15-12

Chi-Square Test for Independence

Steps in Testing the Hypotheses
• Step 4: Calculate the Test Statistic
  The chi-square test statistic is
  $\chi^2_{\text{calc}} = \sum_{j=1}^{r} \sum_{k=1}^{c} (f_{jk} - e_{jk})^2 / e_{jk}$
• Step 5: Make the Decision
  Reject $H_0$ if $\chi^2_{\text{calc}} > \chi^2_R$ or if the p-value < $\alpha$.
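Steps 3 through 5 can be sketched together as follows. The statistic is the usual sum of (observed - expected)² / expected over all cells. The table of counts is hypothetical, and 3.841 is the standard right-tail critical value for 1 degree of freedom at α = .05.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for j, row in enumerate(observed):
        for k, f_jk in enumerate(row):
            e_jk = row_totals[j] * col_totals[k] / n    # expected under H0
            stat += (f_jk - e_jk) ** 2 / e_jk
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


observed = [[10, 20], [30, 40]]    # hypothetical counts
stat, df = chi_square_statistic(observed)
critical = 3.841                   # df = 1, alpha = .05
reject_h0 = stat > critical
print(round(stat, 4), df, reject_h0)  # 0.7937 1 False
```

For these made-up counts the statistic falls well short of the critical value, so independence would not be rejected.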


15-13

Chi-Square Test for Independence

Small Expected Frequencies
• The chi-square test is unreliable if the expected frequencies are too small.
• Rules of thumb:
  • Cochran's Rule requires that $e_{jk} > 5$ for all cells.
  • A more lenient rule allows up to 20% of the cells to have $e_{jk} < 5$.
• Most agree that a chi-square test is infeasible if $e_{jk} < 1$ in any cell.
• If this happens, try combining adjacent rows or columns to enlarge the expected frequencies.
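A quick screen for these rules of thumb might look like the sketch below. The function name, the dictionary keys, and the table of expected frequencies are all hypothetical.

```python
def check_expected_frequencies(expected):
    """Summarize a table of expected frequencies against the rules of thumb."""
    cells = [e for row in expected for e in row]
    below_5 = sum(1 for e in cells if e < 5)
    share = below_5 / len(cells)
    return {
        "cells_below_5": below_5,           # zero is needed for the strict rule
        "share_below_5": share,
        "lenient_rule_ok": share <= 0.20,   # at most 20% of cells below 5
        "any_below_1": any(e < 1 for e in cells),
    }


# Hypothetical expected frequencies for a 2 x 3 table.
expected = [[12.0, 18.0, 4.0],
            [28.0, 42.0, 0.8]]
report = check_expected_frequencies(expected)
print(report)
```

Here two of the six cells fall below 5 and one falls below 1, so combining adjacent columns would be the suggested remedy before running the test.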

15-14

Chi-Square Test for Independence

Cross-Tabulating Raw Data
• Chi-square tests for independence can also be used to analyze quantitative variables by coding them into categories.
• For example, the variables Infant Deaths per 1,000 and Doctors per 100,000 can each be coded into various categories:

Figure 15.6

15-15

Chi-Square Test for Goodness-of-Fit

Why Do a Chi-Square Test on Numerical Data?
• The researcher may believe there's a relationship between X and Y, but doesn't want to use regression.
• There are outliers or anomalies that prevent us from assuming that the data came from a normal population.
• The researcher has numerical data for one variable but not the other.

15-16

Chi-Square Test for Independence

3-Way Tables and Higher
• More than two variables can be compared using contingency tables.
• However, it is difficult to visualize a higher-order table.
• For example, you could visualize a cube as a stack of tiled 2-way contingency tables.
• Major computer packages permit 3-way tables.


15-17

Chi-Square Test for Goodness-of-Fit

Purpose of the Test
• The goodness-of-fit (GOF) test helps you decide whether your sample resembles a particular kind of population.
• The chi-square test will be used because it is versatile and easy to understand.

15-18

Chi-Square Test for Goodness-of-Fit

Multinomial GOF Test
• A multinomial distribution is defined by any k probabilities $\pi_1, \pi_2, \ldots, \pi_k$ that sum to unity.
• For example, consider the following "official" proportions of M&M colors.

15-19

Chi-Square Test for Goodness-of-Fit

Multinomial GOF Test
• The hypotheses are
  $H_0$: $\pi_1 = .30$, $\pi_2 = .20$, $\pi_3 = .10$, $\pi_4 = .10$, $\pi_5 = .10$, $\pi_6 = .20$
  $H_1$: At least one of the $\pi_j$ differs from the hypothesized value
• No parameters are estimated (m = 0) and there are c = 6 classes, so the degrees of freedom are
  $\nu = c - m - 1 = 6 - 0 - 1 = 5$
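A minimal sketch of the multinomial test using the six hypothesized proportions follows. The observed counts are invented for illustration (they are not the chapter's data), and 11.07 is the standard right-tail critical value for 5 degrees of freedom at α = .05.

```python
proportions = [0.30, 0.20, 0.10, 0.10, 0.10, 0.20]   # hypothesized pi_j under H0
observed = [28, 22, 9, 11, 12, 18]                   # hypothetical counts, n = 100

n = sum(observed)
expected = [n * p for p in proportions]              # e_j = n * pi_j
stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))

m = 0                                                # no parameters estimated
df = len(proportions) - m - 1                        # c - m - 1 = 5
critical = 11.07                                     # df = 5, alpha = .05
print(round(stat, 3), df, stat > critical)           # 1.133 5 False
```

With these made-up counts the statistic is far below the critical value, so the hypothesized proportions would not be rejected.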

15-20

Chi-Square Test for Goodness-of-Fit

Hypotheses for GOF
• The hypotheses are:
  $H_0$: The population follows a _____ distribution
  $H_1$: The population does not follow a _____ distribution
• The blank may contain the name of any theoretical distribution (e.g., uniform, Poisson, normal).


15-21

Chi-Square Test for Goodness-of-Fit

Test Statistic and Degrees of Freedom for GOF
• Assuming n observations, the observations are grouped into c classes and then the chi-square test statistic is found using:
  $\chi^2_{\text{calc}} = \sum_{j=1}^{c} (f_j - e_j)^2 / e_j$
  where $f_j$ = the observed frequency of observations in class j
        $e_j$ = the expected frequency in class j if $H_0$ were true

15-22

Chi-Square Test for Goodness-of-Fit

Test Statistic and Degrees of Freedom for GOF
• If the proposed distribution gives a good fit to the sample, the test statistic will be near zero.
• The test statistic follows the chi-square distribution with degrees of freedom
  $\nu = c - m - 1$
  where c is the number of classes used in the test
        m is the number of parameters estimated

15-23

Chi-Square Test for Goodness-of-Fit

Test Statistic and Degrees of Freedom for GOF
• No parameters estimated (m = 0): $\nu = c - m - 1 = c - 0 - 1 = c - 1$
• One parameter estimated (m = 1): $\nu = c - m - 1 = c - 1 - 1 = c - 2$
• Two parameters estimated (m = 2): $\nu = c - m - 1 = c - 2 - 1 = c - 3$

15-24

Chi-Square Test for Goodness-of-Fit

Data-Generating Situations
• Instead of "fishing" for a good-fitting model, visualize a priori the characteristics of the underlying data-generating process.

Mixtures: A Problem
• Mixtures occur when more than one data-generating process is superimposed on top of one another.


15-25

Chi-Square Test for Goodness-of-Fit

Eyeball Tests
• A simple "eyeball" inspection of the histogram or dot plot may suffice to rule out a hypothesized population.

Small Expected Frequencies
• Goodness-of-fit tests may lack power in small samples. As a guideline, a chi-square goodness-of-fit test should be avoided if n < 25.

15-26

Uniform Goodness-of-Fit Test

Uniform Distribution
• The uniform goodness-of-fit test is a special case of the multinomial in which every value has the same chance of occurrence.
• The chi-square test for a uniform distribution compares all c groups simultaneously.
• The hypotheses are:
  $H_0$: $\pi_1 = \pi_2 = \ldots = \pi_c = 1/c$
  $H_1$: Not all $\pi_j$ are equal

15-27

Uniform Goodness-of-Fit Test

Uniform GOF Test: Grouped Data
• The test can be performed on data that are already tabulated into groups.
• Calculate the expected frequency $e_j$ for each cell.
• The degrees of freedom are $\nu = c - 1$ since there are no parameters for the uniform distribution.
• Obtain the critical value $\chi^2_\alpha$ from Appendix E for the desired level of significance $\alpha$.
• The p-value can be obtained from Excel.
• Reject $H_0$ if p-value < $\alpha$.

15-28

Uniform Goodness-of-Fit Test

Uniform GOF Test: Raw Data
• First form c bins of equal width and create a frequency distribution.
• Calculate the observed frequency $f_j$ for each bin.
• Define $e_j = n/c$.
• Perform the chi-square calculations.
• The degrees of freedom are $\nu = c - 1$ since there are no parameters for the uniform distribution.
• Obtain the critical value from Appendix E for a given significance level $\alpha$ and make the decision.
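The raw-data steps can be sketched as below. One assumption is labeled in the code: the bin width is taken as the data range divided by c, which is a common choice but is not stated in this transcript. The data are hypothetical.

```python
def uniform_gof_statistic(data, c):
    """Bin raw data into c equal-width bins; return (observed, e, chi2, df).

    Assumes the data are not all identical (otherwise the width is zero).
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / c                        # assumed bin width: range / c
    observed = [0] * c
    for x in data:
        j = min(int((x - lo) / width), c - 1)    # the maximum goes in the last bin
        observed[j] += 1
    e = len(data) / c                            # e_j = n / c under H0
    chi2 = sum((f - e) ** 2 / e for f in observed)
    return observed, e, chi2, c - 1              # df = c - 1


data = list(range(100))                          # hypothetical raw data: 0, 1, ..., 99
observed, e, chi2, df = uniform_gof_statistic(data, 4)
print(observed, e, chi2, df)  # [25, 25, 25, 25] 25.0 0.0 3
```

Perfectly even data give a statistic of zero. Real samples produce a positive value, to be compared with the critical value for c - 1 degrees of freedom.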


15-29

Uniform Goodness-of-Fit Test

Uniform GOF Test: Raw Data
• Maximize the test's power by defining bin width as (formula not reproduced in this transcript).
• As a result, the expected frequencies will be as large as possible.

15-30

Uniform Goodness-of-Fit Test

Uniform GOF Test: Raw Data
• Calculate the mean and standard deviation of the uniform distribution as:
  $\mu = (a + b)/2$
  $\sigma = \sqrt{[(b - a + 1)^2 - 1]/12}$
• If the data are not skewed and the sample size is large (n > 30), then the sample mean is approximately normally distributed.
• So, test the hypothesized uniform mean using (test statistic formula not reproduced in this transcript).

15-31

Poisson Goodness-of-Fit Test

Poisson Data-Generating Situations
• In a Poisson distribution model, X represents the number of events per unit of time or space.
• X is a discrete nonnegative integer (X = 0, 1, 2, …).
• Event arrivals must be independent of each other.
• Sometimes called a model of rare events because X typically has a small mean.

15-32

Poisson Goodness-of-Fit Test

• The mean $\lambda$ is the only parameter.
• Assuming that $\lambda$ is unknown and must be estimated from the sample, the steps are:
  Step 1: Tally the observed frequency $f_j$ of each X-value.
  Step 2: Estimate the mean $\lambda$ from the sample.
  Step 3: Use the estimated $\lambda$ to find the Poisson probability P(X) for each value of X.


15-33

Poisson Goodness-of-Fit Test

  Step 4: Multiply P(X) by the sample size n to get expected Poisson frequencies $e_j$.
  Step 5: Perform the chi-square calculations.
  Step 6: Make the decision.
• You may need to combine classes until expected frequencies become large enough for the test (at least until $e_j > 2$).

15-34

Poisson Goodness-of-Fit Test

Poisson GOF Test: Tabulated Data
• Calculate the sample mean as:
  $\hat{\lambda} = \frac{\sum_{j=1}^{c} x_j f_j}{n}$
• Using this estimated mean, calculate the Poisson probabilities either by using the Poisson formula $P(x) = \lambda^x e^{-\lambda} / x!$ or Excel.

15-35

Poisson Goodness-of-Fit Test

Poisson GOF Test: Tabulated Data
• For c classes with m = 1 parameter estimated, the degrees of freedom are
  $\nu = c - m - 1 = c - 2$
• Obtain the critical value for a given $\alpha$ from Appendix E.
• Make the decision.
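The tabulated-data procedure can be sketched as follows. The tally is hypothetical, and one modeling choice is an assumption rather than something stated on the slides: the last class is treated as open-ended ("x or more") so that the class probabilities sum to 1. The sketch also assumes the X-values run consecutively from 0.

```python
import math


def poisson_gof_expected(x_values, freqs):
    """Estimate lambda from a tally and return (lambda_hat, expected, df)."""
    n = sum(freqs)
    lam = sum(x * f for x, f in zip(x_values, freqs)) / n     # sample mean
    probs = [lam ** x * math.exp(-lam) / math.factorial(x) for x in x_values]
    # Assumption: the last class is open-ended, so it takes the remaining tail.
    probs[-1] = 1.0 - sum(probs[:-1])
    expected = [n * p for p in probs]                         # e_j = n * P(x_j)
    df = len(x_values) - 1 - 1                                # c - m - 1 with m = 1
    return lam, expected, df


x_values = [0, 1, 2, 3]      # hypothetical tally: X-values ...
freqs = [10, 20, 15, 5]      # ... and their observed frequencies
lam, expected, df = poisson_gof_expected(x_values, freqs)
print(round(lam, 2), [round(e, 2) for e in expected], df)
```

The expected frequencies then feed into the same sum of (f - e)² / e used for the other goodness-of-fit tests, after combining any classes whose expected counts are too small.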

15-36

Normal Chi-Square Goodness-of-Fit Test

Normal Data-Generating Situations
• Two parameters, $\mu$ and $\sigma$, fully describe the normal distribution.
• Unless $\mu$ and $\sigma$ are known a priori, they must be estimated from a sample by using $\bar{x}$ and s.
• Using these statistics, the chi-square goodness-of-fit test can be used.


15-37

Normal Chi-Square Goodness-of-Fit Test

Method 1: Standardizing the Data
• Transform the sample observations $x_1, x_2, \ldots, x_n$ into standardized values.
• Count the sample observations $f_j$ within intervals of the form $\bar{x} \pm ks$ and compare them with the known frequencies $e_j$ based on the normal distribution.

15-38

Normal Chi-Square Goodness-of-Fit Test

Method 1: Standardizing the Data
• Advantage is a standardized scale.
• Disadvantage is that data are no longer in the original units.

Figure 15.14

15-39

Normal Chi-Square Goodness-of-Fit Test

Method 2: Equal Bin Widths
• To obtain equal-width bins, divide the exact data range into c groups of equal width.
  Step 1: Count the sample observations in each bin to get observed frequencies $f_j$.
  Step 2: Convert the bin limits into standardized z-values by using the formula.

15-40

Normal Chi-Square Goodness-of-Fit Test

Method 2: Equal Bin Widths
  Step 3: Find the normal area within each bin assuming a normal distribution.
  Step 4: Find expected frequencies $e_j$ by multiplying each normal area by the sample size n.
• Classes may need to be collapsed from the ends inward to enlarge expected frequencies.


15-41

Normal Chi-Square Goodness-of-Fit Test

Method 3: Equal Expected Frequencies
• Define histogram bins in such a way that an equal number of observations would be expected within each bin under the null hypothesis.
• Define bin limits so that $e_j = n/c$.
• A normal area of 1/c in each of the c bins is desired.
• The first and last classes must be open-ended for a normal distribution, so to define c bins, we need c - 1 cutpoints.

15-42

Normal Chi-Square Goodness-of-Fit Test

Method 3: Equal Expected Frequencies
• The upper limit of bin j can be found directly by using Excel.
• Alternatively, find $z_j$ for bin j using Excel and then calculate the upper limit for bin j as $\bar{x} + z_j s$.
• Once the bins are defined, count the observations $f_j$ within each bin and compare them with the expected frequencies $e_j = n/c$.
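As an alternative to Excel, the cutpoints for equal expected frequencies can be sketched with Python's standard library. `NormalDist.inv_cdf` is the standard-library inverse normal CDF. The function name, sample mean, and standard deviation below are hypothetical.

```python
from statistics import NormalDist


def equal_frequency_cutpoints(xbar, s, c):
    """Return (z_j, upper limits xbar + z_j * s) for the c - 1 cutpoints."""
    std_normal = NormalDist()                         # mean 0, standard deviation 1
    z = [std_normal.inv_cdf(j / c) for j in range(1, c)]
    return z, [xbar + z_j * s for z_j in z]


# Hypothetical sample statistics; c = 4 bins, so each bin has normal area 1/4.
z, limits = equal_frequency_cutpoints(xbar=50.0, s=10.0, c=4)
print([round(v, 3) for v in z])
print([round(v, 2) for v in limits])
```

With c = 4 the cutpoints are the quartiles of the standard normal distribution (about -0.67, 0, and 0.67), and the first and last bins stay open-ended.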

15-43

Normal Chi-Square Goodness-of-Fit Test

Method 3: Equal Expected Frequencies
• Standard normal cutpoints for equal area bins.

Table 15.16

15-44

Normal Chi-Square Goodness-of-Fit Test

Histograms
• The fitted normal histogram gives visual clues as to the likely outcome of the GOF test.
• Histograms reveal any outliers or other non-normality issues.
• Further tests are needed since histograms vary.

Figure 15.15


15-45

Normal Chi-Square Goodness-of-Fit Test

Critical Values for Normal GOF Test
• Since two parameters, $\mu$ and $\sigma$, are estimated from the sample, the degrees of freedom are $\nu = c - m - 1 = c - 3$.
• At least 4 bins are needed to ensure 1 df.

Table 15.19

15-46

ECDF Tests

Kolmogorov-Smirnov and Lilliefors Tests
• There are many alternatives to the chi-square test based on the Empirical Cumulative Distribution Function (ECDF).
• The Kolmogorov-Smirnov (K-S) test statistic D is the largest absolute difference between the actual and expected cumulative relative frequency of the n data values:
  $D = \max |F_a - F_e|$
• The K-S test is not recommended for grouped data.

15-47

ECDF Tests

Kolmogorov-Smirnov and Lilliefors Tests
• $F_a$ is the actual cumulative frequency at observation i.
• $F_e$ is the expected cumulative frequency at observation i under the assumption that the data came from the hypothesized distribution.
• The K-S test assumes that no parameters are estimated.
• If parameters are estimated, use a Lilliefors test.
• Both of these tests are done by computer.
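To make the statistic D = max |F_a - F_e| concrete, here is a small sketch for a hypothesized continuous uniform distribution with no estimated parameters. The data and helper names are hypothetical. Because the empirical CDF jumps at each observation, the sketch checks the gap on both sides of each jump, which is the usual computational form of the K-S statistic.

```python
def ks_statistic(data, cdf):
    """Return D, the largest gap between the empirical and hypothesized CDFs."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f_e = cdf(x)                       # expected cumulative relative frequency
        # The ECDF jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, abs(i / n - f_e), abs(f_e - (i - 1) / n))
    return d


def uniform_cdf(x, a=0.0, b=1.0):
    """CDF of a continuous uniform distribution on [a, b]."""
    return min(max((x - a) / (b - a), 0.0), 1.0)


data = [0.1, 0.4, 0.7]                     # hypothetical sample
d_stat = ks_statistic(data, uniform_cdf)
print(round(d_stat, 3))  # 0.3
```

The sketch only computes D. Judging significance still requires K-S critical values or software, as the slides note.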

15-48

ECDF Tests

Kolmogorov-Smirnov and Lilliefors Tests
• K-S test for uniformity.

Figure 15.20


15-49

ECDF Tests

Kolmogorov-Smirnov and Lilliefors Tests
• K-S test for normality.

Figure 15.21

15-50

ECDF Tests

Anderson-Darling Tests
• The Anderson-Darling (A-D) test is widely used to test for non-normality because of its power.
• The A-D test is based on a probability plot.
• When the data fit the hypothesized distribution closely, the probability plot will be close to a straight line.
• The A-D test statistic measures the overall distance between the actual and the hypothesized distributions, using a weighted squared distance.

15-51

ECDF Tests

Anderson-Darling Tests with MINITAB

Figure 15.22


Applied Statistics in Business & Economics

End of Chapter 15

15-52