Lesson 2 Chi squared
Lesson Layout
› Theory
› Introduction
› Worked Example
› Application
› Past paper questions
› Further application & review
Statistics 2 – Chi-square hypothesis testing
IB Mathematical Studies SL
Syllabus reference
Content Detail
Hypothesis Testing
Hypothesis testing involves making a conjecture (assumption) about some facet of our world, collecting data from a sample, performing a specific mathematical test, and then deciding whether or not the conjecture is true.
A conjecture must be stated in two parts:
› The null hypothesis (H0) states that there is no significant difference between the two parameters being tested (they are “not related to” each other, i.e. independent).
› The alternative hypothesis (H1) states that there is a significant difference (they are “related” in some way, i.e. dependent).
The only hypothesis test covered by the Studies SL course is the Chi Squared test.
Chi-square (χ²) test by GDC
The chi-square test itself is quite straightforward: your GDC can do it in two steps, but you must also know the formula and be able to do it by hand.
The hypothesis test which uses chi-square determines whether or not two variables are related. It follows a general pattern:
(1) Make a conjecture
(2) Write the null hypothesis using “is not related to” or “independent”, and write the alternative hypothesis using “is related to” or “dependent”
(3) Calculate the chi-square test statistic
(4) Determine reference values
(5) Compare the two and either accept or reject the null hypothesis
Using the GDC
You can find chi-squared on your GDC in the statistics mode:
› Press F3 {Test}
› Press F3 again for {Chi}
Note: you must have entered the data into Matrix A from Matrix mode first!
Example 1 - Question
A researcher conjectures that seat belt usage, for drivers, is related to gender. She gathers data by recording seat belt usage at several randomly selected intersections. The data has been recorded in the table as shown.
Construct a chi-square hypothesis test to determine if there is enough evidence to support the researcher’s conjecture.
Seat belt usage
Gender Yes No
Female 50 25
Male 40 45
This type of table is called a contingency table. (It’s what you put in Matrix A on the GDC.)
Example 1 - Solution
Since the conjecture has already been made we can start at step (2): write the null and alternative hypotheses.
› H0 – Seat belt usage is not related to gender
› H1 – Seat belt usage is related to gender
Step (3) – calculate chi-square
› Enter the data into Matrix A: from the RUN screen press F1 {MAT}
› Enter the dimensions for Matrix A, which are 2 x 2, then press EXE
› Enter the contingency table data into the matrix, then press EXIT twice to go back to the RUN screen
› In STAT mode, press F3 then F3 again
› Highlight EXECUTE and press EXE
Exam hint – you must also be able to do a contingency table by hand. The largest to be tested will be 4 x 4.
χ² Test
χ² = 6.22471211
p = 0.01259793
df = 1
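As a cross-check on the GDC output, the p-value can be recovered from the test statistic. The sketch below only covers the df = 1 case, where the right-tail probability of the chi-square distribution reduces to erfc(sqrt(χ²/2)). The function name p_value_df1 is made up for this illustration and is not a GDC or IB term.

```python
import math

def p_value_df1(chi_sq):
    """Right-tail probability of a chi-square statistic, valid for df = 1 only."""
    # With df = 1 the chi-square variable is the square of a standard normal Z,
    # so P(X^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_sq / 2))

p = p_value_df1(6.22471211)  # test statistic reported by the GDC
print(round(p, 4))           # 0.0126
```

The result agrees with the p-value shown on the GDC screen to the displayed precision.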
Example 1 - Solution
Step (4) – determine reference values
› There are two reference values of importance: the p-value (which was calculated during the chi-square test) and the Critical Value (CV), which you read off the distribution table on your IB formula sheet.
› In this case assume α = 0.01 (1%)
Step (5) – make a comparison between either
› the p-value and the significance level, OR
› the chi-square test statistic and the Critical Value
Here p-value > α level, since 0.0126 > 0.01.
In other words, we accept the null hypothesis that there is no relationship between seat belt usage and gender.
Exam hint – the only significance levels that will be tested are 1%, 5% and 10%.
If p-value > α level, then we can accept H0, i.e. there is not enough evidence to reject it.
Example 2 - Question
From what Lauren observed, she believes that the number of hours exercised per week is dependent on gender. She collected data randomly and organised the results in the table shown.
Determine whether there is enough evidence to accept or reject the null hypothesis:
› a) for α = 0.01
› b) for α = 0.05
› c) for α = 0.10
Hours exercised per week
Male 5 10 12
Female 9 8 4
Example 2 - Solution
Write the null and alternative hypotheses
› H0 – The number of hours exercised each week is independent of gender
› H1 – The number of hours exercised each week is dependent on gender
Calculate chi-square and the p-value
χ² Test
χ² = 4.69 (3sf)
p = 0.0959 (3sf)
df = 2
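The GDC figures for Example 2 can be reproduced with a short script. This is a minimal sketch, not the calculator's own method: it takes each expected frequency as row total x column total / grand total, sums (fo - fe)²/fe over the cells, and uses the fact that for df = 2 (and only df = 2) the right-tail probability simplifies to exp(-χ²/2). The variable names are illustrative.

```python
import math

# Observed frequencies from Example 2 (rows: Male, Female).
observed = [[5, 10, 12],
            [9, 8, 4]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Add up (fo - fe)^2 / fe for every cell of the table.
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / grand_total
        chi_sq += (fo - fe) ** 2 / fe

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Shortcut that holds for df = 2 only.
p_value = math.exp(-chi_sq / 2)

print(f"chi-square = {chi_sq:.3g}, p = {p_value:.3g}, df = {df}")
```

This prints chi-square = 4.69, p = 0.0959, df = 2, matching the values above to 3sf.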
• Compare the p-value to each significance level
a) 0.0959 > 0.01, hence accept the null hypothesis
b) 0.0959 > 0.05, hence accept the null hypothesis
c) 0.0959 < 0.10, hence reject the null hypothesis
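The three comparisons follow one rule, which can be written as a loop. The sketch below hard-codes the Example 2 p-value and uses the course's “accept H0” wording; the dictionary name decisions is just for illustration.

```python
p_value = 0.0959  # from the chi-square test in Example 2

decisions = {}
for alpha in (0.01, 0.05, 0.10):
    # Reject H0 only when the p-value is below the significance level.
    decisions[alpha] = "reject H0" if p_value < alpha else "accept H0"
    print(f"alpha = {alpha:.2f}: {decisions[alpha]}")
```

Only the 10% level leads to rejecting H0, because 0.0959 lies between 0.05 and 0.10.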
Whilst it is not technically correct to say “accept H0” (the strictly correct wording is “fail to reject H0”), it is still accepted in the IB.
Questions
The chi-square test formula
χ² = Σ (fo − fe)² / fe
This formula is on the IB formula sheet
› fo is the observed frequencies (i.e. the raw data)
› fe is the expected frequencies
It is easiest to perform this sum calculation using a table one step at a time.
Exam hint – you are expected to be able to calculate the χ² test statistic with your GDC when raw data is given. You should also be able to perform an entire χ² hypothesis test without your GDC.
* Don’t forget you can check your expected values using Matrix B.
Chi-square (χ²) test by hand
Remember that the steps below can be checked using your GDC, especially the expected values, using Matrix B.
Completing a hypothesis test which uses chi-square by hand follows a similar process to the previous one, except that some of the steps are much longer:
(1) Make a conjecture (same as before)
(2) Write the null hypothesis using “is not related to” or “independent”, and write the alternative hypothesis using “is related to” or “dependent” (same as before)
(3) Calculate the chi-square test statistic (χ²)
(4) Determine the reference value, called the Critical Value (CV)
(5) Compare the two and either accept or reject the null hypothesis
Steps 3 & 4 are much longer!
Step (3) - Calculating the chi-square (χ²) test statistic
This step has a series of sub-parts:
(A) Expand the contingency table to have row totals, column totals and an overall total. The raw data in the table are called “observed values”.
(B) Calculate the “expected values” for each cell in the table, based on the probabilities using the totals of each row and column.
(C) Organise the frequencies into a new table to calculate χ².
Using Example 1 from before, part A is shown below.
Seat belt usage
Gender Yes No Row total
Female 50 25 75
Male 40 45 85
Col total 90 70 160
Part B (cont)
Using Example 1 from before, part B is shown below.
To calculate the expected frequency (fe) in each cell we use the formula [Col total] x [Row total] / [Total sum]
Seat belt usage
Gender Yes No Row total
Female 50 25 75
Male 40 45 85
Col total 90 70 160
These values are the observed frequencies (fo)
Expected frequencies (fe)
Female, Yes: 90*75/160 = 42.1875
Female, No: 70*75/160 = 32.8125
Male, Yes: 90*85/160 = 47.8125
Male, No: 70*85/160 = 37.1875
Remember to check these with Matrix B
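If no GDC is to hand, the expected frequencies can also be checked with a few lines of code. The sketch below applies the same [Col total] x [Row total] / [Total sum] rule to the Example 1 table; the variable names are illustrative.

```python
# Observed frequencies from Example 1 (columns: Yes, No).
observed = [[50, 25],   # Female
            [40, 45]]   # Male

row_totals = [sum(row) for row in observed]        # 75, 85
col_totals = [sum(col) for col in zip(*observed)]  # 90, 70
grand_total = sum(row_totals)                      # 160

# Expected frequency for each cell: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for label, row in zip(("Female", "Male"), expected):
    print(label, row)
```

A quick sanity check: each row and column of the expected table adds up to the same total as in the observed table.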
Part C (cont)
Using Example 1 from before, part C is shown below.
fo fe fo-fe (fo-fe)² (fo-fe)²/fe
50 42.1875 7.8125 61.035 1.4468
25 32.8125 -7.8125 61.035 1.8601
40 47.8125 -7.8125 61.035 1.2766
45 37.1875 7.8125 61.035 1.6413
Sum = 6.22 (3sf)
Don’t round to 3sf during these calculations!
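The column-by-column working can be mirrored in code, which also shows why intermediate values should be kept unrounded. This is an illustrative sketch using the Example 1 observed and expected frequencies; rounding happens only when printing.

```python
# (fo, fe) pairs for Example 1, read row by row.
cells = [(50, 42.1875), (25, 32.8125), (40, 47.8125), (45, 37.1875)]

chi_sq = 0.0
print(f"{'fo':>4} {'fe':>8} {'fo-fe':>8} {'(fo-fe)^2':>10} {'(fo-fe)^2/fe':>13}")
for fo, fe in cells:
    diff = fo - fe
    squared = diff ** 2
    contribution = squared / fe  # this cell's share of the test statistic
    chi_sq += contribution       # accumulate at full precision
    print(f"{fo:>4} {fe:>8.4f} {diff:>8.4f} {squared:>10.4f} {contribution:>13.4f}")

print(f"Sum = {chi_sq:.4f}")
```

The unrounded sum is about 6.2247, which is 6.22 to 3sf.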
Step (4) – Determine the Critical Value (CV)
If you had a Million Dollars and you gave $1 away, how much would you say you had? (When does it become significant?)
A critical value is a number which represents the boundary in determining whether a statistic is significant or not. That is, it separates the choice to accept or reject the null hypothesis.
If the chi-square test value falls to the left of (is less than) the CV then we accept the null hypothesis (H0)
If the chi-square test value falls to the right of (is greater than) the CV then we reject the null hypothesis (H0)
The critical value is found using the distribution table on the IB formula sheet.
› The left-hand column gives the degrees of freedom, df = (#cols - 1) * (#rows - 1)
› The top row gives the probability p. Since the chi-squared test is a right-tail test we will always use the five right-hand columns, and since the IB only uses significance levels of 0.01, 0.05 and 0.1 we will only ever need the corresponding columns p = 0.99, 0.95 and 0.9 (that is, p = 1 − α). Note that this p is the column heading in the table, not the p-value of the test.
For our example, α = 0.01 so p = 0.99, and df = 1, hence CV = 6.635
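The table lookup can be double-checked numerically. The sketch below is limited to df = 1, where the chi-square critical value is the square of the two-tailed normal cut-off. It is not a replacement for the formula-sheet table, and the helper names are made up for this illustration.

```python
from statistics import NormalDist

def degrees_of_freedom(rows, cols):
    """df for a contingency table with the given numbers of rows and columns."""
    return (rows - 1) * (cols - 1)

def critical_value_df1(alpha):
    """Chi-square critical value for df = 1 at significance level alpha."""
    # With df = 1, chi-square is a squared standard normal, so the right-tail
    # cut-off is the square of the z value that leaves alpha / 2 in each tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z ** 2

df = degrees_of_freedom(2, 2)  # the 2 x 2 seat belt table
for alpha in (0.10, 0.05, 0.01):
    print(f"df = {df}, alpha = {alpha:.2f}: CV = {critical_value_df1(alpha):.3f}")
```

For α = 0.01 this gives 6.635, the same value as read from the table.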
Step (5) – Compare χ² to CV and make a conclusion
χ² = 6.22, CV = 6.635
Hence χ² < CV and we will accept the null hypothesis that there is no relationship between seat belt usage and gender.
Remember: step (5) can be done in two ways, by making a comparison between either
› the p-value and the significance level, OR
› the chi-square test statistic and the Critical Value
Previously we found p-value > alpha level, since 0.0126 > 0.01, and we accepted the null hypothesis.
Can you spot the difference?
Understanding the final comparison method
If you are comparing the p-value with the α-level then:
› if p > α, accept the null hypothesis
› if p < α, reject the null hypothesis
If you are comparing χ² with the CV then:
› if χ² < CV, accept the null hypothesis
› if χ² > CV, reject the null hypothesis
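The two comparison methods always lead to the same decision, because the inequalities run in opposite directions. The sketch below applies both rules to Example 1 at the 1% and 5% levels. The function names are illustrative, and the 5% critical value for df = 1 (3.841) is taken from the standard chi-square table.

```python
def decide_by_p_value(p_value, alpha):
    # A small p-value means the data would be unusual if H0 were true.
    return "reject H0" if p_value < alpha else "accept H0"

def decide_by_critical_value(chi_sq, cv):
    # A large test statistic lies in the right tail, beyond the critical value.
    return "reject H0" if chi_sq > cv else "accept H0"

# Example 1: chi-square = 6.22, p = 0.0126, df = 1.
results = {}
for alpha, cv in ((0.01, 6.635), (0.05, 3.841)):
    by_p = decide_by_p_value(0.0126, alpha)
    by_cv = decide_by_critical_value(6.22, cv)
    results[alpha] = (by_p, by_cv)
    print(f"alpha = {alpha}: by p-value -> {by_p}, by CV -> {by_cv}")
```

At 1% both methods accept H0, and at 5% both reject it.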
Questions
H&H 2nd Ed – Exercise 20E.1 a-d (p615)
Worked example 8 (p618)
Exercise 20E.2
Exercise 20E.3 (p621)