The Chi-Square Distribution 1. The student will be able to Perform a Goodness of Fit hypothesis...
CHAPTER 11 OBJECTIVES
The student will be able to
Perform a Goodness of Fit hypothesis test
Perform a Test of Independence hypothesis test
CHI-SQUARE DISTRIBUTION
The chi-square distribution is the distribution of the test statistics used to answer 3 questions:
  Does our data fit a certain distribution? (Goodness-of-fit)
  Are two factors independent? (Test of independence)
  Does our variance change? (Test of single variance)
CHI-SQUARE DISTRIBUTION
Notation
  new random variable $\chi^2 \sim \chi^2_{df}$
  $\mu = df$, $\sigma^2 = 2\,df$
Facts about Chi-square
  Nonsymmetrical and skewed right
  value is never negative
  curve looks different for different degrees of freedom
  As df gets larger, the curve approaches the normal (df > 90)
  mean is located to the right of the peak
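GOODNESS-OF-FIT
Hypothesis test steps are the same as always, with the following changes:
  Test is always a right-tailed test
  Null and alternate hypotheses are in words rather than equations
  degrees of freedom = number of intervals - 1
  test statistic defined as
    $\chi^2 = \sum_{n} \frac{(O - E)^2}{E}$
    where O is an observed frequency, E is the matching expected frequency, and the sum runs over the n intervals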
GOODNESS-OF-FIT: AN EXAMPLE
A 6-sided die is rolled 120 times. The results are in the table below. Conduct a hypothesis test to determine if the die is fair.
Face Value   Frequency
1            15
2            29
3            16
4            15
5            30
6            15
GOODNESS-OF-FIT: AN EXAMPLE
Contradictory hypotheses
  Ho: observed data fits a Uniform distribution (die is fair)
  Ha: observed data does not fit a Uniform distribution (die is not fair)
Determine distribution
  Chi-square goodness-of-fit, right-tailed test
Perform calculations to find p-value
  enter observed into L1
  enter expected into L2 (a fair die gives 120/6 = 20 for each face)
GOODNESS-OF-FIT: AN EXAMPLE
Perform calculations (cont.), TI-83
  Access LIST, MATH, SUM and enter sum((L1 - L2)^2/L2); this is the test statistic
    For our problem chi-square = 13.6
  Access DISTR and χ²cdf; syntax is (test stat, 199, df); this generates the p-value
    For our problem p-value = 0.0184
Make decision
  since α > 0.0184, reject null
Concluding statement
  There is sufficient evidence to conclude that the observed data does not fit a uniform distribution. (The die is not fair.)
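As a cross-check on the calculator steps, the same test statistic and p-value can be reproduced in Python. This is a minimal sketch using only the standard library. The helper name chi2_sf is hypothetical (it is not a library function); it evaluates the right-tail area of the chi-square curve with the closed-form series that holds for whole-number df.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(X > x) of a chi-square curve with whole-number df.

    Hypothetical helper, not a library call.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        total = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * total
    # odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k - 1/2) / Gamma(k + 1/2)
    total = sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

observed = [15, 29, 16, 15, 30, 15]        # L1: frequencies from the table
expected = [sum(observed) / 6] * 6         # L2: fair die, 120 / 6 = 20 per face

# Same quantity as sum((L1 - L2)^2 / L2) on the calculator
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                     # number of intervals - 1
p_value = chi2_sf(chi_square, df)          # right-tailed test

print(round(chi_square, 1), df, round(p_value, 4))
```

The printed values should agree with the chi-square = 13.6 and p-value = 0.0184 reported for this problem.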
TEST OF INDEPENDENCE
Hypothesis testing steps are the same, with the following edits:
  Null and alternate hypotheses in words
  have a contingency table
  expected values are calculated from the table:
    $E = \frac{(\text{row total})(\text{column total})}{\text{sample size}}$
  Test statistic is the same:
    $\chi^2 = \sum_{n} \frac{(O - E)^2}{E}$
  df = (#columns - 1)(#rows - 1)
  always a right-tailed test
TEST OF INDEPENDENCE: AN EXAMPLE
Conduct a hypothesis test to determine whether there is a relationship between an employee’s performance in a company’s training program and his/her ultimate success on the job. Use a level of significance of 1%.
Ho: Performance in training and success on job are independent
Ha: Performance in training and success on job are not independent (or dependent).
TEST OF INDEPENDENCE: AN EXAMPLE
Performance on job versus performance in training

                          Performance on Job
Performance in training   Below Average   Average   Above Average   TOTAL
Poor                      23              60        29              112
Average                   28              79        60              167
Very Good                 9               49        63              121
TOTAL                     60              188       152             400
TEST OF INDEPENDENCE: AN EXAMPLE
Determine distribution
  right-tailed chi-square
Perform calculations to find p-value
  Calculator will calculate expected values. We must enter the contingency table as a Matrix (ack!)
    Access MATRIX and edit Matrix A
    Access Chi-square test
      Matrix A = observed
      Matrix B = calculator places expected here
TEST OF INDEPENDENCE: AN EXAMPLE
Perform calculations (cont.)
  p-value = 0.0005
Make decision
  α = 0.01 > p-value = 0.0005, so reject null hypothesis
Concluding statement
  Performance in training and job success are dependent.
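The matrix steps on the calculator can be mirrored in Python. The sketch below is one possible way to do it with the standard library only: it builds the expected counts from the row and column totals, sums (O - E)^2/E over the cells, and finds the right-tail area. The helper chi2_sf_even_df is hypothetical and only handles even df, which is enough here because df = 4. The row/column orientation in the comments is an assumption read off the table layout; transposing the table does not change the result.

```python
import math

# Observed counts (Matrix A). Assumed orientation:
# rows = performance in training (Poor, Average, Very Good),
# columns = performance on job (Below Average, Average, Above Average).
observed = [
    [23, 60, 29],
    [28, 79, 60],
    [9, 49, 63],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)                                   # sample size

# Expected count per cell = (row total)(column total) / sample size (Matrix B)
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(col_totals) - 1) * (len(row_totals) - 1)    # (#columns - 1)(#rows - 1)

def chi2_sf_even_df(x, df):
    """Right-tail chi-square area for even df only (hypothetical helper)."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

p_value = chi2_sf_even_df(chi_square, df)
alpha = 0.01
reject_null = p_value < alpha

print(round(chi_square, 2), df, round(p_value, 4), reject_null)
```

The p-value should round to the 0.0005 reported for this example, which is below α = 0.01.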
CHAPTER 12
Linear Regression and Correlation
CHAPTER 12 OBJECTIVES
The student should be able to:
Discuss basic ideas of linear regression and correlation.
Create and interpret a line of best fit.
Calculate and interpret the correlation coefficient.
Find outliers.
LINEAR REGRESSION
Method for finding the “best fit” line through a scatterplot of paired data
  independent variable (x) versus dependent variable (y)
Recall from Algebra
  equation of line y = a + bx, where
    a is the y-intercept
    b is the slope of the line
      if b > 0, slope upward to right
      if b < 0, slope downward to right
      if b = 0, line is horizontal
LINEAR REGRESSION
The eye-ball method
  Draw what looks to you to be the best straight line fit
  Pick two points on the line and find the equation of the line
The calculated method
  from calculus, we find the line that minimizes the sum of the squared vertical distances from each point to the line (the least-squares line that best fits the scatterplot)
  letting the calculator do the work using LinRegTTest
An example
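To make the calculated method concrete, here is a minimal sketch of a least-squares fit for y = a + bx in Python. The function name least_squares_line and the five data points are hypothetical (they are not from the slides); the formulas used are the standard least-squares ones, b = Sxy/Sxx and a = ybar - b*xbar.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimizes the
    sum of squared vertical distances from the points to the line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # y-intercept
    return a, b

# Hypothetical paired data, chosen so the arithmetic is easy to follow
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
print(round(a, 2), round(b, 2))
```

Here b comes out positive, so the fitted line slopes upward to the right.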
THE CORRELATION COEFFICIENT
Used to determine if the regression line is a “good fit”
ρ is the population correlation coefficient
r is the sample correlation coefficient
  Formidable equation (see text)
  Calculator does the work
  r positive - upward to right
  r negative - downward to right
  r zero - no linear correlation
Graphs
THE CORRELATION COEFFICIENT
Determining if there is a “good fit”
Gut method
  if calculated r is close to 1 or -1, there’s a good fit
Hypothesis test (LinRegTTest)
  Ho: ρ = 0   Ha: ρ ≠ 0
  Ho means there IS NOT a significant linear relationship (correlation) between x and y in the population.
  Ha means there IS a significant linear relationship (correlation) between x and y in the population.
  To reject Ho means that there is a linear relationship between x and y in the population. It does not mean that one CAUSES the other.
Comparison to critical value
  Use the table at the end of the chapter
  Determine degrees of freedom: df = n - 2
  If r < negative critical value, then r is significant and we have a good fit
  If r > positive critical value, then r is significant and we have a good fit
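The critical-value comparison can be sketched in Python as follows. The data are hypothetical (not from the slides), and the function names are made up for illustration. The critical value 0.878 is assumed to be the 95% table entry for df = 3; check it against the table at the end of the chapter before relying on it.

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def is_significant(r, critical_value):
    """r is significant if it falls beyond either critical value."""
    return r < -critical_value or r > critical_value

# Hypothetical paired data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
df = len(xs) - 2                  # df = n - 2
critical_value = 0.878            # assumed table value for df = 3
significant = is_significant(r, critical_value)

print(round(r, 4), df, significant)
```

With only five points, an r of about 0.77 does not pass the table value, which is why the gut method alone can mislead for small samples.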
THE REGRESSION LINE AS A PREDICTOR
If the line is determined to be a good fit, the equation can be used to predict y or x values from x or y values
  Plug the numbers into the equation
Equation is only valid for the paired data DOMAIN
THE ISSUE OF OUTLIERS
Compare 1.9s to |y - yhat| for each (x, y) pair
  if |y - yhat| > 1.9s, the point could be an outlier
  LinRegTTest gives us s (the standard deviation of the residuals)
  y - yhat is put into the RESID list when the LinRegTTest is done
To see the RESID list: go to STAT, Edit, move cursor to a blank list name and type RESID; the residuals will show up.
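The 1.9s rule can also be checked by hand. The sketch below is an illustration on hypothetical data (ten points on the line y = 2x with one value deliberately moved off the line); the function name possible_outliers is made up. It assumes s is computed as the square root of SSE/(n - 2), which is the usual definition of the residual standard deviation.

```python
import math

def possible_outliers(xs, ys, multiplier=1.9):
    """Fit y = a + b*x by least squares and return the (x, y) pairs
    whose residual |y - yhat| exceeds multiplier * s."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]   # the RESID list
    sse = sum(res ** 2 for res in residuals)
    s = math.sqrt(sse / (n - 2))                            # assumed definition of s
    return [(x, y) for x, y, res in zip(xs, ys, residuals)
            if abs(res) > multiplier * s]

# Hypothetical data: y = 2x everywhere except x = 5, where y was moved to 16
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2, 4, 6, 8, 16, 12, 14, 16, 18, 20]

flagged = possible_outliers(xs, ys)
print(flagged)
```

Only the moved point should be flagged; a flagged point is a candidate outlier to examine, not an automatic deletion.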
CHAPTER 13 OBJECTIVES
The student should be able to:
  Interpret the F distribution as the number of groups and the sample size change.
  Discuss two uses for the F distribution and ANOVA.
  Conduct and interpret ANOVA.
SINGLE FACTOR ANALYSIS OF VARIANCE (ANOVA)
What is it good for?
  Determines the existence of statistically significant differences among several group means.
Basic assumptions
  Each population from which a sample is taken is assumed to be normal.
  Each sample is randomly selected and independent.
  The populations are assumed to have equal standard deviations (or variances).
  The factor is the categorical variable.
  The response is the numerical variable.
The Hypotheses
  Ho: µ1 = µ2 = µ3 = … = µk
  Ha: At least two of the group means are not equal
  Always a right-tailed test
F DISTRIBUTION
Named after Sir Ronald Fisher
F statistic is a ratio (i.e. fraction)
  two sets of degrees of freedom (numerator and denominator)
  $F \sim F_{df(num),\,df(denom)}$
Two estimates of variance are made
  Variation between samples
    Estimate of σ² that is the variance of the sample means
    Variation due to treatment (i.e. explained variation)
  Variation within samples
    Estimate of σ² that is the average of the sample variances
    Variation due to error (i.e. unexplained variation)
F DISTRIBUTION FACTS
Curve is skewed right.
Different curve for each set of degrees of freedom.
As the dfs for numerator and denominator get larger, the curve approximates the normal distribution.
F statistic is greater than or equal to zero.
Other uses
  Comparing two variances
  Two-Way Analysis of Variance
THE F STATISTIC
Formula
  $F = \frac{MS_{between}}{MS_{within}}$
  $MS_{between} = \frac{SS_{between}}{df_{between}}$
  $MS_{within} = \frac{SS_{within}}{df_{within}}$
MSbetween – mean square explained by the different groups
MSwithin – mean square that is due to chance
SSbetween – sum of squares that represents the variation among different samples
SSwithin – sum of squares that represents the variation within samples that is due to chance
THANK GOODNESS FOR OUR CALCULATOR!!!
Enter the table data by columns into L1, L2, L3….
Do ANOVA test – ANOVA(L1, L2,..)
What the calculator gives
  F – the F statistic
  p – the p-value
  Factor – the between stuff
    df = # groups – 1 = k – 1
    SSbetween
    MSbetween
  Error – the within stuff
    df = total number of data values – # of groups = N – k
    SSwithin
    MSwithin
AN EXAMPLE
Four sororities took a random sample of sisters regarding their grade averages for the past term. The results are shown below:
Sorority 1   Sorority 2   Sorority 3   Sorority 4
2.17         2.63         2.63         3.79
1.85         1.77         3.78         3.45
2.83         3.25         4.00         3.08
1.69         1.86         2.55         2.26
3.33         2.21         2.45         3.18
Using a significance level of 1%, is there a difference in grade averages among the sororities?
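For readers who want to see where the calculator's ANOVA numbers come from, the sums of squares, mean squares, and F statistic for the sorority data can be sketched in Python. This is an illustration using only the standard library; it stops at F and the two df values, because the right-tail p-value of the F distribution is left to the calculator (or a statistics library) here.

```python
# Grade averages entered by columns, as they would go into L1, L2, L3, L4
groups = [
    [2.17, 1.85, 2.83, 1.69, 3.33],   # Sorority 1
    [2.63, 1.77, 3.25, 1.86, 2.21],   # Sorority 2
    [2.63, 3.78, 4.00, 2.55, 2.45],   # Sorority 3
    [3.79, 3.45, 3.08, 2.26, 3.18],   # Sorority 4
]

k = len(groups)                                   # number of groups
N = sum(len(g) for g in groups)                   # total number of data values
grand_mean = sum(sum(g) for g in groups) / N
group_means = [sum(g) / len(g) for g in groups]

# Variation between samples (Factor)
ss_between = sum(len(g) * (m - grand_mean) ** 2
                 for g, m in zip(groups, group_means))
df_between = k - 1
ms_between = ss_between / df_between

# Variation within samples (Error)
ss_within = sum((x - m) ** 2
                for g, m in zip(groups, group_means)
                for x in g)
df_within = N - k
ms_within = ss_within / df_within

f_stat = ms_between / ms_within

print(df_between, df_within, round(f_stat, 2))
```

The F value and the df pair (k - 1, N - k) are what ANOVA(L1, L2, L3, L4) reports under F, Factor df, and Error df; the decision at the 1% level then depends on the p-value for that F.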
REVIEW FOR FINAL EXAM
What’s fair game
  Chapter 1, Chapter 2, Chapter 3, Chapter 4, Chapter 5, Chapter 6, Chapter 7, Chapter 8, Chapter 9, Chapter 10, Chapter 11, Chapter 12
  42 multiple choice questions
  Do problems from each chapter
What to bring with you
  Scantron (#2052), pencil, eraser, calculator, 2 sheets of notes (8.5x11 inches, both sides)