CHAPTER 11

The Chi-Square Distribution

CHAPTER 11 OBJECTIVES

The student will be able to:
- Perform a goodness-of-fit hypothesis test
- Perform a test of independence

CHI-SQUARE DISTRIBUTION

The chi-square distribution supplies the test statistic for three kinds of questions:
- Does our data fit a certain distribution? (Goodness-of-fit)
- Are two factors independent? (Test of independence)
- Does our variance change? (Test of a single variance)

CHI-SQUARE DISTRIBUTION

Notation
- New random variable: $\chi^2 \sim \chi^2_{df}$
- Mean: $\mu = df$
- Variance: $\sigma^2 = 2\,df$

Facts about chi-square
- Nonsymmetrical and skewed right
- Value is never negative
- Curve looks different for different degrees of freedom
- As df gets larger, the curve approaches the normal curve (df > 90)
- Mean is located to the right of the peak
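The mean and variance facts can be checked numerically. The sketch below assumes the standard construction of a chi-square variable as a sum of df squared standard normal draws, which the slide does not state. The function name, seed and number of draws are illustrative choices only.

```python
import random
import statistics


def simulate_chi_square(df, n_draws=20000, seed=1):
    """Return n_draws values, each a sum of df squared standard normal draws."""
    rng = random.Random(seed)  # fixed, arbitrary seed so reruns give the same numbers
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(n_draws)]


df = 4
draws = simulate_chi_square(df)

sample_mean = statistics.mean(draws)          # expected to land near df
sample_variance = statistics.variance(draws)  # expected to land near 2 * df
sample_median = statistics.median(draws)      # below the mean because the curve is skewed right

print(f"mean ~ {sample_mean:.2f}, variance ~ {sample_variance:.2f}, median ~ {sample_median:.2f}")
```

Changing `df` and rerunning shows how the center and spread move with the degrees of freedom.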

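GOODNESS-OF-FIT

Hypothesis test steps are the same as always, with the following changes:
- Test is always a right-tailed test
- Null and alternate hypotheses are in words rather than equations
- Degrees of freedom = number of intervals - 1
- Test statistic is defined as

$$\chi^2 = \sum_{n} \frac{(O - E)^2}{E}$$

where O is an observed count, E is the matching expected count, and the sum runs over all n intervals.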

GOODNESS-OF-FIT: AN EXAMPLE

A 6-sided die is rolled 120 times. The results are in the table below. Conduct a hypothesis test to determine if the die is fair.

Face Value   Frequency
1            15
2            29
3            16
4            15
5            30
6            15

GOODNESS-OF-FIT: AN EXAMPLE

Contradictory hypotheses
- Ho: observed data fits a uniform distribution (die is fair)
- Ha: observed data does not fit a uniform distribution (die is not fair)

Determine distribution
- Chi-square goodness-of-fit, right-tailed test

Perform calculations to find the p-value
- Enter observed values into L1
- Enter expected values into L2 (for a fair die, 120/6 = 20 for each face)

GOODNESS-OF-FIT: AN EXAMPLE

Perform calculations (cont.), TI-83
- Access LIST, MATH, sum and enter sum((L1 - L2)²/L2); this is the test statistic
  - For our problem, chi-square = 13.6
- Access DISTR and χ²cdf; syntax is χ²cdf(test stat, 1E99, df); this generates the p-value
  - For our problem, df = 6 - 1 = 5 and p-value = 0.0184

Make decision
- Since α > p-value = 0.0184, reject the null hypothesis

Concluding statement
- There is sufficient evidence to conclude that the observed data does not fit a uniform distribution. (The die is not fair.)
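The die example can be reproduced without a calculator. This is a minimal sketch using only the Python standard library. The helper `chi2_right_tail` is a hypothetical name standing in for the calculator's χ²cdf with a very large upper bound, and `alpha = 0.05` is an assumption, since the slide does not state the significance level.

```python
import math


def chi2_right_tail(x, df):
    """P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma
    function, which is adequate for classroom-sized x and df.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - lower


observed = [15, 29, 16, 15, 30, 15]                    # frequencies for faces 1..6 (L1)
n_rolls = sum(observed)                                # 120 rolls
expected = [n_rolls / len(observed)] * len(observed)   # fair die: 20 per face (L2)

# Test statistic: sum of (O - E)^2 / E over the six faces
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                                 # number of intervals - 1
p_value = chi2_right_tail(chi_square, df)              # right-tailed test

alpha = 0.05  # assumed; not given on the slide
reject_null = p_value < alpha
print(round(chi_square, 1), round(p_value, 4), reject_null)
```

The statistic and p-value should agree with the 13.6 and 0.0184 quoted on the slide.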

TEST OF INDEPENDENCE

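Hypothesis testing steps are the same, with the following edits:
- Null and alternate hypotheses are in words
- The data are laid out in a contingency table
- Expected values are calculated from the table:

$$E = \frac{(\text{row total})(\text{column total})}{\text{sample size}}$$

- The test statistic is the same as for goodness-of-fit:

$$\chi^2 = \sum_{n} \frac{(O - E)^2}{E}$$

- df = (#columns - 1)(#rows - 1)
- Always a right-tailed test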

TEST OF INDEPENDENCE: AN EXAMPLE

Conduct a hypothesis test to determine whether there is a relationship between an employee's performance in a company's training program and his/her ultimate success on the job. Use a level of significance of 1%.

- Ho: Performance in training and success on job are independent
- Ha: Performance in training and success on job are not independent (or dependent)

TEST OF INDEPENDENCE: AN EXAMPLE

Performance on job versus performance in training

                           Performance on job
Performance in training    Below Average   Average   Above Average   TOTAL
Poor                       23              60        29              112
Average                    28              79        60              167
Very Good                  9               49        63              121
TOTAL                      60              188       152             400

TEST OF INDEPENDENCE: AN EXAMPLE

Determine distribution
- Chi-square, right-tailed

Perform calculations to find the p-value
- The calculator will calculate the expected values. We must enter the contingency table as a matrix (ack!)
  - Access MATRIX and edit Matrix A
  - Access the chi-square test
    - Matrix A = observed values
    - Matrix B = where the calculator places the expected values

TEST OF INDEPENDENCE: AN EXAMPLE

Perform calculations (cont.)
- p-value = 0.0005

Make decision
- α = 0.01 > p-value = 0.0005, so reject the null hypothesis

Concluding statement
- Performance in training and job success are dependent.
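The training example can be worked by hand the same way the calculator does it: build the expected counts from the row and column totals, sum (O - E)²/E over every cell, and take the right-tail area. The sketch below is one possible stdlib-only version. `chi2_right_tail` is a hypothetical helper, not a calculator or library function.

```python
import math


def chi2_right_tail(x, df):
    """P(X > x) for a chi-square variable with df degrees of freedom (series method)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return 1.0 - total * math.exp(a * math.log(z) - z - math.lgamma(a))


# Observed counts (Matrix A): rows = performance in training, columns = performance on job
observed = [
    [23, 60, 29],  # Poor
    [28, 79, 60],  # Average
    [9, 49, 63],   # Very Good
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
sample_size = sum(row_totals)

# Expected counts (Matrix B): (row total)(column total) / sample size
expected = [[r * c / sample_size for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(col_totals) - 1) * (len(row_totals) - 1)
p_value = chi2_right_tail(chi_square, df)

alpha = 0.01  # level of significance given in the example
reject_null = p_value < alpha
print(round(chi_square, 2), df, round(p_value, 4), reject_null)
```

The rounded p-value should match the 0.0005 reported on the slide.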

CHAPTER 12

Linear Regression and Correlation

CHAPTER 12 OBJECTIVES

The student should be able to:
- Discuss basic ideas of linear regression and correlation.
- Create and interpret a line of best fit.
- Calculate and interpret the correlation coefficient.
- Find outliers.

LINEAR REGRESSION

Method for finding the “best fit” line through a scatterplot of paired data
- Independent variable (x) versus dependent variable (y)

Recall from algebra
- Equation of a line: y = a + bx, where
  - a is the y-intercept
  - b is the slope of the line
    - if b > 0, the line slopes upward to the right
    - if b < 0, the line slopes downward to the right
    - if b = 0, the line is horizontal

LINEAR REGRESSION

The eye-ball method
- Draw what looks to you to be the best straight-line fit
- Pick two points on the line and find the equation of the line

The calculated method
- From calculus, we find the line that minimizes the sum of the squared vertical distances from each point to the line; this is the line that best fits the scatterplot
- Let the calculator do the work using LinRegTTest
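For readers without the calculator at hand, the calculated method can be sketched with the usual closed-form least-squares formulas for y = a + bx. The data points below are hypothetical (they are not from the slides), and `least_squares_line` is an illustrative name.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimizes the sum of squared vertical errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # y-intercept: the line passes through (x_bar, y_bar)
    return a, b


# Hypothetical paired data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
print(f"y = {a:.1f} + {b:.1f}x")
```

Here b > 0, so the fitted line slopes upward to the right.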

THE CORRELATION COEFFICIENT

Used to determine if the regression line is a “good fit”
- ρ is the population correlation coefficient
- r is the sample correlation coefficient

Formidable equation
- See the text
- The calculator does the work

Interpreting r
- r positive: line slopes upward to the right
- r negative: line slopes downward to the right
- r zero: no linear correlation

THE CORRELATION COEFFICIENT

Determining if there is a “good fit”

Gut method
- If the calculated r is close to 1 or -1, there's a good fit

Hypothesis test (LinRegTTest)
- Ho: ρ = 0
- Ha: ρ ≠ 0
- Ho means there IS NOT a significant linear relationship (correlation) between x and y in the population
- Ha means there IS a significant linear relationship (correlation) between x and y in the population
- To reject Ho means that there is a linear relationship between x and y in the population. It does not mean that one CAUSES the other.

Comparison to critical value
- Use the table at the end of the chapter
- Determine degrees of freedom: df = n - 2
- If r < negative critical value, then r is significant and we have a good fit
- If r > positive critical value, then r is significant and we have a good fit
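The critical-value comparison can be sketched as follows. The data are hypothetical, the function names are illustrative, and the critical value 0.878 is what a standard 5% table of critical values for r lists at df = 3. Treat that number as an assumption and confirm it against the table in your own text.

```python
import math


def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, critical_value):
    """Decision rule from the slide: r must lie beyond +/- the critical value."""
    return r < -critical_value or r > critical_value


# Hypothetical paired data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation_coefficient(xs, ys)
df = len(xs) - 2          # df = n - 2
critical_value = 0.878    # assumed table value for df = 3 at the 5% level

print(round(r, 3), df, is_significant(r, critical_value))
```

With only five points, an r of about 0.77 passes the gut method but not the table comparison, which is why the hypothesis test is worth doing.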

THE REGRESSION LINE AS A PREDICTOR

If the line is determined to be a good fit, the equation can be used to predict y or x values from x or y values
- Plug the numbers into the equation
- The equation is only valid within the DOMAIN of the paired data
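One way to respect the domain restriction in code is to refuse predictions outside the observed x range. In this sketch the coefficients a = 2.2 and b = 0.6 and the range 1 to 5 are hypothetical values chosen for illustration, and `predict_y` is an illustrative name.

```python
def predict_y(a, b, x, x_min, x_max):
    """Predict y = a + b*x, but only for x inside the range of the observed x values."""
    if not (x_min <= x <= x_max):
        raise ValueError(f"x = {x} is outside the data domain [{x_min}, {x_max}]")
    return a + b * x


# Hypothetical fitted line and observed x range
a, b = 2.2, 0.6
x_min, x_max = 1, 5

print(predict_y(a, b, 3, x_min, x_max))  # inside the domain: a prediction is returned

try:
    predict_y(a, b, 8, x_min, x_max)     # outside the domain: extrapolation is refused
except ValueError as error:
    print(error)
```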

THE ISSUE OF OUTLIERS

Compare 1.9s to |y - yhat| for each (x, y) pair
- If |y - yhat| > 1.9s, the point could be an outlier
- LinRegTTest gives us s
- y - yhat is put into the RESID list when the LinRegTTest is done
- To see the RESID list: go to STAT, Edit, move the cursor to a blank list name and type RESID; the residuals will show up
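The 1.9s screen can be sketched in a few lines. This version assumes s is the standard deviation of the residuals, sqrt(SSE / (n - 2)), which is the s that LinRegTTest reports. The data are hypothetical, with one point deliberately placed far from the trend, and `find_possible_outliers` is an illustrative name.

```python
import math


def find_possible_outliers(xs, ys, multiplier=1.9):
    """Fit y = a + b*x by least squares and flag points with |y - yhat| > multiplier * s."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]    # y - yhat, like the RESID list
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))  # standard deviation of the residuals
    flagged = [(x, y) for x, y, e in zip(xs, ys, residuals) if abs(e) > multiplier * s]
    return s, flagged


# Hypothetical data: y is about 2x except for the point at x = 5
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, 6, 8, 20, 12, 14, 16]

s, flagged = find_possible_outliers(xs, ys)
print(round(s, 3), flagged)
```

A flagged point is only a candidate; as the slide says, it could be an outlier and deserves a closer look.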

CHAPTER 13

F Distribution and ANOVA


CHAPTER 13 OBJECTIVES

The student should be able to:
- Interpret the F distribution as the number of groups and the sample size change.
- Discuss two uses for the F distribution and ANOVA.
- Conduct and interpret ANOVA.

SINGLE FACTOR ANALYSIS OF VARIANCE

ANOVA

What is it good for?
- Determines the existence of statistically significant differences among several group means.

Basic assumptions
- Each population from which a sample is taken is assumed to be normal.
- Each sample is randomly selected and independent.
- The populations are assumed to have equal standard deviations (or variances).
- The factor is the categorical variable.
- The response is the numerical variable.

The hypotheses
- Ho: µ1 = µ2 = µ3 = … = µk
- Ha: At least two of the group means are not equal

Always a right-tailed test

F DISTRIBUTION

Named after Sir Ronald Fisher

The F statistic is a ratio (i.e. a fraction)
- Two sets of degrees of freedom (numerator and denominator)
- $F \sim F_{df(num),\,df(denom)}$

Two estimates of variance are made
- Variation between samples
  - Estimate of σ² based on the variance of the sample means (multiplied by the sample size n when the samples are the same size)
  - Variation due to treatment (i.e. explained variation)
- Variation within samples
  - Estimate of σ² that is the average of the sample variances
  - Variation due to error (i.e. unexplained variation)

F DISTRIBUTION FACTS

- Curve is skewed right.
- Different curve for each set of degrees of freedom.
- As the dfs for numerator and denominator get larger, the curve approximates the normal distribution.
- F statistic is greater than or equal to zero.

Other uses
- Comparing two variances
- Two-Way Analysis of Variance

THE F STATISTIC

Formula

$$F = \frac{MS_{between}}{MS_{within}}, \qquad MS_{between} = \frac{SS_{between}}{df_{between}}, \qquad MS_{within} = \frac{SS_{within}}{df_{within}}$$

- MSbetween: mean square explained by the different groups
- MSwithin: mean square that is due to chance
- SSbetween: sum of squares that represents the variation among the different samples
- SSwithin: sum of squares that represents the variation within samples that is due to chance

THANK GOODNESS FOR OUR CALCULATOR!!!

Enter the table data by columns into L1, L2, L3, …

Do the ANOVA test: ANOVA(L1, L2, …)

What the calculator gives
- F: the F statistic
- p: the p-value
- Factor: the between stuff
  - df = # groups - 1 = k - 1
  - SSbetween
  - MSbetween
- Error: the within stuff
  - df = total number of observations - # of groups = N - k
  - SSwithin
  - MSwithin

AN EXAMPLE

Four sororities took a random sample of sisters regarding their grade averages for the past term. The results are shown below:

Sorority 1   Sorority 2   Sorority 3   Sorority 4
2.17         2.63         2.63         3.79
1.85         1.77         3.78         3.45
2.83         3.25         4.00         3.08
1.69         1.86         2.55         2.26
3.33         2.21         2.45         3.18

Using a significance level of 1%, is there a difference in grade averages among the sororities?
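The quantities the calculator's ANOVA reports for the sorority data can be computed directly from the sums-of-squares definitions. This is a minimal stdlib-only sketch; `one_way_anova` is an illustrative name. It stops at the F statistic because the Python standard library has no F-distribution tail routine, so the p-value still has to come from the calculator or a statistics package.

```python
def one_way_anova(groups):
    """Return the between/within sums of squares, degrees of freedom, mean squares and F."""
    k = len(groups)                                   # number of groups
    n_total = sum(len(g) for g in groups)             # N, total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Factor ("between") variation: group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Error ("within") variation: observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "df_between": df_between, "ms_between": ms_between,
        "ss_within": ss_within, "df_within": df_within, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


sororities = [
    [2.17, 1.85, 2.83, 1.69, 3.33],  # Sorority 1 (L1)
    [2.63, 1.77, 3.25, 1.86, 2.21],  # Sorority 2 (L2)
    [2.63, 3.78, 4.00, 2.55, 2.45],  # Sorority 3 (L3)
    [3.79, 3.45, 3.08, 2.26, 3.18],  # Sorority 4 (L4)
]

result = one_way_anova(sororities)
print(result["df_between"], result["df_within"], round(result["F"], 2))
```

To finish the test, compare the right-tail area of the F distribution with 3 and 16 degrees of freedom to α = 0.01.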

REVIEW FOR FINAL EXAM

What's fair game
- Chapter 1 through Chapter 12
- 42 multiple choice questions
- Do problems from each chapter

What to bring with you
- Scantron (#2052), pencil, eraser, calculator, 2 sheets of notes (8.5x11 inches, both sides)

AND SO ENDS YOUR MATH 10 EXPERIENCE

- Prepare for the final exam
- It has been a pleasure having you in class. Good luck and Godspeed with whatever path you take in life.