The Chi-Square Distribution. Preliminary Idea Sum of n values of a random variable.

Post on 19-Jan-2016

254 views 0 download

Tags:

Transcript of The Chi-Square Distribution. Preliminary Idea Sum of n values of a random variable.

The Chi-Square Distribution

Preliminary IdeaSum of n values of a random variable

1 2 3

1 2 3

We know that if a random variable X is normal,

...then has a Student's t-distribution

with ( -1) degrees of freedom.

Since ... = , it follows that the sum of values of a

n

n

x x x xx

nn

x x x x nx n

normal

random variable X also follows a Student's t-distribution with ( -1) degrees

of freedom.

n

Sum of Squares of random numbers

1 2 3

21

Question :

If ... follows a Student's t - distribution with ( -1)

degrees of freedom, what can we say about the sum of the squares

of these values?

In other words, what is the distribution of

nx x x x n

x

2 2 22 3

2 2 2 21 2 3

2

... ?

Answer :

Statisticans have shown that if is normal, then the sum of squares

of values of , namely,

...

has a χ distribution with ( -1) degrees of freedom.

n

n

x x x

X

n X

x x x x

n

The distribution2

1. It is called the chi-square distribution.

2. “Chi” rhymes with “High” – and the “ch” is pronounced like “k”.

3. It is a continuous random variable.

4. It has n – 1 degrees of freedom

5. It’s values are non-negative (i.e. ≥ 0)

6. It is always skewed to the right.

7. It becomes more symmetrical as n increases

8. It approximates a normal distribution for large values of n

Two Chi-square distributions

The sample variance s2 follows a chi-square distribution

2

2

22

2 2 2 2

1 2 3

The sample variance is defined by

.1

It follows that ( -1)

... .

Since the right-hand-side of this expression is a sum of sq

i

i

n

x xs

n

n s x x

x x x x x x x x

2 2

uares it follows that,

when X is normal, ( -1) has a distribution with ( -1) degrees of freedom.n s n

Standardizing the Test Statistic

In a test of hypothesis for a population variance σ2, the test statistic is the sample variance s2. The standardized test statistic is denoted by and is defined by:

2*

22*

20

1n s

Note: The standardized values are found in the standard chi-square tables on page 7 in the Formulas and Tables handout.

Chi-square table characteristics

The chi-square tables are not symmetrical.

Therefore lower-tail values and upper-tail values must be listed separately.

In the extract of the chi-square tables shown in the next slide, lower-tail areas are shaded in yellow, upper tail areas are shaded in blue.

Chi-square table (Page 7 in Formulas & Tables)

df 0.005 0.01 0.025 0.05 0.1 0.9 0.95 0.975 0.99 0.995

1 0.0000393 0.000157 0.000982 0.00393 0.0158 2.71 3.84 5.02 6.63 7.88

2 0.0001 0.020 0.0506 0.103 0.211 4.61 5.99 7.38 9.21 10.60

3 0.003 0.115 0.216 0.352 0.584 6.25 7.81 9.35 11.34 12.84

4 0.018 0.297 0.484 0.711 1.064 7.78 9.49 11.14 13.28 14.86

5 0.056 0.554 0.831 1.145 1.61 9.24 11.07 12.83 15.09 16.75

6 0.126 0.872 1.24 1.64 2.20 10.64 12.59 14.45 16.81 18.55

7 0.228 1.24 1.69 2.17 2.83 12.02 14.07 16.01 18.48 20.28

8 0.36 1.65 2.18 2.73 3.49 13.36 15.51 17.53 20.09 21.95

9 0.53 2.09 2.70 3.33 4.17 14.68 16.92 19.02 21.67 23.59

10 0.73 2.56 3.25 3.94 4.87 15.99 18.31 20.48 23.21 25.19

11 0.95 3.05 3.82 4.57 5.58 17.28 19.68 21.92 24.73 26.76

12 1.20 3.57 4.40 5.23 6.30 18.55 21.03 23.34 26.22 28.30

. . . . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . .

25 6.08 11.52 13.12 14.61 16.47 34.38 37.65 40.65 44.31 46.93

26 6.55 12.20 13.84 15.38 17.29 35.56 38.89 41.92 45.64 48.29

27 7.03 12.88 14.57 16.15 18.11 36.74 40.11 43.19 46.96 49.65

28 7.50 13.56 15.31 16.93 18.94 37.92 41.34 44.46 48.28 50.99

29 8.00 14.26 16.05 17.71 19.77 39.09 42.56 45.72 49.59 52.34

30 8.50 14.95 16.79 18.49 20.60 40.26 43.77 46.98 50.89 53.67

Chi-square table

Examples

2

2

Lower Tail

(.05;10) 3.94

Upper Tail

(.95;10) 18.31

Two-Tail Test of Hypothesis

2 20 0

2 21 0

22 2*

20

21

22

2*0 1 2

2* 2*0 1 2

H :

H :

1TS:

AL: ( / 2, 1)

(1 / 2, 1)

DR: Do not reject H if

Reject H if or

n ss

A n

A n

A A

A A

Lower Tail Test of Hypothesis

2 20 0

2 21 0

22 2*

20

2

2*0

2*0

H :

H :

1TS:

AL: ( , 1)

DR: Do not reject H if

Reject H if

n ss

A n

A

A

Upper Tail Test of Hypothesis

2 20 0

2 21 0

22 2*

20

2

2*0

2*0

H :

H :

1TS:

AL: (1 , 1)

DR: Do not reject H if

Reject H if

n ss

A n

A

A

Example

A random sample of 20 students' grades had a standard deviation of 14.2%. Test the professor's claim with =

A professor claims that the standard deviation of grades in an exam is 10% i.e. =.10.

2

0.05.

Note: We cannot test for a population standard deviation directly, we must first convert it to the equivalent test for a population variance.

Thus, a test for =.10 is changed to a test for (. 2

2 2

10) .01. In this example the test statistic is a standard deviation of 14.2%

or .142 =(.142) = .020164.s s

The test of hypothesis

20

21

22 2*

20

21

22

2*0

2* 2*0

H : .01

H : .01

1 (20 1)(.020164)TS: .020164 38.3116

.01

AL: A (.025;19) 8.91

A (.975;19) 32.85

DR: Do not reject H if 8.91 32.85

Reject H if 8.91 or 32

n ss

0

.85

Conclusion: Reject H The professor is wrong, the standard deviation

is not equal to .10. Since the test statistic is greater

than the upper action limit, we can conclude that the

standard deviation of grades is greater than 10%.

The F distribution

Comparison of Two Population variances

2 20 1 2

2 21 1 2

H :

H :

We want to test the hypothesis that two population variances are equal, i.e.

We need to rewrite the null and alternative hypotheses so that we can use a single value to represent the test statistic.

Ratio of Variances

The null and alternative hypotheses are converted to the following form.

21

0 22

21

1 22

H : 1

H : 1

The Test StatisticA natural candidate to be the test statistic for the ratio of two population variances is the ratio of the corresponding sample variances

2122

s

s

The F-distribution

Statisticians have shown that the ratio of two chi-square variables follows a new distribution known as the F-distribution.

2

1

2 1

2

If we have one variable with -1 degrees of freedom, and another with

1 degrees of freedom then the ratio has an -distribution with 1 degrees

of freedom for the numerator and 1 degrees of

n

n F n

n

2

11 2 2

2

freedom for the denominator.

Therefore, for a specified cumulative area

( ; 1) ( ; 1; 1)

; 1)

a

a nF a n n

a n

Extract of F-tables (1-α=.95)

  The F-distribution with 1 - α = .95

Denominator numerator df

df 1 2 3 4 5 6 7 8 9 10

1 161.4 199.5 215.7 224.6 230.2 234.0 236.8 238.9 240.5 241.9

2 18.51 19.00 19.16 19.25 19.30 19.33 19.35 19.37 19.38 19.40

3 10.13 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 8.79

4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96

5 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 4.74

6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06

7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.64

8 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.35

9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.14

10 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.98

11 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.85

12 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 2.75

13 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 2.67

14 4.60 3.74 3.34 3.11 2.96 2.85 2.76 2.70 2.65 2.60

15 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 2.54

F-distribution examples

F(.95;4,9) = 3.63

F(.95;8,3) = 8.85

F(.99;15,20) = 3.09

F(.99;40,30) = 2.30

Ratio of Variances

We have already seen that for a sample of size n the sample variance has a χ2 distribution with n - 1 degrees of freedom.

It follows that the ratio of two variances

2122

s

s

numerator 1 denominator 2has an F-distribution with df 1 and df 1.n n

Test of Hypothesis for two variances2 2

0 1 2

2 21 1 2

21

0 22

21

1 22

2122

1 1 2

2 1 2

0 1 2 0 1 2

H :

H :

Rewrite the hypotheses as:

H : 1

H : 1

TS: *

AL: ( / 2; 1, 1)

(1 / 2; 1, 1)

DR: Do not reject H if * , Reject H if F* <A or F > A

sF

s

A F n n

A F n n

A F A

One-Tail Tests

For Lower Tail Tests: A = F(; n1 - 1; n2 - 1)

For Upper Tail Tests:A = F(1 - ; n1 - 1; n2 - 1).

Formula for Lower Tail F-values

Since the lower tail F-values are not given in the table we must use the formulas:

1 22 1

1 22 1

For one-tail tests:

1( ; -1, -1)

(1- ; 1, 1)

For two-tail tests:

1( / 2; -1, -1)

(1- / 2; 1, 1)

F n nF n n

F n nF n n

Examples of Lower tail F-values

F(.05;5,9) = 1/F(.95;9,5)

= 1/4.77

= 0.2096

F(.05;7,4) = 1/F(.95;4,7)

= 1/4.12

= 0.2427

EXAMPLE

The production manager of a textile company wants to test the hypothesis that the mean cost of producing a polyester fabric is the same for two different production processes. Assume that production costs are normally distributed for both processes.

Random samples of production costs for several production runs using the two different production processes are as follows:

Test the hypothesis that the two population variances are equal with a 2% level of significance.

Process I$20 $15 $20 $23 $24 $21

Process II$27 $19 $41 $30 $16

Sample Data

Pop 1 Pop 2

Sample size

n1 = 6 n2 = 5

Mean 20.5 26.6

Variance 9.9 97.3

Testing the Hypothesis21

0 22

21

1 22

2122

1 1 2

2 1 2

0

H : 1

H : 1

9.9TS: * .1017

97.3

AL: ( / 2; 1, 1) (.01;5,4) 1/ (.99;4,5) 1/11.39 .088

(1 / 2; 1, 1) (.99;5,4) 15.52

DR: Do not reject H if .088 * 15.52 , Reject

sF

s

A F n n F F

A F n n F

F

0

0

H if F* <.088 or F > 15.52

Conclusion: Do not reject H .