July 1, 2008 Lecture 17 - Regression Testing 1

Testing Relationships between Variables

Statistics 111 - Lecture 17

Administrative Notes

• Homework 5 due tomorrow

• Lecture on Wednesday will be review of entire course

Final Exam

• Thursday from 10:40-12:10

• It’ll be right here in this room

• Calculators are definitely needed!

• Single 8.5 x 11 cheat sheet (two-sided) allowed

• I’ve put a sample final on the course website

Outline

• Review of Regression coefficients

• Hypothesis Tests

• Confidence Intervals

• Examples

Two Continuous Variables

• Visually summarize the relationship between two continuous variables with a scatterplot

• Numerically, we focus on best fit line (regression)

Education and Mortality: Mortality = 1353.16 - 37.62 · Education

Draft Order and Birthday: Draft Order = 224.9 - 0.226 · Birthday
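To make the two fitted lines concrete, here is a minimal sketch that plugs values into them. The inputs are hypothetical and chosen only for illustration; the slide does not state units, so treating Education as years of schooling and Birthday as day of the year (1 to 366) is an assumption.

```python
def predict(intercept, slope, x):
    """Return the fitted value on the line: intercept + slope * x."""
    return intercept + slope * x

# Coefficients are the ones quoted on the slide.
# Inputs below are hypothetical (assumed units: years, day of year).
mortality_at_10 = predict(1353.16, -37.62, 10)
draft_day_1 = predict(224.9, -0.226, 1)
draft_day_366 = predict(224.9, -0.226, 366)

print(f"Predicted mortality at Education = 10: {mortality_at_10:.2f}")
print(f"Predicted draft order, Birthday = 1:   {draft_day_1:.1f}")
print(f"Predicted draft order, Birthday = 366: {draft_day_366:.1f}")
```

Both slopes are negative, so the predicted value falls as the explanatory variable increases.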

June 30, 2008 Stat 111 - Lecture 16 - Regression

6

Best values for Regression Parameters

• The best fit line has these values for the regression coefficients:

Best estimate of slope

Best estimate of intercept

• Also can estimate the average squared residual:
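The formulas on this slide did not survive in the transcript. As a sketch, the standard least-squares estimates can be computed as below; the data are made up, and dividing the squared residuals by n-2 (rather than n) is an assumption chosen to match the n-2 degrees of freedom used later in the lecture.

```python
def least_squares(xs, ys):
    """Return (a, b, s2): intercept, slope, and residual variance estimate."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx              # best estimate of slope
    a = y_bar - b * x_bar      # best estimate of intercept
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    s2 = sum(e ** 2 for e in residuals) / (n - 2)   # assumed divisor: n - 2
    return a, b, s2

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, s2 = least_squares(xs, ys)
print(f"intercept a = {a:.3f}, slope b = {b:.3f}, s^2 = {s2:.4f}")
```

The slope formula used here is algebraically the same as b = r · sy / sx, where r is the correlation and sx, sy are the sample standard deviations.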

Significance of Regression Line

• Does the regression line show a significant linear relationship between the two variables?

• If there is not a linear relationship, then we would expect zero correlation (r = 0)

• So the slope b should also be zero

• Therefore, our test for a significant relationship will focus on testing whether our slope is significantly different from zero

H0 : β = 0 versus Ha : β ≠ 0

Linear Regression

• Best fit line is called Simple Linear Regression Model:

Yi = α + β·Xi + εi

• Coefficients: α is the intercept and β is the slope

• Other common notation: β0 for intercept, β1 for slope

• Our Y variable is a linear function of the X variable but we allow for error (εi) in each prediction

• We approximate the error by using the residual:

Residual = Observed Yi - Predicted Yi, where Predicted Yi = α + β·Xi
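A small simulation can illustrate the model and its residuals. This is a sketch under stated assumptions: the true parameter values, the normal errors, and the sample size are all hypothetical. In practice α and β are unknown, so the predictions use the fitted a and b in their place, which is why the residuals only approximate the errors εi.

```python
import random

random.seed(0)

# Hypothetical true parameters, chosen only for illustration.
alpha, beta, sigma = 2.0, 0.5, 1.0

xs = [float(i) for i in range(1, 21)]
errors = [random.gauss(0.0, sigma) for _ in xs]           # the eps_i
ys = [alpha + beta * x + e for x, e in zip(xs, errors)]   # the model

# Fit the least-squares line to the simulated data.
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar

# Residual = observed Y minus predicted Y (using the fitted a and b).
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"fitted a = {a:.3f}, fitted b = {b:.3f}")
print(f"sum of residuals = {sum(residuals):.2e}")
```

The residuals of a least-squares line with an intercept always sum to zero (up to rounding), whereas the true errors generally do not.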

Test Statistic for Slope

• Our test statistic for the slope is similar in form to all the test statistics we have seen so far:

T = (b - 0) / SE(b)

• The standard error of the slope SE(b) has a complicated formula that requires some matrix algebra to calculate

• We will not be doing this calculation manually because the JMP software does this calculation for us!
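For readers without JMP, here is a hedged sketch of the calculation. With a single explanatory variable, the standard error of the slope reduces to SE(b) = s / sqrt(Σ(Xi - x̄)²), where s² is the sum of squared residuals divided by n-2. The data below are made up, and the function name is hypothetical.

```python
from math import sqrt

def slope_test(xs, ys):
    """Return (b, se_b, t) for testing H0: slope = 0 in simple regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))    # residual standard deviation
    se_b = s / sqrt(sxx)       # standard error of the slope
    return b, se_b, b / se_b   # t statistic has n - 2 degrees of freedom

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b, se_b, t_stat = slope_test(xs, ys)
print(f"b = {b:.3f}, SE(b) = {se_b:.4f}, t = {t_stat:.2f}")
```

The resulting t statistic is compared against a t distribution with n-2 degrees of freedom to obtain the p-value.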

Example: Education and Mortality

Confidence Intervals for Coefficients

• JMP output also gives the information needed to make confidence intervals for slope and intercept

• 100·C % confidence interval for slope β:

b ± t* · SE(b)

• The multiple t* comes from a t distribution with n-2 degrees of freedom

• 100·C % confidence interval for intercept α:

a ± t* · SE(a)

• Usually, we are less interested in intercept but it might be needed in some situations

Confidence Intervals for Example

• We have n = 60, so our multiple t* comes from a t distribution with d.f. = 58. For a 95% C.I., t* = 2.00

• 95 % confidence interval for slope β:

-37.6 ± 2.0 * 8.307 = (-54.2, -21.0)

Note that this interval does not contain zero!

• 95 % confidence interval for intercept α:

1353 ± 2.0 * 91.42 = (1170, 1536)
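The arithmetic behind these two intervals can be checked with a short sketch. The estimates, standard errors, and t* = 2.0 are the values quoted on the slide; the function name is hypothetical.

```python
def conf_interval(estimate, t_star, std_error):
    """Return (low, high) for estimate ± t_star * std_error."""
    margin = t_star * std_error
    return estimate - margin, estimate + margin

# Values quoted on the slide (n = 60, so d.f. = 58 and t* is about 2.0).
slope_ci = conf_interval(-37.6, 2.0, 8.307)
intercept_ci = conf_interval(1353, 2.0, 91.42)

print(f"slope:     ({slope_ci[0]:.1f}, {slope_ci[1]:.1f})")
print(f"intercept: ({intercept_ci[0]:.0f}, {intercept_ci[1]:.0f})")
```

Both endpoints of the slope interval are negative, which is what "does not contain zero" means here.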

Another Example: Draft Lottery

• Is the negative linear association we see between birthday and draft order statistically significant?

(Regression output, with the p-value highlighted)

Another Example: Draft Lottery

• p-value < 0.0001 so we reject null hypothesis and conclude that there is a statistically significant linear relationship between birthday and draft order

• Statistical evidence that the randomization was not done properly!

• 95 % confidence interval for slope β:

-.23 ± 1.98 * .05 = (-.33, -.13)

• Multiple t* = 1.98 from t distribution with n-2 = 363 d.f.

• Confidence interval does not contain zero, which we expected from our hypothesis test
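The agreement between the test and the interval is not a coincidence: a two-sided test at the 5% level rejects exactly when |b / SE(b)| exceeds t*, which is the same condition as zero lying outside b ± t* · SE(b). A sketch using the rounded values from the slide follows; because those values are rounded, the t statistic here is only approximate, and the second example uses hypothetical numbers.

```python
def rejects_null(b, se_b, t_star):
    """Two-sided test of H0: slope = 0, using the critical value t_star."""
    return abs(b / se_b) > t_star

def interval_excludes_zero(b, se_b, t_star):
    """True if zero lies outside b ± t_star * se_b."""
    low, high = b - t_star * se_b, b + t_star * se_b
    return low > 0 or high < 0

# Draft lottery: rounded values quoted on the slide.
print(rejects_null(-0.23, 0.05, 1.98), interval_excludes_zero(-0.23, 0.05, 1.98))

# Hypothetical weak slope: neither criterion flags it.
print(rejects_null(0.05, 0.05, 1.98), interval_excludes_zero(0.05, 0.05, 1.98))
```

The two functions always return the same answer for a given b, SE(b), and t*.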

Education Example

• Dataset of 78 seventh-graders: relationship between IQ and GPA

• Clear positive association between IQ and grade point average

Education Example

• Is the positive linear association we see between GPA and IQ statistically significant?

(Regression output, with the p-value highlighted)

Education Example

• p-value < 0.0001 so we reject null hypothesis and conclude that there is a statistically significant positive relationship between IQ and GPA

• 95 % confidence interval for slope β:

.101 ± 1.99 * .014 = (.073, .129)

• Multiple t* = 1.99 from t distribution with n-2 = 76 d.f.

• Confidence interval does not contain zero, which we expected from our hypothesis test
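One way to read this interval is to scale it to a larger difference in the explanatory variable. The sketch below assumes GPA is the response and IQ the explanatory variable (the slide does not say so explicitly), and the 10-point IQ difference is a hypothetical choice.

```python
def slope_interval(b, t_star, se_b):
    """Return (low, high) for b ± t_star * se_b."""
    margin = t_star * se_b
    return b - margin, b + margin

def scale_interval(interval, delta_x):
    """Interval for the change in predicted Y over a change of delta_x in X."""
    ends = [interval[0] * delta_x, interval[1] * delta_x]
    return min(ends), max(ends)   # min/max keeps the order if delta_x < 0

# Slope estimate, t*, and SE(b) as quoted on the slide.
per_point = slope_interval(0.101, 1.99, 0.014)
per_ten_points = scale_interval(per_point, 10)   # hypothetical 10-point gap

print(f"per IQ point:     ({per_point[0]:.3f}, {per_point[1]:.3f})")
print(f"per 10 IQ points: ({per_ten_points[0]:.2f}, {per_ten_points[1]:.2f})")
```

Under these assumptions, a 10-point difference in IQ corresponds to an average GPA difference of roughly 0.73 to 1.29 points, with 95% confidence.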

Next Class - Lecture 18

• Review of course material