
Lecture 4

This week’s reading: Ch. 1

Today: Ch. 1: The Simple Regression Model
• Interpretation of regression results
• Goodness of fit

© Christopher Dougherty 1999–2006

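DERIVING LINEAR REGRESSION COEFFICIENTS

True model:

$$Y = \beta_1 + \beta_2 X + u$$

Fitted line:

$$\hat{Y} = b_1 + b_2 X$$

[Figure: observations (X1, Y1), ..., (Xn, Yn) with the fitted line, intercept b1 and slope b2; the fitted values are Ŷ1 = b1 + b2X1, ..., Ŷn = b1 + b2Xn]

We chose the parameters of the fitted line so as to minimize the sum of the squares of the residuals. As a result, we derived the expressions for b1 and b2:

$$b_2 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}, \qquad b_1 = \bar{Y} - b_2 \bar{X}$$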
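A minimal Python sketch of these two formulas, assuming the observations are held in two plain lists. The five (X, Y) pairs are made-up toy numbers for illustration, not the survey data used below, and the function name is our own.

```python
def ols_coefficients(x, y):
    """Return (b1, b2) for the fitted line Y-hat = b1 + b2*X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b2: sum of cross-products of deviations over sum of squared X deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    # b1: chosen so that the fitted line passes through (X-bar, Y-bar)
    b1 = y_bar - b2 * x_bar
    return b1, b2


# Made-up toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, b2 = ols_coefficients(x, y)
print(f"b1 = {b1:.2f}, b2 = {b2:.2f}")  # b1 = 2.20, b2 = 0.60
```

Working with deviations from the means mirrors the formula for b2 directly; the formula for b1 then forces the line through the point of sample means.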

INTERPRETATION OF A REGRESSION EQUATION

The scatter diagram shows hourly earnings in 2002 plotted against years of schooling, defined as highest grade completed, for a sample of 540 respondents from the National Longitudinal Survey of Youth.

[Figure: scatter diagram of hourly earnings ($) against years of schooling]

Highest grade completed means just that for elementary and high school. Grades 13, 14, and 15 mean completion of one, two, and three years of college.

Grade 16 means completion of four-year college. Higher grades indicate years of postgraduate education.

    . reg EARNINGS S

          Source |       SS       df       MS              Number of obs =     540
    -------------+------------------------------           F(  1,   538) =  112.15
           Model |  19321.5589     1  19321.5589           Prob > F      =  0.0000
        Residual |  92688.6722   538  172.283777           R-squared     =  0.1725
    -------------+------------------------------           Adj R-squared =  0.1710
           Total |  112010.231   539  207.811189           Root MSE      =  13.126

    ------------------------------------------------------------------------------
        EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               S |   2.455321   .2318512    10.59   0.000     1.999876    2.910765
           _cons |  -13.93347   3.219851    -4.33   0.000    -20.25849   -7.608444
    ------------------------------------------------------------------------------

This is the output from a regression of earnings on years of schooling, using Stata.

For the time being, we will be concerned only with the estimates of the parameters. The variables in the regression are listed in the first column and the second column gives the estimates of their coefficients.

In this case there is only one variable, S, and its coefficient is 2.46. _cons, in Stata, refers to the constant. The estimate of the intercept is -13.93.

[Figure: the same scatter diagram, with the fitted regression line]

Here is the scatter diagram again, with the regression line shown.

$$\widehat{EARNINGS} = -13.93 + 2.46\,S$$

What do the coefficients actually mean?

To answer this question, you must refer to the units in which the variables are measured.

S is measured in years (strictly speaking, grades completed), EARNINGS in dollars per hour. So the slope coefficient implies that hourly earnings increase by $2.46 for each extra year of schooling.

We will look at a geometrical representation of this interpretation. To do this, we will enlarge the marked section of the scatter diagram.

[Figure: enlarged section of the scatter diagram around 11 to 12 years of schooling, with the regression line]

The regression line indicates that completing 12th grade instead of 11th grade would increase earnings by $2.46, from $13.07 to $15.53, as a general tendency.
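The arithmetic behind this statement can be reproduced with a short Python sketch that plugs the unrounded Stata estimates into the fitted line. The function and variable names are illustrative, not from the lecture.

```python
# Unrounded estimates from the Stata output (reg EARNINGS S)
B1 = -13.93347  # _cons
B2 = 2.455321   # coefficient of S


def fitted_earnings(s):
    """Fitted hourly earnings ($) for s years of schooling."""
    return B1 + B2 * s


e11 = fitted_earnings(11)
e12 = fitted_earnings(12)
# The one-year difference is the slope coefficient itself
print(f"11 years: ${e11:.3f}, 12 years: ${e12:.3f}, difference: ${e12 - e11:.3f}")
```

The printed difference is B2, about $2.46, whichever pair of adjacent grades is chosen, because a straight line has the same slope everywhere.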

You should ask yourself whether this is a plausible figure. If it is implausible, this could be a sign that your model is misspecified in some way.

For low levels of education it might be plausible. But for high levels it would seem to be an underestimate.

What about the constant term? (Try to answer this question yourself before continuing with this sequence.)

Literally, the constant indicates that an individual with no years of education would have to pay $13.93 per hour to be allowed to work.
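Reading the fitted line literally at S = 0 gives back the intercept. A small Python sketch, using the same unrounded Stata estimates (the names are ours), also solves for the schooling level below which the fitted line predicts negative hourly earnings.

```python
B1 = -13.93347  # _cons from the Stata output
B2 = 2.455321   # coefficient of S


def fitted_earnings(s):
    """Fitted hourly earnings ($) for s years of schooling."""
    return B1 + B2 * s


at_zero_schooling = fitted_earnings(0)  # identical to the intercept
# Solve B1 + B2 * s = 0 for s; fitted earnings are negative below this level
break_even_s = -B1 / B2
print(f"S = 0: {at_zero_schooling:.2f} dollars per hour")
print(f"fitted earnings are negative for S < {break_even_s:.2f}")
```

With these estimates the crossing point is a little under 6 years of schooling.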

This does not make any sense at all. In former times craftsmen might require an initial payment when taking on an apprentice, and might pay the apprentice little or nothing for quite a while, but an interpretation of negative payment is impossible to sustain.

A safe solution to the problem is to limit the interpretation to the range of the sample data, and to refuse to extrapolate on the grounds that we have no evidence outside the data range.

With this explanation, the only function of the constant term is to enable you to draw the regression line at the correct height on the scatter diagram. It has no meaning of its own.

Another solution is to explore the possibility that the true relationship is nonlinear and that we are approximating it with a linear regression. We will soon extend the regression technique to fit nonlinear models.

GOODNESS OF FIT

Four useful results:

$$\bar{e} = 0, \qquad \bar{\hat{Y}} = \bar{Y}, \qquad \sum X_i e_i = 0, \qquad \sum \hat{Y}_i e_i = 0$$

This sequence explains measures of goodness of fit in regression analysis. It is convenient to start by demonstrating four useful results. The first is that the mean value of the residuals must be zero.

The residual in any observation is given by the difference between the actual and fitted values of Y for that observation.

$$e_i = Y_i - \hat{Y}_i$$

First substitute for the fitted value.

$$e_i = Y_i - \hat{Y}_i = Y_i - b_1 - b_2 X_i$$

Now sum over all the observations.

$$\sum e_i = \sum Y_i - n b_1 - b_2 \sum X_i$$

Dividing through by n, we obtain the sample mean of the residuals in terms of the sample means of X and Y and the regression coefficients.

$$\bar{e} = \frac{1}{n}\sum e_i = \frac{1}{n}\sum Y_i - b_1 - b_2 \frac{1}{n}\sum X_i = \bar{Y} - b_1 - b_2 \bar{X}$$

If we substitute for b1, the expression collapses to zero.

$$b_1 = \bar{Y} - b_2 \bar{X} \quad\Longrightarrow\quad \bar{e} = \bar{Y} - (\bar{Y} - b_2 \bar{X}) - b_2 \bar{X} = 0$$
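A quick numerical check of the first result, as a Python sketch on made-up toy data (the helper name is ours): after an OLS fit the residuals average to zero up to floating-point rounding, while any other intercept breaks the result.

```python
def ols_coefficients(x, y):
    """OLS intercept and slope for a simple regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


# Made-up toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, b2 = ols_coefficients(x, y)

# e_i = Y_i - b1 - b2*X_i
residuals = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
mean_residual = sum(residuals) / len(residuals)

# Raising the intercept by 1 lowers every residual, and hence their mean, by 1
shifted = [yi - (b1 + 1.0) - b2 * xi for xi, yi in zip(x, y)]
mean_shifted = sum(shifted) / len(shifted)

print(f"mean residual, OLS intercept: {mean_residual:.1e}")
print(f"mean residual, intercept + 1: {mean_shifted:.2f}")
```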

Next we will demonstrate that the mean of the fitted values of Y is equal to the mean of the actual values of Y.

Again, we start with the definition of a residual.

$$e_i = Y_i - \hat{Y}_i$$

Sum over all the observations.

$$\sum e_i = \sum Y_i - \sum \hat{Y}_i$$

Divide through by n. The terms in the equation are the means of the residuals, actual values of Y, and fitted values of Y, respectively.

$$\frac{1}{n}\sum e_i = \frac{1}{n}\sum Y_i - \frac{1}{n}\sum \hat{Y}_i, \quad\text{that is,}\quad \bar{e} = \bar{Y} - \bar{\hat{Y}}$$

We have just shown that the mean of the residuals is zero. Hence the mean of the fitted values is equal to the mean of the actual values.

$$\bar{e} = \bar{Y} - \bar{\hat{Y}} = 0 \quad\Longrightarrow\quad \bar{\hat{Y}} = \bar{Y}$$
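The second result can be checked the same way. A Python sketch on made-up toy data (helper name ours) compares the mean of the fitted values with the mean of Y.

```python
def ols_coefficients(x, y):
    """OLS intercept and slope for a simple regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


# Made-up toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, b2 = ols_coefficients(x, y)

fitted = [b1 + b2 * xi for xi in x]  # Y-hat_i = b1 + b2*X_i
mean_actual = sum(y) / len(y)
mean_fitted = sum(fitted) / len(fitted)
print(f"mean of Y = {mean_actual:.4f}, mean of fitted Y = {mean_fitted:.4f}")
```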

Next we will demonstrate that the sum of the products of the values of X and the residuals is zero.

We start by replacing the residual with its expression in terms of Y and X.

$$\sum X_i e_i = \sum X_i (Y_i - b_1 - b_2 X_i)$$

We expand the expression.

$$\sum X_i e_i = \sum X_i Y_i - b_1 \sum X_i - b_2 \sum X_i^2$$

The expression is equal to zero. One way of demonstrating this would be to substitute for b1 and b2 and show that all the terms cancel out.

$$\sum X_i Y_i - b_1 \sum X_i - b_2 \sum X_i^2 = 0$$

A neater way is to recall the first-order condition for b2 from the derivation of the regression coefficients:

$$\frac{\partial\,\mathrm{RSS}}{\partial b_2} = 2 b_2 \sum X_i^2 - 2 \sum X_i Y_i + 2 b_1 \sum X_i = 0$$

Dividing through by -2 gives exactly the expression we need to show is zero.
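A Python sketch on made-up toy data (helper name ours) that evaluates both sides of this argument at the OLS estimates: the sum of X times the residuals, and the first-order condition for b2, which is -2 times that sum.

```python
def ols_coefficients(x, y):
    """OLS intercept and slope for a simple regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


# Made-up toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, b2 = ols_coefficients(x, y)

residuals = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
sum_x_e = sum(xi * ei for xi, ei in zip(x, residuals))

# First-order condition for b2: dRSS/db2 = 2*b2*sum(X^2) - 2*sum(XY) + 2*b1*sum(X)
d_rss_d_b2 = (2 * b2 * sum(xi ** 2 for xi in x)
              - 2 * sum(xi * yi for xi, yi in zip(x, y))
              + 2 * b1 * sum(x))

print(f"sum of X*e = {sum_x_e:.1e}, dRSS/db2 = {d_rss_d_b2:.1e}")
```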

Finally we will demonstrate that the sum of the products of the fitted values of Y and the residuals is zero.

We start by substituting for the fitted value of Y.

$$\sum \hat{Y}_i e_i = \sum (b_1 + b_2 X_i) e_i$$

We expand and rearrange.

$$\sum \hat{Y}_i e_i = b_1 \sum e_i + b_2 \sum X_i e_i = n b_1 \bar{e} + b_2 \sum X_i e_i$$

The expression is equal to zero, given the first and third useful results.

$$\sum \hat{Y}_i e_i = n b_1 \bar{e} + b_2 \sum X_i e_i = 0$$
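A Python sketch on made-up toy data (helper name ours) computing the sum of the products of fitted values and residuals directly, and again through the expansion n·b1·ē + b2·ΣXe used above.

```python
def ols_coefficients(x, y):
    """OLS intercept and slope for a simple regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


# Made-up toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, b2 = ols_coefficients(x, y)
n = len(x)

fitted = [b1 + b2 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
sum_fitted_e = sum(fi * ei for fi, ei in zip(fitted, residuals))

# The same quantity via the expansion: n*b1*mean(e) + b2*sum(X*e)
via_expansion = (n * b1 * (sum(residuals) / n)
                 + b2 * sum(xi * ei for xi, ei in zip(x, residuals)))

print(f"direct: {sum_fitted_e:.1e}, via expansion: {via_expansion:.1e}")
```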

We now come to the discussion of goodness of fit. One measure of the variation in Y is the sum of its squared deviations around its sample mean, often described as the Total Sum of Squares, TSS.

$$\mathrm{TSS} = \sum (Y_i - \bar{Y})^2$$

We will decompose TSS using the fact that the actual value of Y in any observation is equal to the sum of its fitted value and the residual.

$$e_i = Y_i - \hat{Y}_i \quad\Longrightarrow\quad Y_i = \hat{Y}_i + e_i$$

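We substitute for Yi and, averaging over the observations, for its sample mean.

$$\sum (Y_i - \bar{Y})^2 = \sum \left( [\hat{Y}_i + e_i] - [\bar{\hat{Y}} + \bar{e}] \right)^2$$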

From the useful results, the mean of the fitted values of Y is equal to the mean of the actual values. Also, the mean of the residuals is zero.

$$\bar{\hat{Y}} = \bar{Y}, \qquad \bar{e} = 0$$

Hence we can simplify the expression as shown.

$$\sum (Y_i - \bar{Y})^2 = \sum \left( [\hat{Y}_i - \bar{Y}] + e_i \right)^2$$

We expand the squared term on the right side of the equation.

$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum e_i^2 + 2 \sum (\hat{Y}_i - \bar{Y}) e_i$$

We expand the third term on the right side of the equation.

$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum e_i^2 + 2 \sum \hat{Y}_i e_i - 2 \bar{Y} \sum e_i$$

The last two terms are both zero, given the first and fourth useful results.

$$\sum \hat{Y}_i e_i = 0; \qquad \bar{e} = 0, \text{ so } \sum e_i = 0$$

$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum e_i^2$$

$$\mathrm{TSS} = \mathrm{ESS} + \mathrm{RSS}$$

Thus we have shown that TSS, the total sum of squares of Y, can be decomposed into ESS, the ‘explained’ sum of squares, and RSS, the residual (‘unexplained’) sum of squares.
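A Python sketch of the decomposition on made-up toy data (helper name ours). ESS is measured around the mean of Y, which equals the mean of the fitted values by the second useful result.

```python
def ols_coefficients(x, y):
    """OLS intercept and slope for a simple regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


# Made-up toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, b2 = ols_coefficients(x, y)
y_bar = sum(y) / len(y)

fitted = [b1 + b2 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

tss = sum((yi - y_bar) ** 2 for yi in y)
ess = sum((fi - y_bar) ** 2 for fi in fitted)
rss = sum(ei ** 2 for ei in residuals)
print(f"TSS = {tss:.2f}, ESS = {ess:.2f}, RSS = {rss:.2f}, ESS + RSS = {ess + rss:.2f}")
# TSS = 6.00, ESS = 3.60, RSS = 2.40, ESS + RSS = 6.00
```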

The words ‘explained’ and ‘unexplained’ were put in quotation marks because the explanation may in fact be false. Y might really depend on some other variable Z, and X might be acting as a proxy for Z. It would be safer to use the expression ‘apparently explained’ instead of ‘explained’.

The main criterion of goodness of fit, formally described as the coefficient of determination but usually referred to as R², is defined to be the ratio of ESS to TSS, that is, the proportion of the variance of Y explained by the regression equation.

$$R^2 = \frac{\mathrm{ESS}}{\mathrm{TSS}} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}$$
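Applied to the earnings regression, the definition gives back the R-squared that Stata reports. A minimal Python check, with the sums of squares copied from the Stata ANOVA table, where the Model, Residual, and Total rows correspond to ESS, RSS, and TSS:

```python
# Sums of squares copied from the Stata output for reg EARNINGS S
ess = 19321.5589   # Model SS
rss = 92688.6722   # Residual SS
tss = 112010.231   # Total SS

r2_from_ess = ess / tss
r2_from_rss = 1 - rss / tss  # equivalent, since TSS = ESS + RSS
print(f"R-squared: {r2_from_ess:.4f} (ESS/TSS), {r2_from_rss:.4f} (1 - RSS/TSS)")
```

The two forms agree to the precision shown in the table; the tiny discrepancy between ESS + RSS and TSS is rounding in Stata's printed figures.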

Obviously we would like to locate the regression line so as to make the goodness of fit as high as possible, according to this criterion. Does this objective clash with our use of the least squares principle to determine b1 and b2?

Fortunately, there is no clash. To see this, rewrite the expression for R² in terms of RSS as shown.

$$R^2 = \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\sum e_i^2}{\sum (Y_i - \bar{Y})^2}$$

The OLS regression coefficients are chosen in such a way as to minimize the sum of the squares of the residuals. Since TSS does not depend on b1 and b2, minimizing RSS automatically maximizes R².
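A Python sketch of this point on made-up toy data (helper names and the particular perturbations are our own choices): nudging b1 and b2 away from their OLS values always lowers R². R² is computed here in the 1 - RSS/TSS form, because the decomposition TSS = ESS + RSS holds only at the OLS estimates.

```python
def ols_coefficients(x, y):
    """OLS intercept and slope for a simple regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


def r_squared(x, y, b1, b2):
    """R-squared as 1 - RSS/TSS for the line Y-hat = b1 + b2*X."""
    y_bar = sum(y) / len(y)
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - rss / tss


# Made-up toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, b2 = ols_coefficients(x, y)
best = r_squared(x, y, b1, b2)

# Move the coefficients away from the OLS values in a few directions
perturbations = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (-0.3, 0.1)]
for d1, d2 in perturbations:
    r2 = r_squared(x, y, b1 + d1, b2 + d2)
    print(f"b1 {d1:+.2f}, b2 {d2:+.2f}: R-squared = {r2:.4f} (OLS: {best:.4f})")
```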

Another natural criterion of goodness of fit is the correlation between the actual and fitted values of Y. We will demonstrate that this is maximized by using the least squares principle to determine the regression coefficients.

$$r_{Y,\hat{Y}} = \frac{\sum (Y_i - \bar{Y})(\hat{Y}_i - \bar{\hat{Y}})}{\sqrt{\sum (Y_i - \bar{Y})^2 \sum (\hat{Y}_i - \bar{\hat{Y}})^2}}$$

We will start with the numerator and substitute for the actual value of Y, and its mean, in the first factor.

$$\sum (Y_i - \bar{Y})(\hat{Y}_i - \bar{\hat{Y}}) = \sum \left( [\hat{Y}_i + e_i] - [\bar{\hat{Y}} + \bar{e}] \right)(\hat{Y}_i - \bar{\hat{Y}})$$

We rearrange a little, using the results that the mean of the fitted values equals the mean of Y and that the mean of the residuals is zero.

$$\sum (Y_i - \bar{Y})(\hat{Y}_i - \bar{\hat{Y}}) = \sum \left( [\hat{Y}_i - \bar{Y}] + e_i \right)(\hat{Y}_i - \bar{Y})$$

We expand the expression. The last two terms are both zero (fourth and first useful results).

$$\sum (Y_i - \bar{Y})(\hat{Y}_i - \bar{\hat{Y}}) = \sum (\hat{Y}_i - \bar{Y})^2 + \sum \hat{Y}_i e_i - \bar{Y} \sum e_i$$

$$\sum \hat{Y}_i e_i = 0; \qquad \bar{e} = 0, \text{ so } \sum e_i = 0$$

Thus the numerator simplifies to the sum of the squared deviations of the fitted values.

$$\sum (Y_i - \bar{Y})(\hat{Y}_i - \bar{\hat{Y}}) = \sum (\hat{Y}_i - \bar{Y})^2$$

We have the same expression in the denominator, under a square root. Cancelling, we are left with the square root in the numerator.

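$$r_{Y,\hat{Y}} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sqrt{\sum (Y_i - \bar{Y})^2 \sum (\hat{Y}_i - \bar{Y})^2}} = \frac{\sqrt{\sum (\hat{Y}_i - \bar{Y})^2}}{\sqrt{\sum (Y_i - \bar{Y})^2}}$$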

$$r_{Y,\hat{Y}} = \sqrt{\frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}} = \sqrt{R^2}$$

Thus the correlation coefficient is the square root of R². It follows that it is maximized by the use of the least squares principle to determine the regression coefficients.
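A Python sketch confirming the result on made-up toy data (helper name ours): the sample correlation between Y and the fitted values equals the square root of R² computed as ESS/TSS.

```python
import math


def ols_coefficients(x, y):
    """OLS intercept and slope for a simple regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


# Made-up toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, b2 = ols_coefficients(x, y)
fitted = [b1 + b2 * xi for xi in x]

y_bar = sum(y) / len(y)
f_bar = sum(fitted) / len(fitted)

# Sample correlation between actual and fitted values of Y
num = sum((yi - y_bar) * (fi - f_bar) for yi, fi in zip(y, fitted))
den = math.sqrt(sum((yi - y_bar) ** 2 for yi in y)
                * sum((fi - f_bar) ** 2 for fi in fitted))
r = num / den

# R-squared as ESS / TSS
r_sq = (sum((fi - y_bar) ** 2 for fi in fitted)
        / sum((yi - y_bar) ** 2 for yi in y))
print(f"r = {r:.4f}, sqrt(R-squared) = {math.sqrt(r_sq):.4f}")
```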
