01_03

Least squares estimation of regression lines: Regression via least squares
Brian Caffo, Jeff Leek and Roger Peng, Johns Hopkins Bloomberg School of Public Health

description

Linear Regression by R

Transcript of 01_03

Page 1: 01_03

Least squares estimation of regression lines
Regression via least squares

Brian Caffo, Jeff Leek and Roger Peng
Johns Hopkins Bloomberg School of Public Health

Page 2: 01_03

General least squares for linear equations

Consider again the parent and child height data from Galton.


Page 3: 01_03

Fitting the best line

Let $Y_i$ be the $i^{th}$ child's height and $X_i$ be the $i^{th}$ (average over the pair of) parents' heights.

Consider finding the best line

Child's Height $= \beta_0 + \beta_1 \times$ Parent's Height

Use least squares

$$\sum_{i=1}^{n} \left\{ Y_i - (\beta_0 + \beta_1 X_i) \right\}^2$$

How do we do it?
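One way to answer that, before doing any algebra, is to hand the criterion above to a generic numerical optimizer. The sketch below is not from the slides; it assumes the galton data from the UsingR package that the later slides use.

library(UsingR); data(galton)              # the data set used later in this lecture
y <- galton$child; x <- galton$parent
rss <- function(beta) sum((y - (beta[1] + beta[2] * x))^2)   # the criterion above
fit <- optim(c(0, 0), rss, method = "BFGS")                  # generic numerical minimization
rbind(fit$par, coef(lm(y ~ x)))                              # should agree closely with lm()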


Page 4: 01_03

Let's solve this problem generally

Let $\mu_i = \beta_0 + \beta_1 X_i$ and our estimates be $\hat\mu_i = \hat\beta_0 + \hat\beta_1 X_i$.

We want to minimize

$$\dagger \qquad \sum_{i=1}^n (Y_i - \mu_i)^2 = \sum_{i=1}^n (Y_i - \hat\mu_i)^2 + 2 \sum_{i=1}^n (Y_i - \hat\mu_i)(\hat\mu_i - \mu_i) + \sum_{i=1}^n (\hat\mu_i - \mu_i)^2$$

Suppose that

$$\sum_{i=1}^n (Y_i - \hat\mu_i)(\hat\mu_i - \mu_i) = 0$$

then

$$\dagger = \sum_{i=1}^n (Y_i - \hat\mu_i)^2 + \sum_{i=1}^n (\hat\mu_i - \mu_i)^2 \geq \sum_{i=1}^n (Y_i - \hat\mu_i)^2$$
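The point of the decomposition: if the cross term is zero, every competing line has a sum of squares at least as large as that of $\hat\mu_i$, so $\hat\mu_i$ is the least squares fit. A quick numerical illustration (not from the slides; it assumes the galton data from UsingR, where the cross term vanishes at the lm() fitted values):

library(UsingR); data(galton)
y <- galton$child; x <- galton$parent
muhat <- fitted(lm(y ~ x))                # least squares fitted values
mu <- 20 + 0.7 * x                        # an arbitrary competing line
sum((y - muhat) * (muhat - mu))           # cross term: numerically zero
c(lm = sum((y - muhat)^2), other = sum((y - mu)^2))   # lm() is no worse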


Page 5: 01_03

Mean only regression

So we know that if

$$\sum_{i=1}^n (Y_i - \hat\mu_i)(\hat\mu_i - \mu_i) = 0$$

where $\mu_i = \beta_0 + \beta_1 X_i$ and $\hat\mu_i = \hat\beta_0 + \hat\beta_1 X_i$, then the line

$$Y = \hat\beta_0 + \hat\beta_1 X$$

is the least squares line.

Consider forcing $\beta_1 = 0$ and thus $\hat\beta_1 = 0$; that is, only considering horizontal lines.

The solution works out to be $\hat\beta_0 = \bar Y$.


Page 6: 01_03

Let's show it

$$\sum_{i=1}^n (Y_i - \hat\mu_i)(\hat\mu_i - \mu_i) = \sum_{i=1}^n (Y_i - \hat\beta_0)(\hat\beta_0 - \beta_0) = (\hat\beta_0 - \beta_0) \sum_{i=1}^n (Y_i - \hat\beta_0)$$

Thus, this will equal 0 if

$$\sum_{i=1}^n (Y_i - \hat\beta_0) = n \bar Y - n \hat\beta_0 = 0$$

Thus $\hat\beta_0 = \bar Y$.
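A quick numerical check of the mean-only result (not from the slides; it uses the galton child heights from UsingR): searching over candidate intercepts recovers the sample mean, which is also what an intercept-only lm() fit returns.

library(UsingR); data(galton)
y <- galton$child
# search for the beta0 that minimizes the sum of squares over the data range
opt <- optimize(function(b0) sum((y - b0)^2), range(y))
c(opt$minimum, mean(y), coef(lm(y ~ 1)))   # all three agree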


Page 7: 01_03

Regression through the origin

Recall that if

$$\sum_{i=1}^n (Y_i - \hat\mu_i)(\hat\mu_i - \mu_i) = 0$$

where $\mu_i = \beta_0 + \beta_1 X_i$ and $\hat\mu_i = \hat\beta_0 + \hat\beta_1 X_i$, then the line

$$Y = \hat\beta_0 + \hat\beta_1 X$$

is the least squares line.

Consider forcing $\beta_0 = 0$ and thus $\hat\beta_0 = 0$; that is, only considering lines through the origin.

The solution works out to be

$$\hat\beta_1 = \frac{\sum_{i=1}^n Y_i X_i}{\sum_{i=1}^n X_i^2}.$$


Page 8: 01_03

Let's show it

$$\sum_{i=1}^n (Y_i - \hat\mu_i)(\hat\mu_i - \mu_i) = \sum_{i=1}^n (Y_i - \hat\beta_1 X_i)(\hat\beta_1 X_i - \beta_1 X_i) = (\hat\beta_1 - \beta_1) \sum_{i=1}^n (Y_i X_i - \hat\beta_1 X_i^2)$$

Thus, this will equal 0 if

$$\sum_{i=1}^n (Y_i X_i - \hat\beta_1 X_i^2) = \sum_{i=1}^n Y_i X_i - \hat\beta_1 \sum_{i=1}^n X_i^2 = 0$$

Thus

$$\hat\beta_1 = \frac{\sum_{i=1}^n Y_i X_i}{\sum_{i=1}^n X_i^2}.$$
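A quick numerical check (not from the slides; it uses the galton data from UsingR): the formula agrees with lm() when the intercept is dropped with - 1 in the model formula, which forces the line through the origin.

library(UsingR); data(galton)
y <- galton$child; x <- galton$parent
# regression through the origin: formula above vs. lm() without an intercept
c(sum(y * x) / sum(x^2), coef(lm(y ~ x - 1)))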


Page 9: 01_03

Recapping what we know

If we define $\mu_i = \beta_0$ then $\hat\beta_0 = \bar Y$.

If we only look at horizontal lines, the least squares estimate of the intercept of that line is the average of the outcomes.

If we define $\mu_i = X_i \beta_1$ then

$$\hat\beta_1 = \frac{\sum_{i=1}^n Y_i X_i}{\sum_{i=1}^n X_i^2}$$

If we only look at lines through the origin, the estimated slope is the cross product of the Xs and Ys divided by the cross product of the Xs with themselves.

What about when $\mu_i = \beta_0 + \beta_1 X_i$? That is, we don't want to restrict ourselves to horizontal lines or lines through the origin.


Page 10: 01_03

Let's figure it out

$$\sum_{i=1}^n (Y_i - \hat\mu_i)(\hat\mu_i - \mu_i) = \sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 X_i)(\hat\beta_0 + \hat\beta_1 X_i - \beta_0 - \beta_1 X_i)$$

$$= (\hat\beta_0 - \beta_0) \sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 X_i) + (\hat\beta_1 - \beta_1) \sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 X_i) X_i$$

Note that

$$0 = \sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = n \bar Y - n \hat\beta_0 - n \hat\beta_1 \bar X \quad \text{implies that} \quad \hat\beta_0 = \bar Y - \hat\beta_1 \bar X$$

Then

$$\sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 X_i) X_i = \sum_{i=1}^n (Y_i - \bar Y + \hat\beta_1 \bar X - \hat\beta_1 X_i) X_i$$


Page 11: 01_03

Continued

$$= \sum_{i=1}^n \left\{ (Y_i - \bar Y) - \hat\beta_1 (X_i - \bar X) \right\} X_i$$

And thus

$$\sum_{i=1}^n (Y_i - \bar Y) X_i - \hat\beta_1 \sum_{i=1}^n (X_i - \bar X) X_i = 0.$$

So we arrive at

$$\hat\beta_1 = \frac{\sum_{i=1}^n (Y_i - \bar Y) X_i}{\sum_{i=1}^n (X_i - \bar X) X_i} = \frac{\sum_{i=1}^n (Y_i - \bar Y)(X_i - \bar X)}{\sum_{i=1}^n (X_i - \bar X)(X_i - \bar X)} = \mathrm{Cor}(Y, X) \frac{\mathrm{Sd}(Y)}{\mathrm{Sd}(X)}.$$

(The second equality holds because subtracting $\bar X$ inside each sum changes nothing: $\sum_{i=1}^n (Y_i - \bar Y)\bar X = 0$ and $\sum_{i=1}^n (X_i - \bar X)\bar X = 0$.)

And recall

$$\hat\beta_0 = \bar Y - \hat\beta_1 \bar X.$$


Page 12: 01_03

Consequences

The least squares model fit to the line $Y = \beta_0 + \beta_1 X$ through the data pairs $(X_i, Y_i)$ with $Y_i$ as the outcome obtains the line $Y = \hat\beta_0 + \hat\beta_1 X$ where

$$\hat\beta_1 = \mathrm{Cor}(Y, X) \frac{\mathrm{Sd}(Y)}{\mathrm{Sd}(X)}, \qquad \hat\beta_0 = \bar Y - \hat\beta_1 \bar X$$

$\hat\beta_1$ has the units of $Y / X$; $\hat\beta_0$ has the units of $Y$.

The line passes through the point $(\bar X, \bar Y)$ (a quick check follows below).

The slope of the regression line with $X$ as the outcome and $Y$ as the predictor is $\mathrm{Cor}(Y, X) \, \mathrm{Sd}(X) / \mathrm{Sd}(Y)$.

The slope is the same one you would get if you centered the data, $(X_i - \bar X, Y_i - \bar Y)$, and did regression through the origin.

If you normalized the data, $\left\{ \frac{X_i - \bar X}{\mathrm{Sd}(X)}, \frac{Y_i - \bar Y}{\mathrm{Sd}(Y)} \right\}$, the slope is $\mathrm{Cor}(Y, X)$.
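The claim that the line passes through $(\bar X, \bar Y)$ is the one consequence the following slides don't verify; here is a quick check using the same galton data (not from the slides):

library(UsingR); data(galton)
y <- galton$child; x <- galton$parent
fit <- lm(y ~ x)
# the fitted value at the mean of x equals the mean of y
c(predict(fit, newdata = data.frame(x = mean(x))), mean(y))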


Page 13: 01_03

Revisiting Galton's data

Double check our calculations using R

# the galton data frame comes from the UsingR package: library(UsingR); data(galton)
y <- galton$child
x <- galton$parent
beta1 <- cor(y, x) * sd(y) / sd(x)
beta0 <- mean(y) - beta1 * mean(x)
rbind(c(beta0, beta1), coef(lm(y ~ x)))

     (Intercept)      x
[1,]       23.94 0.6463
[2,]       23.94 0.6463


Page 14: 01_03

Revisiting Galton's data

Reversing the outcome/predictor relationship

beta1 <- cor(y, x) * sd(x) / sd(y)
beta0 <- mean(x) - beta1 * mean(y)
rbind(c(beta0, beta1), coef(lm(x ~ y)))

     (Intercept)      y
[1,]       46.14 0.3256
[2,]       46.14 0.3256


Page 15: 01_03

Revisiting Galton's data

Regression through the origin yields an equivalent slope if you center the data first

yc <- y - mean(y)
xc <- x - mean(x)
beta1 <- sum(yc * xc) / sum(xc^2)
c(beta1, coef(lm(y ~ x))[2])

            x
0.6463 0.6463


Page 16: 01_03

Revisiting Galton's data

Normalizing the variables results in the slope being the correlation

yn <- (y - mean(y)) / sd(y)
xn <- (x - mean(x)) / sd(x)
c(cor(y, x), cor(yn, xn), coef(lm(yn ~ xn))[2])

                   xn
0.4588 0.4588 0.4588


Page 17: 01_03

Plotting the fit

The size of the points represents the frequency at that (X, Y) combination.

For the red line the child is the outcome.

For the blue line, the parent is the outcome (accounting for the fact that the response is plotted on the horizontal axis).

The black line assumes $\mathrm{Cor}(Y, X) = 1$ (slope is $\mathrm{Sd}(Y)/\mathrm{Sd}(X)$).

The big black dot is $(\bar X, \bar Y)$.
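The transcript doesn't include the code that draws the underlying scatterplot. A minimal sketch of such a base plot is below (not the lecture's exact code; the frequency scaling is illustrative), assuming the galton data from UsingR:

library(UsingR); data(galton)
y <- galton$child; x <- galton$parent
# tabulate how often each (parent, child) pair occurs, then scale point size by frequency
freq <- as.data.frame(table(parent = x, child = y))
freq <- freq[freq$Freq > 0, ]
plot(as.numeric(as.character(freq$parent)),
     as.numeric(as.character(freq$child)),
     cex = 0.2 * sqrt(freq$Freq), pch = 21, bg = "lightblue",
     xlab = "parent height (in)", ylab = "child height (in)")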


Page 18: 01_03

The code to add the lines

abline(mean(y) - mean(x) * cor(y, x) * sd(y) / sd(x), sd(y) / sd(x) * cor(y, x), lwd = 3, col = "red")
abline(mean(y) - mean(x) * sd(y) / sd(x) / cor(y, x), sd(y) / sd(x) / cor(y, x), lwd = 3, col = "blue")
abline(mean(y) - mean(x) * sd(y) / sd(x), sd(y) / sd(x), lwd = 2)
points(mean(x), mean(y), cex = 2, pch = 19)


Page 19: 01_03
