36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6...

26
36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 50 150 300 50 150 300 50 150 300 50 150 300 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 50 150 300 0 5 10 15 20 50 150 300 Given : Chick Time Weight Figure 1: Weight Progression of 50 Chicks in the ChickWeight Data Set 1

Transcript of 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6...

Page 1: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

36-401 Modern Regression HW #6 SolutionsDUE: 10/27/2017 at 3PM

Problem 1 [32 points]

(a) (4 pts.)

5015

030

0

0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20

5015

030

0

5015

030

0

5015

030

0

5015

030

0

0 5 10 15 20 0 5 10 15 20 0 5 10 15 20

5015

030

0

0 5 10 15 20

5015

030

0

Given : Chick

Time

Wei

ght

Figure 1: Weight Progression of 50 Chicks in the ChickWeight Data Set

1

Page 2: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

(b) (9 pts.)

The results of a naive simple linear fit of Chick Weight on Time are shown in Figure 2. The right panel showsvery significant residual correlation. Therefore, the model does not fit well.

0 5 10 15 20

5075

100

125

150

Wei

ght

Time TimeR

esid

uals

0 5 10 15 20

−20

−10

010

20

Figure 2: Results of Naive Linear Regression of Chick 6 Weight on Time

Figure 3 displays the results of a degree four1 polynomial regression of Weight on Time. The lack of visiblecorrelation in the residuals suggests this is a much more suitabe fit.

0 5 10 15 20

5075

100

125

150

Wei

ght

Time Time

Res

idua

ls

0 5 10 15 20

0

Figure 3: Results of Polynomial Regression of Chick 6 Weight on Time

1We chose this heuristically.

2

Page 3: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

Table 1: Summary of Polynomial Regression for Chick 6

Estimate Std. Error t value Pr(>|t|)(Intercept) 42.7758402 3.2118729 13.318036 0.0000032Time -2.6030894 2.3499713 -1.107711 0.3045914I(Timeˆ2) 2.1242507 0.4898299 4.336711 0.0034098I(Timeˆ3) -0.1324975 0.0359324 -3.687415 0.0077831I(Timeˆ4) 0.0023642 0.0008494 2.783231 0.0271716

Not surprisingly, given the polynomial shape of the weight progression of Chick 6, our polynomial regressiondoes a much better job fitting the data than the naive linear regression. The most notable aspect of ourmodel parameters is that Time is no longer statistically significant given its higher order terms.

(c) (9 pts.)

0 5 10 15 20

5010

015

020

025

030

035

0

Wei

ght

Time Time

Res

idua

ls

0 5 10 15 20

−120

−80

−40

040

8012

016

0

Figure 4: Results of Naive Linear Regression of Chick Weight on Time

As we saw with Chick 6, and as we could infer from Figure 1, the weight progression of individual chicks isoften more suitable to model with a polynomial than a simple linear regression. However, when the data areaggregated over all chicks, a simple linear model is actually not a poor choice. The model fit and residualsare shown in Figure 4. For the most part the line fits the data fairly well. However, some residual curvatureis apparent in the earlier stages of the chick lifetime. Therefore, adding a quadratic term will likely benefitthe model. Figure 5 shows that the extra term does indeed level out the residuals nicely.

3

Page 4: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

0 5 10 15 20

5010

015

020

025

030

035

0

Wei

ght

Time Time

Res

idua

ls

0 5 10 15 20

−120

−80

−40

040

8012

016

0

Figure 5: Results of Polynomial Regression of Chick Weight on Time

(d) (9 pts.)

0 5 10 15 20

5010

015

020

025

030

035

0

Wei

ght

Time

Diet 1Diet 2Diet 3Diet 4

Figure 6: Results of Polynomial Regression of Chick Weight on Time and Diet

4

Page 5: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

Simply adding Diet as a predictor without interacting it with Time amounts to the growth curve for eachdiet group having the same shape but possibly different intercepts. As we can see in Figure 6, such a modelspecification is not suitable in this setting. Given the HW instructions, it is fine if you do this as long as youcomment on the ill-suitedness of the model.

Since a chick’s birthweight is independent of the diet it is thereafter fed, it is more sensible to force eachdiet group to have the same intercept, but possibly differing curvatures. The fits of this model are shown inFigure 7 (superposed) and Figure 8 (separated into respective Diet groups).

0 5 10 15 20

5010

015

020

025

030

035

0

Wei

ght

Time

Diet 1Diet 2Diet 3Diet 4

Figure 7: Polynomial Regression of Chick Weight with Timeˆ2 and Diet Interacted

5

Page 6: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

0 5 10 15 20

5010

015

020

025

030

035

0

Wei

ght

Time

Diet = 1

0 5 10 15 20

5010

015

020

025

030

035

0

Wei

ght

Time

Diet = 2

0 5 10 15 20

5010

015

020

025

030

035

0

Wei

ght

Time

Diet = 3

0 5 10 15 20

5010

015

020

025

030

035

0

Wei

ght

Time

Diet = 4

Figure 8: Polynomial Regression of Chick Weight with Timeˆ2 and Diet Interacted

The residuals of the model are plotted vs. Time in Figure 9 (superposed) and Figure 10 (separated intorespective Diet groups). The residuals look pretty healthy overall, but there is some clear correlation in Dietgroups 1 and 4. A box plot can also be used to do diet-wise diagnostics of the residuals (see Figure 11).However, the significant heteroskedasticity in all diet groups renders this plot next to useless. Instead we aremuch better off examining Figure 10. Notice we are only a few predictor interactions shy of performing fourseparate regressions any way.

6

Page 7: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

Time

Res

idua

ls

0 5 10 15 20

−120

−80

−40

040

8012

0 Diet 1Diet 2Diet 3Diet 4

Figure 9: Residuals of chosen model

7

Page 8: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

0 5 10 15 20

−120

−80

−40

040

8012

0

Res

idua

ls

Time

Diet = 1

0 5 10 15 20

−120

−80

−40

040

8012

0

Res

idua

ls

Time

Diet = 2

0 5 10 15 20

−120

−80

−40

040

8012

0

Res

idua

ls

Time

Diet = 3

0 5 10 15 20

−120

−80

−40

040

8012

0

Res

idua

ls

Time

Diet = 4

Figure 10: Residuals of chosen model

8

Page 9: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

1 2 3 4

−15

0−

100

−50

050

100

Diet

Res

idua

ls

Figure 11: Diet-wise Distribution of Residuals

9

Page 10: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

Problem 2 [36 points]

(a) (9 pts.)

There are about 100 ways to do this. Here is one.

Let

X0 =

1...1

,

i.e. the intercept predictor, andX =

[X0 X1 X2 X3

].

Now notice X0 = X1 +X2 +X3. That is, the columns of X are linearly dependent. Now,

Columns of X are LD =⇒ det(X) = 0.

And,det(XTX) = det(XT ) · det(X) = det(X) · det(X) = 02 = 0.

det(XTX) = 0 so XTX is not invertible.

(b) (9 pts.)

Y <- c(33,36,35,35,31,29,31,29,37,39,36,36)X0 <- rep(1,12)X1 <- c(rep(1,4),rep(0,8))X2 <- c(rep(0,4),rep(1,4),rep(0,4))X3 <- c(rep(0,8),rep(1,4))X <- cbind(X0,X1,X2) # leave out X3

model7 <- lm(Y ~ X - 1) # leave out default intercept since we have already included onetmp <- summary(model7)$coefficientsrownames(tmp) <- c("(Intercept)","France","Italy")kable(tmp, caption = "Income Regression Summary")

Table 2: Income Regression Summary

Estimate Std. Error t value Pr(>|t|)(Intercept) 37.00 0.6400955 57.803877 0.0000000France -2.25 0.9052317 -2.485552 0.0346742Italy -7.00 0.9052317 -7.732827 0.0000290

All coefficients in the regression are significant, signalling a statistically significant difference between themean incomes of France, Italy, and the USA.

10

Page 11: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

(c) (9 pts.)

As we defined the model in part (b), we have

E[Income | France] = β0 + β1 = 34.75 (thousand)

E[Income | Italy] = β0 + β2 = 30.00 (thousand)

E[Income | USA] = β0 = 37.00 (thousand)

(d) (9 pts.)

Continuing with the same notation, the true mean income of France is given by β0 + β1. Assuming the noiseis normally distributed implies

β0 ∼ N(β0, Var(β0)

)and β1 ∼ N

(β1, Var(β1)

)and thus

β0 + β1 ∼ N(β0 + β1, Var(β0 + β1)

)N(β0 + β1, Var(β0) + Var(β1) + 2 · Cov(β0 + β1)

).

Therefore, a 95% confidence interval for β0 + β1 (mean income of France) is given by

(β0 + β1)± z0.025 ·√

Var(β0 + β1).

Var(β0 + β1) is not known so we replace it with its unbiased estimator and use the t-distribution. This yieldsthe 95% confidence interval

(β0 + β1)± tn−1(0.025) ·√Var(β0 + β1)

=⇒ (β0 + β1)± t3(0.025) ·√

Var(β0) + Var(β1) + 2 · Cov(β0, β1)=⇒ 34.75± 2.037,

where we have gotten the estimated errors from the vcov function.

11

Page 12: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

Problem 3 [32 points]

(a) [5 pts.]

age

100 200 0.0 0.4 0.8 0.0 0.4 0.8 0 2 4 6

1525

3545

100

200

lwt

race

1.0

2.0

3.0

0.0

0.4

0.8

smoke

ptl

0.0

1.5

3.0

0.0

0.4

0.8

ht

ui

0.0

0.4

0.8

02

46

ftv

15 25 35 45 1.0 2.0 3.0 0.0 1.5 3.0 0.0 0.4 0.8 1000 4000

1000

4000

bwt

12

Page 13: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

(b) [9 pts.]

Table 3: Summary of Birth Weight Regression

Estimate Std. Error t value Pr(>|t|)(Intercept) 2841.040247 328.894986 8.6381379 0.0000000age -2.955912 9.604270 -0.3077706 0.7586271lwt 4.380344 1.834985 2.3871276 0.0180591factor(race)2 -442.329304 148.926061 -2.9701269 0.0033999factor(race)3 -282.141331 120.748135 -2.3366103 0.0206050smoke -283.589210 111.748150 -2.5377531 0.0120397factor(ptl)1 -357.092406 152.324861 -2.3442818 0.0201992factor(ptl)2 -68.778992 297.501669 -0.2311886 0.8174415factor(ptl)3 1260.027892 661.664516 1.9043305 0.0585268ht -553.036031 202.192322 -2.7351980 0.0068840ui -522.713818 138.164206 -3.7832796 0.0002130factor(ftv)1 142.479511 122.824933 1.1600211 0.2476386factor(ftv)2 -1.868773 138.659587 -0.0134774 0.9892624factor(ftv)3 -319.779655 253.984296 -1.2590529 0.2097074factor(ftv)4 244.941583 336.965702 0.7269036 0.4682675factor(ftv)6 15.369792 700.858349 0.0219300 0.9825291

(c) [9 pts.]

Overall, the residuals look quite good plotted against the fitted values and the numerical predictors. Case226 (a 45 year old) bears substantial leverage on the fit since all other cases are much younger. Furthermore,this data point also yields a very large outlier. In practice, we would therefore discard this observation whenbuilding a predictive model. In regard to the categorical predictors, there is some interclass heteroskedasticity;however, the varied sample sizes can also make is challenging to deduce this strictly from these box plots.That is, categorical levels containing a large number of observations can give the appearance of a widerdistribution while levels with small sample sizes natural appear to have low variance. To normalize thesecomparisons with respect to sample size we could, for example, perform an F -test for equality of variances or(more sophisticated) a Kolmorogov-Smirnov test for equality of distributions.

13

Page 14: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

Fitted values

Res

idua

ls

lm(bwt ~ age + lwt + factor(race) + smoke + factor(ptl) + ht + ui + factor( ...

10

226

16

1800 2200 2600 3000 3400 3800

−210

0−1

200

−300

300

900

1500

Age

Res

idua

ls

15 20 25 30 35 40 45

−180

0−9

00−3

0030

090

015

00

14

Page 15: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

Age

Res

idua

ls

80 100 120 140 160 180 200 220 240

−180

0−9

00−3

0030

090

015

00

1 2 3

−15

00−

500

050

015

00

Race

Res

idua

ls

15

Page 16: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

0 1

−15

00−

500

050

015

00

Smoke

Res

idua

ls

0 1 2 3

−15

00−

500

050

015

00

Number of previous premature labours

Res

idua

ls

16

Page 17: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

0 1

−15

00−

500

050

015

00

History of hypertension

Res

idua

ls

0 1

−15

00−

500

050

015

00

Presence of uterine irritability

Res

idua

ls

17

Page 18: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

0 1 2 3 4 6

−15

00−

500

050

015

00

Number of physician visits during the first trimester.

Res

idua

ls

(d) [9 pts.]

Table 3 (part b) suggests we may have included more predictors than are useful for predicting Birth Weight.A more predictive model would likely reduce the dimension of our covariates. We could, for example, proceedby performing a sequence of partial F -tests (equivalently stepwise regression; 36-402) or perhaps implementinga LASSO regression (also 36-402)2 Given the full set of predictors, the multiple linear regression suggeststhat a baby’s birth weight is significantly associated with the mother’s weight, race, smoking status, historyof hypertension, and presence of uterine iritability (and possibly the number of previous premature labours).

2Gee, I bet you guys can’t wait until 36-402.

18

Page 19: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

Appendix

addTrans <- function(color,trans){

# This function adds transparancy to a color.# Define transparancy with an integer between 0 and 255# 0 being fully transparant and 255 being fully visable# Works with either color and trans a vector of equal length,# or one of the two of length 1.

if (length(color)!=length(trans)&!any(c(length(color),length(trans))==1)){stop("Vector lengths not correct")

}if (length(color)==1 & length(trans)>1) color <- rep(color,length(trans))if (length(trans)==1 & length(color)>1) trans <- rep(trans,length(color))

num2hex <- function(x){

hex <- unlist(strsplit("0123456789ABCDEF",split=""))return(paste(hex[(x-x%%16)/16+1],hex[x%%16+1],sep=""))

}rgb <- rbind(col2rgb(color),trans)res <- paste("#",apply(apply(rgb,2,num2hex),2,paste,collapse=""),sep="")return(res)

}

Problem 1 [32 points]

data(ChickWeight)names(ChickWeight)attach(ChickWeight)

(a) (4 pts.)

coplot(weight ~ Time | Chick, data = ChickWeight,type = "b", show.given = FALSE,xlab = "", ylab = "", main = "")

mtext(side = 1, text = "Time", cex = 2, line = 3.75)mtext(side = 2, text = "Weight", cex = 2, line = 2.25)

(b) (9 pts.)

Chick6 <- ChickWeight[ChickWeight$Chick == 6,]

par(mfrow=c(1,2))with(Chick6, plot(Time, weight, col = NA, pch = 19, cex = 1.25,

axes = FALSE, xlab="",ylab=""))axis(side = 1, at = c(0,5,10,15,20), as.character(c(0,5,10,15,20)),

font = 5)

19

Page 20: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

axis(side = 2, at = seq(0,300,25), as.character(seq(0,300,25)),font = 5)

abline(v = c(0,5,10,15,20), h = seq(50,300,25), col = "gray70",lty = 2)

mtext(side = 2, text = "Weight", font = 3, line = 3)mtext(side = 1, text = "Time", font = 3, line = 3)with(Chick6, points(Time, weight, col = addTrans("orange",120),

cex = 1.25, pch = 19))with(Chick6, points(Time, weight, col = "orange", pch = 1,

cex = 1.25))model1 <- lm(weight ~ Time, data = Chick6)abline(model1, col = "seagreen", lwd = 2)

plot(Chick6$Time, residuals(model1), col = NA, axes = FALSE,xlab= "Time", ylab = "Residuals", font.lab = 3)

axis(side = 1, at = c(0,5,10,15,20), as.character(c(0,5,10,15,20)),font = 5)

axis(side = 2, at = seq(-70,30,10), labels = as.character(seq(-70,30,10)),font = 5)

abline(h = seq(-70,30,10), v = c(0,5,10,15,20), col = "gray70",lty = 2)

abline(0,0, lty = 2, col = "gray45")points(Chick6$Time, residuals(model1), col = addTrans("orange",120),

pch = 19, cex = 1.25)points(Chick6$Time, residuals(model1), col = "orange", cex = 1.25)panel.smooth(Chick6$Time, residuals(model1), col = NA,cex = 0.5,

col.smooth = "seagreen", span = 0.5, iter = 3)

par(mfrow=c(1,2))with(Chick6, plot(Time, weight, col = NA, pch = 19, cex = 1.25,

axes = FALSE, xlab="",ylab=""))axis(side = 1, at = c(0,5,10,15,20), as.character(c(0,5,10,15,20)),

font = 5)axis(side = 2, at = seq(0,300,25), as.character(seq(0,300,25)),

font = 5)abline(v = c(0,5,10,15,20), h = seq(50,300,25), col = "gray70",

lty = 2)mtext(side = 2, text = "Weight", font = 3, line = 3)mtext(side = 1, text = "Time", font = 3, line = 3)with(Chick6, points(Time, weight, col = addTrans("orange",120),

cex = 1.25, pch = 19))with(Chick6, points(Time, weight, col = "orange", pch = 1, cex = 1.25))model2 <- lm(weight ~ Time + I(Time ^ 2) + I(Time ^ 3) + I(Time ^ 4),

data = Chick6)lines(Chick6$Time, fitted(model2), col = "seagreen", lwd = 2)

plot(Chick6$Time, residuals(model2), col = NA, axes = FALSE, xlab= "Time",ylab = "Residuals", font.lab = 3)

axis(side = 1, at = c(0,5,10,15,20), as.character(c(0,5,10,15,20)),font = 5)

axis(side = 2, at = seq(-70,30,10), labels = as.character(seq(-70,30,10)),font = 5)

abline(h = seq(-70,30,10), v = c(0,5,10,15,20), col = "gray70", lty = 2)abline(0,0, lty = 2, col = "gray45")

20

Page 21: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

points(Chick6$Time, residuals(model2), col = addTrans("orange",120),pch = 19, cex = 1.25)

points(Chick6$Time, residuals(model2), col = "orange", cex = 1.25)panel.smooth(Chick6$Time, residuals(model2), col = NA,cex = 0.5,

col.smooth = "seagreen", span = 0.5, iter = 3)

library(knitr)kable(summary(model2)$coefficients, caption = "Summary of Polynomial Regression for Chick 6")

(c) (9 pts.)

par(mfrow=c(1,2))with(ChickWeight, plot(Time, weight, col = NA, pch = 19, cex = 0.8,

axes = FALSE, xlab="",ylab=""))axis(side = 1, at = c(0,5,10,15,20,25), as.character(c(0,5,10,15,20,25)),

font = 5)axis(side = 2, at = seq(0,350,50), as.character(seq(0,350,50)),

font = 5)abline(v = c(0,5,10,15,20,25), h = seq(50,350,50), col = "gray70",

lty = 2)mtext(side = 2, text = "Weight", font = 3, line = 3)mtext(side = 1, text = "Time", font = 3, line = 3)with(ChickWeight, points(Time, weight, col = addTrans("orange",120),

cex = 0.8, pch = 19))with(ChickWeight, points(Time, weight, col = "orange", pch = 1,

cex = 0.8))model3 <- lm(weight ~ Time, data = ChickWeight)abline(model3, col = "seagreen", lwd = 2)

plot(ChickWeight$Time, residuals(model3), col = NA, axes = FALSE,xlab= "Time", ylab = "Residuals", font.lab = 3)

axis(side = 1, at = c(0,5,10,15,20,25), as.character(c(0,5,10,15,20,25)),font = 5)

axis(side = 2, at = seq(-200,200,40), labels = as.character(seq(-200,200,40)),font = 5)

abline(h = seq(-200,200,40), v = c(0,5,10,15,20,25), col = "gray70",lty = 2)

abline(0,0, lty = 2, col = "gray45")points(ChickWeight$Time, residuals(model3), col = addTrans("orange",120),

pch = 19, cex = 0.8)points(ChickWeight$Time, residuals(model3), col = "orange", cex = 0.8)panel.smooth(ChickWeight$Time, residuals(model3), col = NA,cex = 0.5,

col.smooth = "seagreen", span = 0.5, iter = 3)

par(mfrow=c(1,2))with(ChickWeight, plot(Time, weight, col = NA, pch = 19, cex = 0.8,

axes = FALSE, xlab="",ylab=""))axis(side = 1, at = c(0,5,10,15,20,25), as.character(c(0,5,10,15,20,25)),

font = 5)axis(side = 2, at = seq(0,350,50), as.character(seq(0,350,50)), font = 5)abline(v = c(0,5,10,15,20,25), h = seq(50,350,50), col = "gray70", lty = 2)mtext(side = 2, text = "Weight", font = 3, line = 3)

21

Page 22: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

mtext(side = 1, text = "Time", font = 3, line = 3)with(ChickWeight, points(Time, weight, col = addTrans("orange",120), cex = 0.8, pch = 19))with(ChickWeight, points(Time, weight, col = "orange", pch = 1, cex = 0.8))model4 <- lm(weight ~ Time + I(Time ^ 2), data = ChickWeight)lines(seq(0,max(ChickWeight$Time),length = 50),

predict(model4, newdata=data.frame(Time = seq(0,max(ChickWeight$Time),length = 50))),col = "seagreen", lwd = 2)

plot(ChickWeight$Time, residuals(model4), col = NA, axes = FALSE, xlab= "Time",ylab = "Residuals", font.lab = 3)

axis(side = 1, at = c(0,5,10,15,20,25), as.character(c(0,5,10,15,20,25)), font = 5)axis(side = 2, at = seq(-200,200,40), labels = as.character(seq(-200,200,40)),

font = 5)abline(h = seq(-200,200,40), v = c(0,5,10,15,20,25), col = "gray70", lty = 2)abline(0,0, lty = 2, col = "gray45")points(ChickWeight$Time, residuals(model4), col = addTrans("orange",120),

pch = 19, cex = 0.8)points(ChickWeight$Time, residuals(model4), col = "orange", cex = 0.8)panel.smooth(ChickWeight$Time, residuals(model4), col = NA,cex = 0.5,

col.smooth = "seagreen", span = 0.5, iter = 3)

(d) (9 pts.)

with(ChickWeight, plot(Time, weight, col = NA, pch = 19, cex = 0.8,axes = FALSE, xlab="",ylab=""))

axis(side = 1, at = c(0,5,10,15,20,25), as.character(c(0,5,10,15,20,25)), font = 5)axis(side = 2, at = seq(0,350,50), as.character(seq(0,350,50)), font = 5)abline(v = c(0,5,10,15,20,25), h = seq(50,350,50), col = "gray70", lty = 2)mtext(side = 2, text = "Weight", font = 3, line = 3)mtext(side = 1, text = "Time", font = 3, line = 3)with(ChickWeight, points(Time, weight, col = addTrans("orange",120),

cex = 0.8, pch = 19))with(ChickWeight, points(Time, weight, col = "orange", pch = 1, cex = 0.8))model5 <- lm(weight ~ Time + I(Time ^ 2) + factor(Diet), data = ChickWeight)lines(seq(0,max(ChickWeight$Time),length = 50),

predict(model5, newdata=data.frame(Time = seq(0,max(ChickWeight$Time),length = 50),Diet = 1)),

col = "seagreen", lwd = 2)lines(seq(0,max(ChickWeight$Time),length = 50),

predict(model5, newdata=data.frame(Time = seq(0,max(ChickWeight$Time),length = 50),Diet = 2)),

col = "red", lwd = 2)lines(seq(0,max(ChickWeight$Time),length = 50),

predict(model5, newdata=data.frame(Time = seq(0,max(ChickWeight$Time),length = 50),Diet = 3)),

col = "blue", lwd = 2)lines(seq(0,max(ChickWeight$Time),length = 50),

predict(model5, newdata=data.frame(Time = seq(0,max(ChickWeight$Time),length = 50),Diet = 4)),

col = "purple", lwd = 2)legend(x = "topleft", legend = c("Diet 1", "Diet 2", "Diet 3", "Diet 4"),

col = c("seagreen","red","blue","purple"), lwd = 2)

22

Page 23: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

with(ChickWeight, plot(Time, weight, col = NA, pch = 19, cex = 0.8, axes = FALSE,xlab="",ylab=""))

axis(side = 1, at = c(0,5,10,15,20,25), as.character(c(0,5,10,15,20,25)), font = 5)axis(side = 2, at = seq(0,350,50), as.character(seq(0,350,50)), font = 5)abline(v = c(0,5,10,15,20,25), h = seq(50,350,50), col = "gray70", lty = 2)mtext(side = 2, text = "Weight", font = 3, line = 3)mtext(side = 1, text = "Time", font = 3, line = 3)with(ChickWeight, points(Time, weight, col = addTrans("orange",120), cex = 0.8,

pch = 19))with(ChickWeight, points(Time, weight, col = "orange", pch = 1, cex = 0.8))model6 <- lm(weight ~ Time + I(Time ^ 2) : factor(Diet), data = ChickWeight)lines(seq(0,max(ChickWeight$Time),length = 50),

predict(model6, newdata=data.frame(Time = seq(0,max(ChickWeight$Time),length = 50),Diet = 1)),

col = "seagreen", lwd = 2)lines(seq(0,max(ChickWeight$Time),length = 50),

predict(model6, newdata=data.frame(Time = seq(0,max(ChickWeight$Time),length = 50),Diet = 2)),

col = "red", lwd = 2)lines(seq(0,max(ChickWeight$Time),length = 50),

predict(model6, newdata=data.frame(Time = seq(0,max(ChickWeight$Time),length = 50),Diet = 3)),

col = "blue", lwd = 2)lines(seq(0,max(ChickWeight$Time),length = 50),

predict(model6, newdata=data.frame(Time = seq(0,max(ChickWeight$Time),length = 50),Diet = 4)),

col = "purple", lwd = 2)legend(x = "topleft", legend = c("Diet 1", "Diet 2", "Diet 3", "Diet 4"),

col = c("seagreen","red","blue","purple"), lwd = 2, bg = "white")

par(mfrow=c(2,2))cols <- c("seagreen","red","blue","purple")for (itr in 1:4){

with(ChickWeight, plot(Time, weight, col = NA, pch = 19, cex = 0.8,axes = FALSE, xlab="",ylab=""))

axis(side = 1, at = c(0,5,10,15,20,25), as.character(c(0,5,10,15,20,25)),font = 5)

axis(side = 2, at = seq(0,350,50), as.character(seq(0,350,50)),font = 5)

abline(v = c(0,5,10,15,20,25), h = seq(50,350,50), col = "gray70",lty = 2)

mtext(side = 2, text = "Weight", font = 3, line = 3)mtext(side = 1, text = "Time", font = 3, line = 3)mtext(side = 3, text = paste0("Diet = ",itr), font = 3)with(ChickWeight[which(ChickWeight$Diet==itr),], points(Time, weight,

col = addTrans("orange",120),cex = 0.8, pch = 19))

with(ChickWeight[which(ChickWeight$Diet==itr),], points(Time, weight,col = "orange",pch = 1, cex = 0.8))

model6 <- lm(weight ~ Time + I(Time ^ 2) : factor(Diet),data = ChickWeight)

lines(seq(0,max(ChickWeight$Time),length = 50),predict(model6, newdata=data.frame(Time = seq(0,max(ChickWeight$Time),

23

Page 24: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

length = 50), Diet = itr)),col = cols[itr], lwd = 2)

}

plot(ChickWeight$Time, residuals(model6), col = NA, axes = FALSE,xlab= "Time", ylab = "Residuals", font.lab = 3)

axis(side = 1, at = c(0,5,10,15,20,25), as.character(c(0,5,10,15,20,25)),font = 5)

axis(side = 2, at = seq(-200,200,40), labels = as.character(seq(-200,200,40)),font = 5)

abline(h = seq(-200,200,40), v = c(0,5,10,15,20,25), col = "gray70", lty = 2)abline(0,0, lty = 2, col = "gray45")points(ChickWeight$Time, residuals(model6),

col = addTrans(c("seagreen","red","blue","purple")[ChickWeight$Diet],120),pch = 19, cex = 0.8)

points(ChickWeight$Time, residuals(model6),col = c("seagreen","red","blue","purple")[ChickWeight$Diet], cex = 0.8)

legend(x = "topleft", legend = c("Diet 1", "Diet 2", "Diet 3", "Diet 4"),col = addTrans(c("seagreen","red","blue","purple"),120), pch = 19, bg = "white")

par(mfrow=c(2,2))cols <- c("seagreen","red","blue","purple")for (itr in 1:4){

with(ChickWeight, plot(Time, residuals(model6), col = NA,pch = 19, cex = 0.8, axes = FALSE, xlab="",ylab=""))

axis(side = 1, at = c(0,5,10,15,20,25), as.character(c(0,5,10,15,20,25)),font = 5)

axis(side = 2, at = seq(-200,200,40), labels = as.character(seq(-200,200,40)),font = 5)

abline(h = seq(-200,200,40), v = c(0,5,10,15,20,25), col = "gray70", lty = 2)abline(0,0, lty = 2, col = "gray45")mtext(side = 2, text = "Residuals", font = 3, line = 3)mtext(side = 1, text = "Time", font = 3, line = 3)mtext(side = 3, text = paste0("Diet = ",itr), font = 3)with(ChickWeight, points(Time[which(ChickWeight$Diet==itr)],

residuals(model6)[which(ChickWeight$Diet==itr)],col = addTrans(c("seagreen","red","blue","purple"),120)[itr],cex = 0.8, pch = 19))

with(ChickWeight, points(Time[which(ChickWeight$Diet==itr)],residuals(model6)[which(ChickWeight$Diet==itr)],col = c("seagreen","red","blue","purple")[itr], pch = 1,cex = 0.8))

}

boxplot(residuals(model6) ~ ChickWeight$Diet,col = addTrans(c("seagreen","red","blue","purple"),120),border = c("seagreen","red","blue","purple"),xlab = "Diet",ylab = "Residuals", pch = 19)

abline(0,0, lty = 2, col = "gray45")

24

Page 25: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

Problem 3 [32 points]

library(MASS)

(a) [5 pts.]

pairs(birthwt[,2:10])

(b) [9 pts.]

model8 <- lm(bwt ~ age + lwt + factor(race) + smoke + factor(ptl) + ht+ ui + factor(ftv), data = birthwt)

kable(summary(model8)$coefficients,caption = "Summary of Birth Weight Regression")

(c) [9 pts.]

plot(model8, which = 1, col = NA, pch = 19, axes = FALSE,add.smooth = FALSE, caption = "")

abline(h = seq(-2100,1500,300), col = "gray75", lty = 2)abline(v = seq(1800,3800,200), col = "gray80", lty = 2)abline(0,0, lty = 2, col = "gray45")axis(side = 1, at = seq(1800,3800,200),

as.character(seq(1800,3800,200)), font = 5)axis(side = 2, at = seq(-2100,1500,300),

labels = as.character(seq(-2100,1500,300)), font = 5)points(fitted(model8), residuals(model8),

col = addTrans("orange",120), pch = 19)points(fitted(model8), residuals(model8), col = "orange")panel.smooth(fitted(model8), residuals(model8), col = "orange",

cex = 1, col.smooth = "seagreen", span = 2/3, iter = 3)

plot(birthwt$age, residuals(model8), col = NA, axes = FALSE, xlab= "Age",ylab = "Residuals", font.lab = 3)

axis(side = 1, at = seq(10,50,5), as.character(seq(10,50,5)), font = 5)axis(side = 2, at = seq(-1800,1500,300),

labels = as.character(seq(-1800,1500,300)), font = 5)abline(h = seq(-1800,1500,300), v = seq(10,50,5), col = "gray70", lty = 2)abline(0,0, lty = 2, col = "gray45")points(birthwt$age, residuals(model8),

col = addTrans("orange",120), pch = 19, cex = 0.8)points(birthwt$age, residuals(model8), col = "orange", cex = 0.8)panel.smooth(birthwt$age, residuals(model8), col = NA,cex = 0.5,

col.smooth = "seagreen", span = 2/3, iter = 3)

plot(birthwt$lwt, residuals(model8), col = NA, axes = FALSE,xlab= "Age", ylab = "Residuals", font.lab = 3)

axis(side = 1, at = seq(80,250,20), as.character(seq(80,250,20)), font = 5)axis(side = 2, at = seq(-1800,1500,300),

25

Page 26: 36-401 Modern Regression HW #6 Solutionslarry/=stat401/HW6sol.pdf · 36-401 Modern Regression HW #6 Solutions DUE: 10/27/2017 at 3PM Problem 1 [32 points] (a) (4 pts.) 50 150 300

labels = as.character(seq(-1800,1500,300)), font = 5)abline(h = seq(-1800,1500,300), v = seq(80,250,20), col = "gray70", lty = 2)abline(0,0, lty = 2, col = "gray45")points(birthwt$lwt, residuals(model8), col = addTrans("orange",120),

pch = 19, cex = 0.8)points(birthwt$lwt, residuals(model8), col = "orange", cex = 0.8)panel.smooth(birthwt$lwt, residuals(model8), col = NA,cex = 0.5,

col.smooth = "seagreen", span = 2/3, iter = 3)

boxplot(residuals(model8) ~ birthwt$race, col = addTrans(c("seagreen","red","blue"),120),border = c("seagreen","red","blue"), xlab = "Race",ylab = "Residuals", pch = 19)

abline(0,0, lty = 2, col = "gray45")

boxplot(residuals(model8) ~ birthwt$smoke, col = addTrans(c("seagreen","orange"),120),border = c("seagreen","orange"), xlab = "Smoke",ylab = "Residuals", pch = 19)

abline(0,0, lty = 2, col = "gray45")

boxplot(residuals(model8) ~ birthwt$ptl,col = addTrans(c("seagreen","orange","red","blue"),120),border = c("seagreen","orange","red","blue"),xlab = "Number of previous premature labours",ylab = "Residuals", pch = 19)

abline(0,0, lty = 2, col = "gray45")

boxplot(residuals(model8) ~ birthwt$ht, col = addTrans(c("seagreen","orange"),120),border = c("seagreen","orange"), xlab = "History of hypertension",ylab = "Residuals", pch = 19)

abline(0,0, lty = 2, col = "gray45")

boxplot(residuals(model8) ~ birthwt$ui, col = addTrans(c("seagreen","orange"),120),border = c("seagreen","orange"), xlab = "Presence of uterine irritability",ylab = "Residuals", pch = 19)

abline(0,0, lty = 2, col = "gray45")

boxplot(residuals(model8) ~ birthwt$ftv,col = addTrans(c("seagreen","orange","red","blue","purple","yellow2","pink"),120),border = c("seagreen","orange","red","blue","purple","yellow2","pink"),xlab = "Number of physician visits during the first trimester.",ylab = "Residuals", pch = 19)

abline(0,0, lty = 2, col = "gray45")

26