Computer exercise 3: solutions
Anna Lindgren
15 April 2019
Exercise 3: U.S. county demographic information
cdi <- read.delim("../data/CDI.txt")
cdi$region <- factor(cdi$region, levels = c(1, 2, 3, 4),
                     labels = c("Northeast", "Midwest", "South", "West"))
cdi$phys1000 <- 1000 * cdi$phys / cdi$popul
cdi$crm1000 <- 1000 * cdi$crimes / cdi$popul
(a)
As seen in Figure 1, the number of physicians per 1000 inhabitants is heavily right-skewed, with large variability upwards but not downwards. The logarithm is much more symmetric.
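The effect of the log transform on a right-skewed variable can be illustrated on simulated data. The sketch below is only an illustration and not part of the exercise: the log-normal sample stands in for `phys1000`, and the helper `skewness()` is a hypothetical name introduced here.

```r
# Illustration on simulated data (an assumption, not the CDI data):
# a log-normal sample is right-skewed, its logarithm is roughly symmetric.
set.seed(1)
x <- rlnorm(440, meanlog = 0.5, sdlog = 0.6)

# Hypothetical helper: moment-based sample skewness (near 0 for a symmetric sample).
skewness <- function(z) mean((z - mean(z))^3) / sd(z)^3

skew.raw <- skewness(x)       # clearly positive: long right tail
skew.log <- skewness(log(x))  # close to zero
c(raw = skew.raw, log = skew.log)
```

A skewness near zero after taking logarithms is what makes `log(phys1000)` a more suitable response for a linear model with normal errors.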
(b)
Fit all eight models, including the empty one with only the intercept.
### See the R-code for Lecture 5 where I did this:
mod.1 <- lm(log(phys1000) ~ 1, data = cdi)
mod.2 <- lm(log(phys1000) ~ percapitaincome, data = cdi)
mod.3 <- lm(log(phys1000) ~ crm1000, data = cdi)
mod.4 <- lm(log(phys1000) ~ pop65plus, data = cdi)
mod.5 <- lm(log(phys1000) ~ percapitaincome + crm1000, data = cdi)
mod.6 <- lm(log(phys1000) ~ percapitaincome + pop65plus, data = cdi)
mod.7 <- lm(log(phys1000) ~ crm1000 + pop65plus, data = cdi)
mod.8 <- lm(log(phys1000) ~ percapitaincome + crm1000 + pop65plus, data = cdi)

sum.1 <- summary(mod.1)
sum.2 <- summary(mod.2)
sum.3 <- summary(mod.3)
sum.4 <- summary(mod.4)
sum.5 <- summary(mod.5)
sum.6 <- summary(mod.6)
sum.7 <- summary(mod.7)
sum.8 <- summary(mod.8)
sum.8
#>
#> Call:
#> lm(formula = log(phys1000) ~ percapitaincome + crm1000 + pop65plus,
#>     data = cdi)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -1.68142 -0.28720 -0.02991  0.28371  2.29373
#>
#> Coefficients:
#>                   Estimate Std. Error t value Pr(>|t|)
#> (Intercept)     -1.262e+00  1.308e-01  -9.648  < 2e-16 ***
#> percapitaincome  6.484e-05  5.258e-06  12.333  < 2e-16 ***
#> crm1000          8.197e-03  7.826e-04  10.474  < 2e-16 ***
#> pop65plus        1.417e-02  5.340e-03   2.654  0.00825 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.4457 on 436 degrees of freedom
#> Multiple R-squared:  0.3623, Adjusted R-squared:  0.3579
#> F-statistic: 82.55 on 3 and 436 DF,  p-value: < 2.2e-16
Note that all three covariates are significant (pop65plus at the 1 % level, the other two far beyond that), so none of them is an obvious candidate for removal.
Collect the adjusted R² and BIC values of all eight models:
crits <- data.frame(
  nr = seq(1, 8),
  name = c("intercept", "percapinc", "crm1000", "pop65",
           "inc_crm", "inc_65", "crm_65", "inc_crm_65"),
  p = c(0, 1, 1, 1, 2, 2, 2, 3),
  r2adj = c(sum.1$adj.r.squared, sum.2$adj.r.squared,
            sum.3$adj.r.squared, sum.4$adj.r.squared,
            sum.5$adj.r.squared, sum.6$adj.r.squared,
            sum.7$adj.r.squared, sum.8$adj.r.squared),
  bic = AIC(mod.1, mod.2, mod.3, mod.4, mod.5, mod.6, mod.7,
            mod.8, k = log(nrow(cdi)))[, 2])

crits
#>   nr       name p       r2adj      bic
#> 1  1  intercept 0 0.000000000 743.6230
#> 2  2  percapinc 1 0.194245926 653.6766
#> 3  3    crm1000 1 0.126177615 689.3600
#> 4  4      pop65 1 0.004744857 746.6137
#> 5  5    inc_crm 2 0.348991910 564.9248
#> 6  6     inc_65 2 0.198131208 656.6309
#> 7  7     crm_65 2 0.135851083 689.5430
#> 8  8 inc_crm_65 3 0.357872189 563.9603
The model with all three explanatory variables is the best since it has the highest adjusted R² and the lowest BIC, although the model without pop65plus (inc_crm) is close on both criteria.
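The `bic` column relies on the fact that `AIC()` with penalty `k = log(n)` per parameter gives the BIC. A minimal sketch that checks this against `BIC()`, using the built-in `mtcars` data as a stand-in (an assumption made so the example runs without `CDI.txt`; the model names are hypothetical):

```r
# Stand-in data: built-in mtcars, not the CDI data.
fit.small <- lm(mpg ~ wt, data = mtcars)
fit.large <- lm(mpg ~ wt + hp, data = mtcars)
n <- nrow(mtcars)

# AIC with penalty k = log(n) per parameter is the BIC.
bic.via.aic <- AIC(fit.small, fit.large, k = log(n))[, 2]
bic.direct <- BIC(fit.small, fit.large)[, 2]
cbind(bic.via.aic, bic.direct)
```

Both columns should agree, so either call could be used to fill the `bic` column of `crits`.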
(c)
v <- influence(mod.8)$hat
limit.v <- 2 * (3 + 1) / nrow(cdi)
cdi[v > 0.16, ]
#>   id county state area   popul pop1834 pop65plus phys beds crimes higrads
#> 6  6  Kings    NY   71 2300664    28.3      12.4 4861 8942 680966    63.7
#>   bachelors poors unemployed percapitaincome totalincome    region
#> 6      16.6  19.5        9.5           16803       38658 Northeast
#>   phys1000  crm1000
#> 6 2.112868 295.9867
I.Kings <- 6
As seen in Figure 2, several counties have a high leverage, in particular Kings county, New York. This is due to the unusually high crime rate, as seen in Figure 2(c).
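The limit `limit.v` in the code is twice the average leverage. Written out, with $p = 3$ covariates and $n = 440$ counties (436 residual degrees of freedom plus 4 estimated parameters); the symbol $X$ for the design matrix is notation assumed here, not taken from the exercise text:

```latex
\begin{align*}
v_{ii} &= \left[ X (X^T X)^{-1} X^T \right]_{ii}, &
\sum_{i=1}^{n} v_{ii} &= p + 1, \\
\bar{v} &= \frac{p+1}{n} = \frac{4}{440} \approx 0.0091, &
2\bar{v} &= \frac{2(p+1)}{n} = \frac{8}{440} \approx 0.018 .
\end{align*}
```

Kings county, selected by `v > 0.16`, therefore lies more than eight times above the limit.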
(d)
r <- rstudent(mod.8)
pred.8 <- predict(mod.8)
cdi[r == max(r), ]
#>      id  county state area  popul pop1834 pop65plus phys beds crimes
#> 418 418 Olmsted    MN  653 106470    29.3        10 1814 1437   4310
#>     higrads bachelors poors unemployed percapitaincome totalincome  region
#> 418      88      29.5   4.5        3.3           20515        2184 Midwest
#>     phys1000  crm1000
#> 418 17.03766 40.48089
I.Olmsted <- 418
As seen in Figure 3(a)-(d), most of the residuals lie within ±2. They seem to have constant variance and no non-linear trends. However, there are two counties with large residuals. Kings county, NY has a large negative residual (fewer physicians than expected) while Olmsted, Minnesota has a large positive residual (more physicians than expected).
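For reference, the studentized residuals returned by `rstudent()` can be written as follows, where $e_i$ is the ordinary residual, $s_{(i)}$ the residual standard deviation estimated without observation $i$, and $v_{ii}$ the leverage (the symbols $e_i$ and $r_i^{*}$ are notation assumed here):

```latex
\[
r_i^{*} = \frac{e_i}{s_{(i)} \sqrt{1 - v_{ii}}},
\qquad
r_i^{*} \sim t\bigl(n - (p+1) - 1\bigr) = t(435)
\quad \text{under the model assumptions.}
\]
```

With 435 degrees of freedom, roughly 5 % of the 440 counties, about 20, are expected to fall outside ±2 even when the model is correct, so it is the size of the two extreme residuals that stands out, not the mere fact that some points cross the lines.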
(e)
s.i <- influence(mod.8)$sigma
As seen in Figure 4, Olmsted, MN, which had a large residual, produced the largest decrease in s(i) when left out, with Kings county, NY as a close second.
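The quantity s(i) is the residual standard error obtained when observation i is left out. A minimal sketch that checks `influence()$sigma` against an explicit refit, using the built-in `mtcars` data as a stand-in (an assumption made so the example runs without `CDI.txt`):

```r
# Stand-in data: built-in mtcars, not the CDI data.
fit <- lm(mpg ~ wt + hp, data = mtcars)
s.i <- influence(fit)$sigma

# Refit without observation i and compare the residual standard errors.
i <- 1
refit <- lm(mpg ~ wt + hp, data = mtcars[-i, ])
c(from.influence = unname(s.i[i]), from.refit = summary(refit)$sigma)
```

The two values should agree up to numerical precision, which is why no explicit refitting loop is needed in the exercise.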
(f)
limit.cook <- c(1, 4 / nrow(cdi))
D <- cooks.distance(mod.8)
As seen in Figure 5, Kings county has had a large influence on the β-estimates. Olmsted has not had an alarming influence.
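For reference, Cook's distance combines the size of the residual with the leverage. In the notation used above ($e_i$ the residual, $s$ the residual standard error, $v_{ii}$ the leverage; the symbol $e_i$ is assumed here), together with the two reference lines stored in `limit.cook`:

```latex
\[
D_i = \frac{e_i^2}{(p+1)\, s^2} \cdot \frac{v_{ii}}{(1 - v_{ii})^2},
\qquad
\text{reference lines: } D_i = 1 \ \text{and}\ D_i = \frac{4}{n} = \frac{4}{440} \approx 0.0091 .
\]
```

A county needs both a sizeable residual and a high leverage to get a large $D_i$, which fits the picture: Kings county has both, while Olmsted has a large residual but unremarkable leverage.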
(g)
dfb <- dfbetas(mod.8)
summary(dfb)
#>   (Intercept)          percapitaincome         crm1000
#>  Min.   :-0.2327468   Min.   :-1.841e-01   Min.   :-1.9517594
#>  1st Qu.:-0.0169952   1st Qu.:-1.502e-02   1st Qu.:-0.0130113
#>  Median :-0.0013843   Median : 2.287e-04   Median : 0.0005067
#>  Mean   : 0.0004313   Mean   :-4.916e-05   Mean   :-0.0008209
#>  3rd Qu.: 0.0098904   3rd Qu.: 1.441e-02   3rd Qu.: 0.0165190
#>  Max.   : 0.7465022   Max.   : 1.862e-01   Max.   : 0.3863246
#>    pop65plus
#>  Min.   :-0.3306552
#>  1st Qu.:-0.0099315
#>  Median : 0.0011489
#>  Mean   :-0.0002691
#>  3rd Qu.: 0.0158057
#>  Max.   : 0.2165531
limit.dfb <- c(-1, -2 / sqrt(nrow(cdi)), 0, 2 / sqrt(nrow(cdi)), 1)
As seen in Figure 6, Kings county has had a huge influence on the β-estimate for crm1000 (c), as might be expected, and a quite large influence on the intercept (a). Olmsted has not had a huge influence on any of the parameters.
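The dfbetas value for Kings county can be checked by hand from the two model summaries in this document: the crm1000 estimate is 8.197e-03 with all counties and 9.696e-03 without Kings, the standard error with all counties is 7.826e-04, and the residual standard error is 0.4457 with and 0.4373 without Kings. A worked sketch, where $X$ for the design matrix is notation assumed here:

```latex
\begin{align*}
\text{dfbetas}_{j(i)} &= \frac{\hat\beta_j - \hat\beta_{j(i)}}
  {s_{(i)} \sqrt{\left[(X^T X)^{-1}\right]_{jj}}}, \\
\text{dfbetas}_{\text{crm1000}(\text{Kings})} &\approx
  \frac{8.197\cdot 10^{-3} - 9.696\cdot 10^{-3}}
       {0.4373 \cdot \left(7.826\cdot 10^{-4} / 0.4457\right)}
  \approx \frac{-1.499\cdot 10^{-3}}{7.68\cdot 10^{-4}}
  \approx -1.95 .
\end{align*}
```

This agrees with the minimum of the crm1000 column in `summary(dfb)` and is far outside both the strict limit $\pm 1$ and the size-adjusted limit $\pm 2/\sqrt{n} = \pm 2/\sqrt{440} \approx \pm 0.095$.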
(h)
When Kings county is taken out, the largest model is still the best, and no county has a huge leverage or a large influence on the parameter estimates. However, Olmsted still has a large residual.
Plots
# Figure 1.
figcaption <- "Physicians or log physicians. Kings County, NY in red, Olmsted, MN in blue."
par(mfrow = c(3, 2))
with(cdi, plot(phys1000 ~ percapitaincome,
               main = "(a) Physicians vs per capita income"))
with(cdi[I.Kings, ], points(percapitaincome, phys1000, col = "red", pch = 19))
with(cdi[I.Olmsted, ], points(percapitaincome, phys1000, col = "blue", pch = 8))

with(cdi, plot(phys1000 ~ percapitaincome, log = "y",
               main = "(b) log Physicians vs per capita income"))
with(cdi[I.Kings, ], points(percapitaincome, phys1000, col = "red", pch = 19))
with(cdi[I.Olmsted, ], points(percapitaincome, phys1000, col = "blue", pch = 8))

with(cdi, plot(phys1000 ~ crm1000,
               main = "(c) Physicians vs crime"))
with(cdi[I.Kings, ], points(crm1000, phys1000, col = "red", pch = 19))
with(cdi[I.Olmsted, ], points(crm1000, phys1000, col = "blue", pch = 8))

with(cdi, plot(phys1000 ~ crm1000, log = "y",
               main = "(d) log Physicians vs crime"))
with(cdi[I.Kings, ], points(crm1000, phys1000, col = "red", pch = 19))
with(cdi[I.Olmsted, ], points(crm1000, phys1000, col = "blue", pch = 8))

with(cdi, plot(phys1000 ~ pop65plus,
               main = "(e) Physicians vs 65+ population"))
with(cdi[I.Kings, ], points(pop65plus, phys1000, col = "red", pch = 19))
with(cdi[I.Olmsted, ], points(pop65plus, phys1000, col = "blue", pch = 8))

with(cdi, plot(phys1000 ~ pop65plus, log = "y",
               main = "(f) log Physicians vs 65+ population"))
with(cdi[I.Kings, ], points(pop65plus, phys1000, col = "red", pch = 19))
with(cdi[I.Olmsted, ], points(pop65plus, phys1000, col = "blue", pch = 8))
# Figure 2.
figcaption <- "Leverage of the full model. Kings County, NY in red, Olmsted, MN in blue."
par(mfrow = c(2, 2))
with(cdi, plot(id, v, main = "(a) Leverage against id"))
points(I.Kings, v[I.Kings], col = "red", pch = 19)
points(I.Olmsted, v[I.Olmsted], col = "blue", pch = 8)
abline(h = limit.v)

with(cdi, plot(v ~ percapitaincome,
               main = "(b) Leverage against per capita income"))
with(cdi, points(percapitaincome[I.Kings], v[I.Kings], col = "red", pch = 19))
with(cdi, points(percapitaincome[I.Olmsted], v[I.Olmsted], col = "blue", pch = 8))
Figure 1: Physicians or log physicians. Kings County, NY in red, Olmsted, MN in blue. [Six panels of phys1000 against percapitaincome, crm1000 and pop65plus: (a), (c), (e) on the original scale, (b), (d), (f) with a logarithmic y-axis.]
abline(h = limit.v)

with(cdi, plot(v ~ crm1000, main = "(c) Leverage against crime per 1000"))
with(cdi, points(crm1000[I.Kings], v[I.Kings], col = "red", pch = 19))
with(cdi, points(crm1000[I.Olmsted], v[I.Olmsted], col = "blue", pch = 8))
abline(h = limit.v)

with(cdi, plot(v ~ pop65plus, main = "(d) Leverage against population 65+"))
with(cdi, points(pop65plus[I.Kings], v[I.Kings], col = "red", pch = 19))
with(cdi, points(pop65plus[I.Olmsted], v[I.Olmsted], col = "blue", pch = 8))
abline(h = limit.v)
# Figure 3.
figcaption <- "Studentized residuals. Kings County, NY in red, Olmsted, MN in blue"
par(mfrow = c(2, 2))
plot(r ~ pred.8, main = "(a) Studentized residuals against predicted values")
points(pred.8[I.Kings], r[I.Kings], col = "red", pch = 19)
points(pred.8[I.Olmsted], r[I.Olmsted], col = "blue", pch = 8)
abline(h = c(-2, 0, 2), lty = 2)

with(cdi, plot(r ~ percapitaincome,
               main = "(b) Studentized residuals against per capita income"))
with(cdi, points(percapitaincome[I.Kings], r[I.Kings], col = "red", pch = 19))
with(cdi, points(percapitaincome[I.Olmsted], r[I.Olmsted], col = "blue", pch = 8))
abline(h = c(-2, 0, 2), lty = 2)

with(cdi, plot(r ~ crm1000,
               main = "(c) Studentized residuals against crime per 1000"))
with(cdi, points(crm1000[I.Kings], r[I.Kings], col = "red", pch = 19))
with(cdi, points(crm1000[I.Olmsted], r[I.Olmsted], col = "blue", pch = 8))
abline(h = c(-2, 0, 2), lty = 2)

with(cdi, plot(r ~ pop65plus,
               main = "(d) Studentized residuals against population 65+"))
with(cdi, points(pop65plus[I.Kings], r[I.Kings], col = "red", pch = 19))
with(cdi, points(pop65plus[I.Olmsted], r[I.Olmsted], col = "blue", pch = 8))
abline(h = c(-2, 0, 2), lty = 2)
# Figure 4.
figcaption <- "Observations' effect on the sigma estimate. Kings county, NY in red, Olmsted, MN in blue"
with(cdi, plot(s.i ~ id, main = "Leave-one-out sigma-estimates"))
with(cdi, points(id[I.Kings], s.i[I.Kings], col = "red", pch = 19))
with(cdi, points(id[I.Olmsted], s.i[I.Olmsted], col = "blue", pch = 8))
# Figure 5.
figcaption <- "Cook's distance. Kings county, NY in red, Olmsted, MN in blue"
Figure 2: Leverage of the full model. Kings County, NY in red, Olmsted, MN in blue. [Four panels of the leverage v against (a) id, (b) percapitaincome, (c) crm1000 and (d) pop65plus.]
Figure 3: Studentized residuals. Kings County, NY in red, Olmsted, MN in blue. [Four panels of r against (a) pred.8, (b) percapitaincome, (c) crm1000 and (d) pop65plus.]
Figure 4: Observations' effect on the sigma estimate. Kings county, NY in red, Olmsted, MN in blue. [One panel of the leave-one-out sigma estimates s.i against id.]
par(mfrow = c(2, 2))
with(cdi, plot(D ~ percapitaincome, ylim = c(0, 1),
               main = "(a) Cook's distance against per capita income"))
with(cdi, points(percapitaincome[I.Kings], D[I.Kings], col = "red", pch = 19))
with(cdi, points(percapitaincome[I.Olmsted], D[I.Olmsted], col = "blue", pch = 8))
abline(h = limit.cook)

with(cdi, plot(D ~ crm1000, ylim = c(0, 1),
               main = "(b) Cook's distance against crime per 1000"))
with(cdi, points(crm1000[I.Kings], D[I.Kings], col = "red", pch = 19))
with(cdi, points(crm1000[I.Olmsted], D[I.Olmsted], col = "blue", pch = 8))
abline(h = limit.cook)

with(cdi, plot(D ~ pop65plus, ylim = c(0, 1),
               main = "(c) Cook's distance against population 65+"))
with(cdi, points(pop65plus[I.Kings], D[I.Kings], col = "red", pch = 19))
with(cdi, points(pop65plus[I.Olmsted], D[I.Olmsted], col = "blue", pch = 8))
abline(h = limit.cook)
# Figure 6.
figcaption <- "DFbetas. Kings county, NY in red, Olmsted, MN in blue"
par(mfrow = c(2, 2))
with(cdi, plot(dfb[, 1] ~ id, main = "(a) dfbetas for the intercept",
               ylim = c(-1, 1), ylab = "dfbeta_0"))
with(cdi, points(id[I.Kings], dfb[I.Kings, 1], pch = 19, col = "red"))
with(cdi, points(id[I.Olmsted], dfb[I.Olmsted, 1], pch = 8, col = "blue"))
abline(h = limit.dfb)

with(cdi, plot(dfb[, 2] ~ id, main = "(b) dfbetas for per capita income",
               ylim = c(-1, 1), ylab = "dfbeta_percapitaincome"))
with(cdi, points(id[I.Kings], dfb[I.Kings, 2], pch = 19, col = "red"))
with(cdi, points(id[I.Olmsted], dfb[I.Olmsted, 2], pch = 8, col = "blue"))
Figure 5: Cook's distance. Kings county, NY in red, Olmsted, MN in blue. [Three panels of D against (a) percapitaincome, (b) crm1000 and (c) pop65plus.]
abline(h = limit.dfb)

with(cdi, plot(dfb[, 3] ~ id, main = "(c) dfbetas for crime per 1000",
               ylim = c(-2, 1), ylab = "dfbeta_crm1000"))
with(cdi, points(id[I.Kings], dfb[I.Kings, 3], pch = 19, col = "red"))
with(cdi, points(id[I.Olmsted], dfb[I.Olmsted, 3], pch = 8, col = "blue"))
abline(h = limit.dfb)

with(cdi, plot(dfb[, 4] ~ id, main = "(d) dfbetas for population 65+",
               ylim = c(-1, 1), ylab = "dfbeta_pop65plus"))
with(cdi, points(id[I.Kings], dfb[I.Kings, 4], pch = 19, col = "red"))
with(cdi, points(id[I.Olmsted], dfb[I.Olmsted, 4], pch = 8, col = "blue"))
abline(h = limit.dfb)
(h) plots
# Remove Kings county (row 6); Olmsted then moves from row 418 to row 417.
cdi <- cdi[-I.Kings, ]
I.Olmsted <- 417
### (a) ###
par(mfrow = c(3, 2))
with(cdi, plot(phys1000 ~ percapitaincome,
               main = "(a) Physicians vs per capita income"))
with(cdi[I.Olmsted, ], points(percapitaincome, phys1000, col = "blue", pch = 8))

with(cdi, plot(phys1000 ~ percapitaincome, log = "y",
               main = "(b) log Physicians vs per capita income"))
Figure 6: DFbetas. Kings county, NY in red, Olmsted, MN in blue. [Four panels of dfbetas against id: (a) intercept, (b) percapitaincome, (c) crm1000 and (d) pop65plus.]
with(cdi[I.Olmsted, ], points(percapitaincome, phys1000, col = "blue", pch = 8))

with(cdi, plot(phys1000 ~ crm1000,
               main = "(c) Physicians vs crime"))
with(cdi[I.Olmsted, ], points(crm1000, phys1000, col = "blue", pch = 8))

with(cdi, plot(phys1000 ~ crm1000, log = "y",
               main = "(d) log Physicians vs crime"))
with(cdi[I.Olmsted, ], points(crm1000, phys1000, col = "blue", pch = 8))

with(cdi, plot(phys1000 ~ pop65plus,
               main = "(e) Physicians vs 65+ population"))
with(cdi[I.Olmsted, ], points(pop65plus, phys1000, col = "blue", pch = 8))

with(cdi, plot(phys1000 ~ pop65plus, log = "y",
               main = "(f) log Physicians vs 65+ population"))
with(cdi[I.Olmsted, ], points(pop65plus, phys1000, col = "blue", pch = 8))
### (b) ###
mod.1 <- lm(log(phys1000) ~ 1, data = cdi)
mod.2 <- lm(log(phys1000) ~ percapitaincome, data = cdi)
mod.3 <- lm(log(phys1000) ~ crm1000, data = cdi)
mod.4 <- lm(log(phys1000) ~ pop65plus, data = cdi)
mod.5 <- lm(log(phys1000) ~ percapitaincome + crm1000, data = cdi)
mod.6 <- lm(log(phys1000) ~ percapitaincome + pop65plus, data = cdi)
mod.7 <- lm(log(phys1000) ~ crm1000 + pop65plus, data = cdi)
mod.8 <- lm(log(phys1000) ~ percapitaincome + crm1000 + pop65plus, data = cdi)
sum.1 <- summary(mod.1)
sum.2 <- summary(mod.2)
sum.3 <- summary(mod.3)
sum.4 <- summary(mod.4)
sum.5 <- summary(mod.5)
sum.6 <- summary(mod.6)
sum.7 <- summary(mod.7)
sum.8 <- summary(mod.8)
sum.8
#>
#> Call:
#> lm(formula = log(phys1000) ~ percapitaincome + crm1000 + pop65plus,
#>     data = cdi)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -1.04707 -0.27211 -0.03999  0.26980  2.31530
#>
#> Coefficients:
#>                   Estimate Std. Error t value Pr(>|t|)
#> (Intercept)     -1.358e+00  1.303e-01 -10.420  < 2e-16 ***
#> percapitaincome  6.515e-05  5.159e-06  12.627  < 2e-16 ***
#> crm1000          9.696e-03  8.453e-04  11.470  < 2e-16 ***
#> pop65plus        1.492e-02  5.242e-03   2.846  0.00464 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.4373 on 435 degrees of freedom
#> Multiple R-squared:  0.3874, Adjusted R-squared:  0.3832
#> F-statistic: 91.71 on 3 and 435 DF,  p-value: < 2.2e-16
Figure 7: Data without Kings, NY. [Six panels of phys1000 against percapitaincome, crm1000 and pop65plus: (a), (c), (e) on the original scale, (b), (d), (f) with a logarithmic y-axis.]
crits <- data.frame(
  nr = c(1:8),
  name = c("intercept", "percapinc", "crm1000", "pop65",
           "inc_crm", "inc_65", "crm_65", "inc_crm_65"),
  p = c(0, 1, 1, 1, 2, 2, 2, 3),
  r2adj = c(sum.1$adj.r.squared, sum.2$adj.r.squared,
            sum.3$adj.r.squared, sum.4$adj.r.squared,
            sum.5$adj.r.squared, sum.6$adj.r.squared,
            sum.7$adj.r.squared, sum.8$adj.r.squared),
  bic = AIC(mod.1, mod.2, mod.3, mod.4, mod.5, mod.6, mod.7,
            mod.8, k = log(nrow(cdi)))[, 2])

crits
#>   nr       name p       r2adj      bic
#> 1  1  intercept 0 0.000000000 742.8673
#> 2  2  percapinc 1 0.194625169 652.9279
#> 3  3    crm1000 1 0.148216097 677.5229
#> 4  4      pop65 1 0.004734591 745.8649
#> 5  5    inc_crm 2 0.373171824 547.9777
#> 6  6     inc_65 2 0.198497249 655.8909
#> 7  7     crm_65 2 0.159063330 676.9752
#> 8  8 inc_crm_65 3 0.383211552 545.9660
### (c) ###
par(mfrow = c(2, 2))
v <- influence(mod.8)$hat
limit.v <- 2 * (3 + 1) / nrow(cdi)
with(cdi, plot(v ~ id, main = "(a) Leverage against id"))
points(cdi$id[I.Olmsted], v[I.Olmsted], col = "blue", pch = 8)
abline(h = limit.v)

with(cdi, plot(v ~ percapitaincome, main = "(b) Leverage against per capita income"))
with(cdi, points(percapitaincome[I.Olmsted], v[I.Olmsted], col = "blue", pch = 8))
abline(h = limit.v)

with(cdi, plot(v ~ crm1000, main = "(c) Leverage against crime per 1000"))
with(cdi, points(crm1000[I.Olmsted], v[I.Olmsted], col = "blue", pch = 8))
abline(h = limit.v)

with(cdi, plot(v ~ pop65plus, main = "(d) Leverage against population 65+"))
with(cdi, points(pop65plus[I.Olmsted], v[I.Olmsted], col = "blue", pch = 8))
abline(h = limit.v)

### (d) ###
par(mfrow = c(2, 2))
r <- rstudent(mod.8)
pred.8 <- predict(mod.8)
plot(r ~ pred.8, main = "(a) Studentized residuals against predicted values")
with(cdi, points(pred.8[I.Olmsted], r[I.Olmsted], col = "blue", pch = 8))
abline(h = c(-2, 0, 2), lty = 2)

with(cdi, plot(r ~ percapitaincome,
               main = "(b) Studentized residuals against per capita income"))
with(cdi, points(percapitaincome[I.Olmsted], r[I.Olmsted], col = "blue", pch = 8))
abline(h = c(-2, 0, 2), lty = 2)

with(cdi, plot(r ~ crm1000,
               main = "(c) Studentized residuals against crime per 1000"))
with(cdi, points(crm1000[I.Olmsted], r[I.Olmsted], col = "blue", pch = 8))
Figure 8: Leverage without Kings, NY. [Four panels of the leverage v against (a) id, (b) percapitaincome, (c) crm1000 and (d) pop65plus.]
abline(h = c(-2, 0, 2), lty = 2)

with(cdi, plot(r ~ pop65plus,
               main = "(d) Studentized residuals against population 65+"))
with(cdi, points(pop65plus[I.Olmsted], r[I.Olmsted], col = "blue", pch = 8))
abline(h = c(-2, 0, 2), lty = 2)

### (e) ###
s.i <- influence(mod.8)$sigma
with(cdi, plot(s.i ~ id, main = "Leave-one-out sigma-estimates"))
with(cdi, points(id[I.Olmsted], s.i[I.Olmsted], col = "blue", pch = 8))

### (f) ###
limit.cook <- c(1, 4 / nrow(cdi))
D <- cooks.distance(mod.8)
# The title now matches what is plotted: D against id.
with(cdi, plot(D ~ id, ylim = c(0, 1),
               main = "(a) Cook's distance against id"))
with(cdi, points(id[I.Olmsted], D[I.Olmsted], col = "blue", pch = 8))
abline(h = limit.cook)

### (g) ###
par(mfrow = c(2, 2))
dfb <- dfbetas(mod.8)
limit.dfb <- c(-1, -2 / sqrt(nrow(cdi)), 0, 2 / sqrt(nrow(cdi)), 1)
with(cdi, plot(dfb[, 1] ~ id, main = "(a) dfbetas for the intercept",
               ylim = c(-1, 1), ylab = "dfbeta_0"))
with(cdi, points(id[I.Olmsted], dfb[I.Olmsted, 1], pch = 8, col = "blue"))
abline(h = limit.dfb)

with(cdi, plot(dfb[, 2] ~ id, main = "(b) dfbetas for per capita income",
               ylim = c(-1, 1), ylab = "dfbeta_percapitaincome"))
with(cdi, points(id[I.Olmsted], dfb[I.Olmsted, 2], pch = 8, col = "blue"))
abline(h = limit.dfb)

with(cdi, plot(dfb[, 3] ~ id, main = "(c) dfbetas for crime per 1000",
               ylim = c(-1, 1), ylab = "dfbeta_crm1000"))
with(cdi, points(id[I.Olmsted], dfb[I.Olmsted, 3], pch = 8, col = "blue"))
abline(h = limit.dfb)

with(cdi, plot(dfb[, 4] ~ id, main = "(d) dfbetas for population 65+",
               ylim = c(-1, 1), ylab = "dfbeta_pop65plus"))
with(cdi, points(id[I.Olmsted], dfb[I.Olmsted, 4], pch = 8, col = "blue"))
abline(h = limit.dfb)
Figure 9: Residuals without Kings, NY. [Four panels of the studentized residuals r against (a) pred.8, (b) percapitaincome, (c) crm1000 and (d) pop65plus.]
Figure 10: Sigma without Kings, NY. [One panel of the leave-one-out sigma estimates s.i against id.]
Figure 11: Cook's distance without Kings, NY. [One panel of D against id.]
Figure 12: dfbetas without Kings, NY. [Four panels of dfbetas against id: (a) intercept, (b) percapitaincome, (c) crm1000 and (d) pop65plus.]