Lecture 6: Multiple Linear Regression Adjusted Variable Plots
![Page 1: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/1.jpg)
Lecture 6: Multiple Linear Regression
Adjusted Variable Plots
BMTRY 701
Biostatistical Methods II
![Page 2: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/2.jpg)
Graphical Displays in MLR
No longer one simple scatterplot: we need to look at multiple pairs of variables
• “pairs” in R
But pairwise plots cannot show all covariates in the way they enter the model together
Solution: the adjusted variable plot, also known as the partial regression plot
![Page 3: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/3.jpg)
Adjusted Variable Plots
Adjusted variable plots are useful in linear regression for outlier detection and for qualitative evaluation of the fit of a model.
With two covariates: Shows the association between X and Y adjusted for another variable, Z.
With more than two covariates: Shows the association between X and Y adjusted for many other covariates
In our example, association between logLOS and number of nurses, adjusted for number of beds
![Page 4: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/4.jpg)
Approach
Assume we want to look at the association of Y and X, adjusted for Z
Step 1: Regress Y on Z and save residuals (named res.xy in the R code)
Step 2: Regress X on Z and save residuals (named res.xz in the R code)
Step 3: Plot res.xy versus res.xz
Optional step 4:
• perform regression of res.xy on res.xz
• compare slope to that of MLR of Y on X and Z
MPV: section 4.2.4
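The steps above can be sketched on simulated data. This is a minimal illustration, not the lecture's own code: the variables y, x, z and all numeric settings are hypothetical stand-ins, chosen only so the block runs without the SENIC data. It checks the optional step 4, that the slope from the residual regression equals the coefficient of x in the multiple regression of y on x and z.

```r
# Simulated stand-in data; y, x, z and all parameter values are hypothetical
set.seed(701)
n <- 100
z <- rnorm(n)                                   # adjusting variable Z
x <- 0.6 * z + rnorm(n)                         # covariate of interest X, correlated with Z
y <- 1 + 0.5 * x + 0.8 * z + rnorm(n, sd = 0.5) # outcome Y

# Steps 1 and 2: remove the part of Y and of X that is explained by Z
res.y <- residuals(lm(y ~ z))
res.x <- residuals(lm(x ~ z))

# Step 3: the adjusted variable plot
plot(res.x, res.y, pch = 16)

# Optional step 4: regress residuals on residuals and compare with the MLR fit
reg.res <- lm(res.y ~ res.x)
reg.mlr <- lm(y ~ x + z)
abline(reg.res, lwd = 2)

slope.res <- unname(coef(reg.res)["res.x"])
slope.mlr <- unname(coef(reg.mlr)["x"])
print(c(residual.slope = slope.res, mlr.slope = slope.mlr))
```

The two printed slopes agree up to floating-point error, which is why the fitted line in an adjusted variable plot can be read as the adjusted effect of X.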
![Page 5: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/5.jpg)
SENIC
![Page 6: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/6.jpg)
[Figure: scatterplot matrix of INFRISK, BEDS, and logLOS]
![Page 7: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/7.jpg)
R
pairs(~INFRISK+BEDS+logLOS, data=data, pch=16)

# adjusted variable plot approach
# look at the association between INFRISK and logLOS,
# adjusting for BEDS
reg.xy <- lm(logLOS ~ BEDS, data=data)
res.xy <- reg.xy$residuals
reg.xz <- lm(INFRISK ~ BEDS, data=data)
res.xz <- reg.xz$residuals
plot(res.xz, res.xy, pch=16)
reg.res <- lm(res.xy ~ res.xz)
abline(reg.res, lwd=2)
reg.infrisk.beds <- lm(logLOS ~ BEDS + INFRISK, data=data)
![Page 8: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/8.jpg)
[Figure: adjusted variable plot of res.xy versus res.xz (logLOS and INFRISK, each adjusted for BEDS), with fitted regression line]
![Page 9: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/9.jpg)
Why is this important or interesting?
It shows us the ‘adjusted’ relationship
It can help us determine if
• it is an important variable (at all)
• another form of X is more appropriate
• the correlation is high vs. low after adjustment
• we need to/want to adjust for this variable
It also informs us about why a variable ‘loses’ significance
Most important: check for non-linearity
Example: logLOS ~ NURSE
![Page 10: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/10.jpg)
What about BEDS and NURSE?
# why is NURSE not associated after adjustment for BEDS?
reg.nurse <- lm(logLOS ~ NURSE, data=data)
reg.nurse.beds <- lm(logLOS ~ NURSE + BEDS, data=data)
reg.xy <- lm(logLOS ~ BEDS, data=data)
res.xy <- reg.xy$residuals
reg.xz <- lm(NURSE ~ BEDS, data=data)
res.xz <- reg.xz$residuals
plot(res.xz, res.xy, pch=16)
reg.res <- lm(res.xy ~ res.xz)
abline(reg.res, lwd=2)
![Page 11: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/11.jpg)
[Figure: adjusted variable plot of res.xy versus res.xz (logLOS and NURSE, each adjusted for BEDS), with fitted regression line]
![Page 12: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/12.jpg)
What about the other way around?
#######################
# what about the other way?
# why is BEDS associated after adjustment for NURSE?
reg.xy <- lm(logLOS ~ NURSE, data=data)
res.xy <- reg.xy$residuals
reg.xz <- lm(BEDS ~ NURSE, data=data)
res.xz <- reg.xz$residuals
plot(res.xz, res.xy, pch=16)
reg.res <- lm(res.xy ~ res.xz)
abline(reg.res, lwd=2)
reg.nurse.beds <- lm(logLOS ~ NURSE + BEDS, data=data)
![Page 13: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/13.jpg)
[Figure: adjusted variable plot of res.xy versus res.xz (logLOS and BEDS, each adjusted for NURSE), with fitted regression line]
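One way to see why a variable can ‘lose’ its association after adjustment is to simulate two strongly correlated predictors. The sketch below is an assumption-laden illustration, not the SENIC analysis: beds, nurse, loglos and every numeric value are made up, and the outcome is deliberately generated from beds alone. The hypothetical helper adj.slope() computes the adjusted-variable slope in either direction.

```r
# Simulated stand-in data; names and values are hypothetical, not SENIC
set.seed(6)
n <- 113
beds   <- runif(n, 50, 800)
nurse  <- 0.7 * beds + rnorm(n, sd = 40)           # nurses closely track beds
loglos <- 2 + 0.0004 * beds + rnorm(n, sd = 0.15)  # outcome depends on beds only

# Hypothetical helper: slope of (y adjusted for z) on (x adjusted for z)
adj.slope <- function(y, x, z) {
  ry <- residuals(lm(y ~ z))
  rx <- residuals(lm(x ~ z))
  unname(coef(lm(ry ~ rx))[2])
}

marginal.nurse <- unname(coef(lm(loglos ~ nurse))["nurse"])  # unadjusted slope
adjusted.nurse <- adj.slope(loglos, nurse, beds)             # NURSE given BEDS
adjusted.beds  <- adj.slope(loglos, beds, nurse)             # BEDS given NURSE
mlr <- coef(lm(loglos ~ nurse + beds))

print(c(marginal.nurse = marginal.nurse,
        adjusted.nurse = adjusted.nurse,
        adjusted.beds  = adjusted.beds))
print(mlr)
```

In a setup like this, the unadjusted slope for nurse mostly reflects its correlation with beds, while the adjusted slopes match the coefficients from the two-predictor model.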
![Page 14: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/14.jpg)
Interpretation in MLR
“Adjusted for”
“Controlled for”
“Holding all else constant”
In MLR, you need to include one of these phrases (or something like one of them) when interpreting a regression coefficient
![Page 15: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/15.jpg)
LOS ~ INFRISK + BEDS
> reg.infrisk.beds <- lm(LOS ~ BEDS + INFRISK, data=data)
> summary(reg.infrisk.beds)

Call:
lm(formula = LOS ~ BEDS + INFRISK, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-2.8624 -0.9904 -0.1996  0.6671  8.4219

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.2703521  0.5038751  12.444  < 2e-16 ***
BEDS        0.0024747  0.0008236   3.005  0.00329 **
INFRISK     0.6323812  0.1184476   5.339 5.08e-07 ***
---
![Page 16: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/16.jpg)
Hard to interpret with so many decimal places!
> data$beds100 <- data$BEDS/100
> reg.infrisk.beds <- lm(LOS ~ beds100 + INFRISK, data=data)
> summary(reg.infrisk.beds)

Call:
lm(formula = LOS ~ beds100 + INFRISK, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-2.8624 -0.9904 -0.1996  0.6671  8.4219

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.27035    0.50388  12.444  < 2e-16 ***
beds100      0.24747    0.08236   3.005  0.00329 **
INFRISK      0.63238    0.11845   5.339 5.08e-07 ***
---
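Dividing a predictor by 100 only changes its units, which a short check on simulated data can confirm. This is a sketch under stated assumptions: the data frame dat and its columns los, beds, infrisk are hypothetical stand-ins with made-up values, not the SENIC data. It shows that the estimate and standard error scale by 100 while the t value and fitted values are unchanged.

```r
# Simulated stand-in data; column names and values are hypothetical
set.seed(100)
n <- 113
dat <- data.frame(beds = round(runif(n, 30, 830)),
                  infrisk = rnorm(n, mean = 4.4, sd = 1.3))
dat$los <- 6.3 + 0.0025 * dat$beds + 0.63 * dat$infrisk + rnorm(n, sd = 1.6)
dat$beds100 <- dat$beds / 100        # beds in units of 100

fit.beds    <- lm(los ~ beds + infrisk, data = dat)
fit.beds100 <- lm(los ~ beds100 + infrisk, data = dat)

# Coefficient tables: Estimate, Std. Error, t value, Pr(>|t|)
tab.beds    <- coef(summary(fit.beds))
tab.beds100 <- coef(summary(fit.beds100))
print(tab.beds["beds", ])
print(tab.beds100["beds100", ])
```

The rescaled coefficient reads as the change per 100 beds, which is usually easier to report than a change per single bed.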
![Page 17: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/17.jpg)
logLOS ~ INFRISK + BEDS
> reg.infrisk.beds <- lm(logLOS ~ BEDS + INFRISK, data=data)
> summary(reg.infrisk.beds)

Call:
lm(formula = logLOS ~ BEDS + INFRISK, data = data)

Residuals:
      Min        1Q    Median        3Q       Max
-0.314377 -0.079979 -0.008026  0.072108  0.580675

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.926e+00  4.611e-02  41.767  < 2e-16 ***
BEDS        2.407e-04  7.538e-05   3.194  0.00183 **
INFRISK     6.048e-02  1.084e-02   5.579 1.75e-07 ***
---
![Page 18: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/18.jpg)
Hard to interpret with so many decimal places!
> data$beds100 <- data$BEDS/100
> reg.infrisk.beds100 <- lm(logLOS ~ beds100 + INFRISK, data=data)
> summary(reg.infrisk.beds100)

Call:
lm(formula = logLOS ~ beds100 + INFRISK, data = data)

Residuals:
      Min        1Q    Median        3Q       Max
-0.314377 -0.079979 -0.008026  0.072108  0.580675

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.926040   0.046114  41.767  < 2e-16 ***
beds100     0.024075   0.007538   3.194  0.00183 **
INFRISK     0.060477   0.010840   5.579 1.75e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1435 on 110 degrees of freedom
Multiple R-squared: 0.3612, Adjusted R-squared: 0.3496
F-statistic: 31.1 on 2 and 110 DF, p-value: 1.971e-11
![Page 19: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/19.jpg)
How to interpret?
Pick two values of BEDS
• e.g. 100 to 200
• e.g. 400 to 500
Estimate the difference in logLOS between the two values
What do we plug in for INFRISK?
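$$
\begin{aligned}
\log \widehat{\text{LOS}} &= \hat\beta_0 + \hat\beta_1\,\text{beds100} + \hat\beta_2\,\text{INFRISK} \\
&= 1.93 + 0.024 \cdot \text{beds100} + 0.060 \cdot \text{INFRISK}
\end{aligned}
$$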
![Page 20: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/20.jpg)
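How to interpret?
Remember that our inferences are “holding all else constant”
To compare two hospitals with the same INFRISK, it doesn’t matter what you put in (as long as it is the same)

$$
\begin{aligned}
\log[\widehat{\text{LOS}} \mid \text{beds100} = 1] &= \hat\beta_0 + \hat\beta_1\,\text{beds100} + \hat\beta_2\,\text{INFRISK} \\
&= 1.93 + 0.024 \cdot 1 + 0.060 \cdot \text{INFRISK} \\
\log[\widehat{\text{LOS}} \mid \text{beds100} = 2] &= \hat\beta_0 + \hat\beta_1\,\text{beds100} + \hat\beta_2\,\text{INFRISK} \\
&= 1.93 + 0.024 \cdot 2 + 0.060 \cdot \text{INFRISK} \\
\log[\widehat{\text{LOS}} \mid \text{beds100} = 2] - \log[\widehat{\text{LOS}} \mid \text{beds100} = 1] &= (1.93 + 0.024 \cdot 2 + 0.060 \cdot \text{INFRISK}) \\
&\quad - (1.93 + 0.024 \cdot 1 + 0.060 \cdot \text{INFRISK}) \\
&= 0.024 \cdot 2 - 0.024 \cdot 1 \\
&= 0.024
\end{aligned}
$$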
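The cancellation of the INFRISK term can also be checked numerically. The sketch below plugs the rounded estimates from the fitted model (1.93, 0.024, 0.060) into a hypothetical helper, pred.loglos(), and evaluates the difference at a few arbitrary INFRISK values, which are illustrative assumptions and not taken from the data.

```r
# Rounded estimates from the model logLOS ~ beds100 + INFRISK
b0 <- 1.93
b1 <- 0.024
b2 <- 0.060

# Hypothetical helper: predicted logLOS for given beds100 and INFRISK
pred.loglos <- function(beds100, infrisk) b0 + b1 * beds100 + b2 * infrisk

infrisk.values <- c(2, 4.5, 7)   # arbitrary values, the same in both hospitals
diffs <- pred.loglos(2, infrisk.values) - pred.loglos(1, infrisk.values)
print(diffs)
```

Whatever INFRISK value is used, the difference in predicted logLOS between 200 and 100 beds is the beds100 coefficient, 0.024.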
![Page 21: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/21.jpg)
How to interpret?
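$$
\exp\left(\log \widehat{\text{LOS}}(2) - \log \widehat{\text{LOS}}(1)\right)
= \exp\left(\log \frac{\widehat{\text{LOS}}(2)}{\widehat{\text{LOS}}(1)}\right)
= \frac{\widehat{\text{LOS}}(2)}{\widehat{\text{LOS}}(1)}
= \exp(0.024) = 1.02
$$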
Comparing two hospitals whose numbers of beds differ by 100, and assuming the infection risk in the two hospitals is the same, the ratio of average LOS in the two hospitals is 1.02, with the hospital with more beds having the longer stay.
![Page 22: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/22.jpg)
difference of 400 beds?
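$$
\exp\left(\log \widehat{\text{LOS}}(5) - \log \widehat{\text{LOS}}(1)\right)
= \exp\left(\log \frac{\widehat{\text{LOS}}(5)}{\widehat{\text{LOS}}(1)}\right)
= \frac{\widehat{\text{LOS}}(5)}{\widehat{\text{LOS}}(1)}
= \exp(4 \cdot 0.024) = 1.10
$$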
![Page 23: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/23.jpg)
When outcome is log transformed
Interpretation of coefficients must be made as RATIOS instead of DIFFERENCES
Need to exponentiate the coefficient. Its interpretation is the ratio for a one-unit difference in the predictor.
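As a small sketch of the exponentiation step, the rounded beds100 coefficient (0.024) can be turned into LOS ratios for any difference in beds. The helper name los.ratio() is hypothetical and only wraps exp().

```r
# Rounded beds100 coefficient from the logLOS model
b1 <- 0.024

# Hypothetical helper: LOS ratio for a difference of `delta` units of beds100,
# holding INFRISK constant
los.ratio <- function(delta, b = b1) exp(b * delta)

ratio.100 <- los.ratio(1)   # hospitals differing by 100 beds
ratio.400 <- los.ratio(4)   # hospitals differing by 400 beds
print(round(c(ratio.100, ratio.400), 2))
```

This prints 1.02 and 1.10, the ratios quoted for differences of 100 and 400 beds. Note that the 400-bed ratio is the 100-bed ratio raised to the fourth power, not four times the 100-bed increase.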
![Page 24: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/24.jpg)
Why differences do not work
Consider comparing two hospitals with 400 and 300 beds:
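$$
\begin{aligned}
\log[\widehat{\text{LOS}} \mid \text{beds100} = 4] &= \hat\beta_0 + \hat\beta_1\,\text{beds100} + \hat\beta_2\,\text{INFRISK} \\
&= 1.93 + 0.024 \cdot 4 + 0.060 \cdot \text{INFRISK} \\
[\widehat{\text{LOS}} \mid \text{beds100} = 4] &= \exp(1.93 + 0.024 \cdot 4 + 0.06 \cdot \text{INFRISK}) \\
&= \exp(1.93)\,\exp(0.024 \cdot 4)\,\exp(0.06 \cdot \text{INFRISK}) \\
\log[\widehat{\text{LOS}} \mid \text{beds100} = 3] &= \hat\beta_0 + \hat\beta_1\,\text{beds100} + \hat\beta_2\,\text{INFRISK} \\
&= 1.93 + 0.024 \cdot 3 + 0.060 \cdot \text{INFRISK} \\
[\widehat{\text{LOS}} \mid \text{beds100} = 3] &= \exp(1.93 + 0.024 \cdot 3 + 0.06 \cdot \text{INFRISK}) \\
&= \exp(1.93)\,\exp(0.024 \cdot 3)\,\exp(0.06 \cdot \text{INFRISK}) \\
[\widehat{\text{LOS}} \mid \text{beds100} = 4] - [\widehat{\text{LOS}} \mid \text{beds100} = 3] &= \exp(1.93)\,\exp(0.024 \cdot 4)\,\exp(0.06 \cdot \text{INFRISK}) \\
&\quad - \exp(1.93)\,\exp(0.024 \cdot 3)\,\exp(0.06 \cdot \text{INFRISK}) \\
&= \exp(1.93 + 0.06 \cdot \text{INFRISK})\,(\exp(0.024 \cdot 4) - \exp(0.024 \cdot 3)) \\
&= \exp(1.93 + 0.06 \cdot \text{INFRISK})\,(1.101 - 1.075) \\
&= 0.0263 \cdot \exp(1.93 + 0.06 \cdot \text{INFRISK})
\end{aligned}
$$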
![Page 25: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/25.jpg)
Why differences do not work
Consider comparing two hospitals with 800 and 700 beds:
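$$
\begin{aligned}
\log[\widehat{\text{LOS}} \mid \text{beds100} = 8] &= \hat\beta_0 + \hat\beta_1\,\text{beds100} + \hat\beta_2\,\text{INFRISK} \\
&= 1.93 + 0.024 \cdot 8 + 0.060 \cdot \text{INFRISK} \\
[\widehat{\text{LOS}} \mid \text{beds100} = 8] &= \exp(1.93 + 0.024 \cdot 8 + 0.06 \cdot \text{INFRISK}) \\
&= \exp(1.93)\,\exp(0.024 \cdot 8)\,\exp(0.06 \cdot \text{INFRISK}) \\
\log[\widehat{\text{LOS}} \mid \text{beds100} = 7] &= \hat\beta_0 + \hat\beta_1\,\text{beds100} + \hat\beta_2\,\text{INFRISK} \\
&= 1.93 + 0.024 \cdot 7 + 0.060 \cdot \text{INFRISK} \\
[\widehat{\text{LOS}} \mid \text{beds100} = 7] &= \exp(1.93 + 0.024 \cdot 7 + 0.06 \cdot \text{INFRISK}) \\
&= \exp(1.93)\,\exp(0.024 \cdot 7)\,\exp(0.06 \cdot \text{INFRISK}) \\
[\widehat{\text{LOS}} \mid \text{beds100} = 8] - [\widehat{\text{LOS}} \mid \text{beds100} = 7] &= \exp(1.93)\,\exp(0.024 \cdot 8)\,\exp(0.06 \cdot \text{INFRISK}) \\
&\quad - \exp(1.93)\,\exp(0.024 \cdot 7)\,\exp(0.06 \cdot \text{INFRISK}) \\
&= \exp(1.93 + 0.06 \cdot \text{INFRISK})\,(\exp(0.024 \cdot 8) - \exp(0.024 \cdot 7)) \\
&= \exp(1.93 + 0.06 \cdot \text{INFRISK})\,(1.212 - 1.183) \\
&= 0.0291 \cdot \exp(1.93 + 0.06 \cdot \text{INFRISK})
\end{aligned}
$$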
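The two comparisons can be put side by side in a short sketch. It uses the rounded estimates (1.93, 0.024, 0.060) and an arbitrary INFRISK value of 4, which is an illustrative assumption. pred.los() is a hypothetical helper that back-transforms the fitted logLOS.

```r
# Rounded estimates from the model logLOS ~ beds100 + INFRISK
b0 <- 1.93
b1 <- 0.024
b2 <- 0.060

# Hypothetical helper: predicted LOS on the original scale
pred.los <- function(beds100, infrisk) exp(b0 + b1 * beds100 + b2 * infrisk)

infrisk <- 4   # arbitrary value, held the same in every hospital compared

# Differences in predicted LOS for a 100-bed gap depend on where the gap sits
diff.4v3 <- pred.los(4, infrisk) - pred.los(3, infrisk)
diff.8v7 <- pred.los(8, infrisk) - pred.los(7, infrisk)

# Ratios for the same 100-bed gap do not
ratio.4v3 <- pred.los(4, infrisk) / pred.los(3, infrisk)
ratio.8v7 <- pred.los(8, infrisk) / pred.los(7, infrisk)

print(c(diff.4v3 = diff.4v3, diff.8v7 = diff.8v7))
print(c(ratio.4v3 = ratio.4v3, ratio.8v7 = ratio.8v7))
```

The difference grows with the number of beds (and would also change with INFRISK), while the ratio stays at exp(0.024) in both comparisons. That is the reason a single "difference per 100 beds" cannot be reported for a log-transformed outcome.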
![Page 26: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/26.jpg)
Results in the log scale
[Figure: data$logLOS versus data$beds100, with fitted lines for INFRISK=2 and INFRISK=5]
![Page 27: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/27.jpg)
Results on the “linear” scale: not huge differences
[Figure: data$LOS versus data$beds100 (0 to 8), with fitted curves for INFRISK=2 and INFRISK=5]
![Page 28: Lecture 6: Multiple Linear Regression Adjusted Variable Plots](https://reader033.fdocuments.in/reader033/viewer/2022051419/56815a95550346895dc8124d/html5/thumbnails/28.jpg)
Differences can be seen on a larger scale plot
[Figure: data$LOS versus data$beds100 on a wider range (0 to 50), with fitted curves for INFRISK=2 and INFRISK=5]
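The pattern in these plots can be reproduced numerically from the rounded estimates (1.93, 0.024, 0.060). The sketch below is only an illustration: the grid of beds100 values is an assumption chosen to span the wider range of the last plot, which extends well beyond the range of beds100 in the earlier plots.

```r
# Rounded estimates from the model logLOS ~ beds100 + INFRISK
b0 <- 1.93
b1 <- 0.024
b2 <- 0.060

beds100 <- seq(0, 50, by = 5)          # assumed grid, wide range as in the last plot

# Fitted logLOS for the two infection-risk levels shown in the plots
loglos.2 <- b0 + b1 * beds100 + b2 * 2
loglos.5 <- b0 + b1 * beds100 + b2 * 5

gap.log <- loglos.5 - loglos.2             # gap between the lines on the log scale
gap.los <- exp(loglos.5) - exp(loglos.2)   # gap between the curves on the LOS scale

print(data.frame(beds100, gap.log, gap.los = round(gap.los, 2)))
```

On the log scale the two lines are parallel, so the gap is constant at 3 times the INFRISK coefficient. On the original LOS scale the gap widens as beds100 increases, which is why the curves only visibly separate on the larger-scale plot.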