Multiple Regression (Section VIII)
Multiple Regression - Overview
Multiple regression in statistics is the science and art of creating an equation that relates an outcome Y to one or more predictors X1, X2, X3, ..., Xk.
Linear Regression
Y = a + b1X1 + b2X2 + b3X3 + ... + bkXk + e = Ŷ + e
where "e" is the residual error between the observed Y and the prediction (Ŷ).
In linear regression, bi is the average change in Y for a single unit change in Xi.
“Performance” stats: R2, SDe
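As a concrete (hypothetical) illustration, here is a minimal Python sketch using statsmodels; the data are simulated and the coefficient values invented, so it shows the mechanics rather than any real analysis:

```python
# Minimal multiple linear regression sketch (simulated, illustrative data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))                    # two predictors X1, X2
e = rng.normal(scale=0.5, size=n)              # residual error e
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + e    # Y = a + b1*X1 + b2*X2 + e

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.params)          # estimates of a, b1, b2
print(model.rsquared)        # "performance" stat R2
print(np.sqrt(model.scale))  # SDe, the residual standard deviation
```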
Logistic Regression
In multiple logistic regression, Y is 0 or 1 with mean P, the "risk". The logit of P, not P itself, is assumed to be a linear function of the Xs:
Logit(P) = ln(P/(1-P)) = a + b1X1 + b2X2 + b3X3 + ... + bkXk
("Logit" = log of the odds, since P/(1-P) is the odds.)
odds = exp(a + b1X1 + b2X2 + b3X3 + ... + bkXk)
risk = P = odds/(odds + 1)
“Performance” stats: Sensitivity, Specificity, Accuracy, Concordance (C), mean deviance
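A matching minimal sketch for the logistic case, again with simulated data and invented coefficients:

```python
# Minimal multiple logistic regression sketch (simulated data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 2))
logit_p = -1.0 + 0.8 * X[:, 0] + 0.4 * X[:, 1]  # logit(P) = a + b1*X1 + b2*X2
p = 1 / (1 + np.exp(-logit_p))                  # risk = odds/(odds + 1)
y = rng.binomial(1, p)                          # Y is 0 or 1 with mean P

fit = sm.Logit(y, sm.add_constant(X)).fit()
print(np.exp(fit.params[1:]))    # exp(b) = odds ratios per one-unit change in X
odds = np.exp(fit.fittedvalues)  # fitted odds (fittedvalues is the linear predictor)
risk = odds / (odds + 1)         # fitted risk P for each subject
```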
Poisson Regression
When Y is a non-negative integer count (0, 1, 2, 3, ...), we model the log of the mean of Y, so the predicted mean can never be negative. This is the multiple Poisson regression model.
ln(mean Y) = a + b1X1 + b2X2 + b3X3 + ... + bkXk
mean Y = exp(a + b1X1 + b2X2 + b3X3 + ... + bkXk), which cannot be negative.
“Performance” stats: R2, mean deviance
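And a minimal Poisson sketch in the same style (simulated data, invented coefficients):

```python
# Minimal multiple Poisson regression sketch (simulated count data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
X = rng.normal(size=(n, 2))
mu = np.exp(0.2 + 0.5 * X[:, 0] - 0.3 * X[:, 1])  # mean Y = exp(a + b1*X1 + b2*X2) > 0
y = rng.poisson(mu)                               # Y is a count: 0, 1, 2, 3, ...

fit = sm.Poisson(y, sm.add_constant(X)).fit()
print(np.exp(fit.params[1:]))  # exp(b) = mean ratios per one-unit change in X
```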
[Figure: the logit function, P vs. logit(P) = ln(P/(1-P)); x axis: logit = log odds, from -4 to 4; y axis: P = risk, from 0.0 to 1.0]
Logistic regression: predictors of in-hospital infection

Characteristic        Odds Ratio (95% CI)   p value
Incr APACHE* score    1.15 (1.11-1.18)      <0.001
Transfusion (y/n)     4.15 (2.46-6.99)      <0.001
Increasing age (yr)   1.03 (1.02-1.05)      <0.001
Malignancy            2.60 (1.62-4.17)      <0.001
Max temperature       0.70 (0.58-0.85)      <0.001
Adm to treat > 7 d    1.66 (1.05-2.61)      0.03
Female (y/n)          1.32 (0.90-1.94)      0.16

*APACHE = Acute Physiology & Chronic Health Evaluation score
Multiple Proportional Hazards Regression (Cox model)
For time-dependent outcomes (e.g., time to death), we model the hazard rate h, the event rate per unit time (for death, it is the mortality rate). Since h > 0, we model the log of the hazard as a linear function of the Xs, so h is always positive (similar to Poisson regression):
ln(h) = a + b1X1 + b2X2 + b3X3 + ... + bkXk
so h = exp(a + b1X1 + b2X2 + b3X3 + ... + bkXk) > 0
If h0 = exp(a) is the "baseline" hazard (that is, a = ln(h0)), the hazard ratio is HR = h/h0 = exp(b1X1 + b2X2 + b3X3 + ... + bkXk), with no intercept a.
If S0(t) is the "baseline" survival curve corresponding to the baseline hazard, then the survival curve for a given combination of X1, X2, ..., Xk is
S(t) = S0(t)^HR, where HR is computed with the equation above.
exp(bi) is the hazard rate ratio for a one-unit change in Xi.
"Performance" stats: Harrell's concordance C (0.5 < C < 1.0)
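A minimal Cox sketch, assuming the lifelines package is available (lifelines is not part of the course notes; the data frame, column names, and coefficients are invented for illustration):

```python
# Minimal Cox proportional hazards sketch (simulated survival data).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(3)
n = 300
df = pd.DataFrame({
    "age": rng.normal(50, 10, n),
    "treated": rng.integers(0, 2, n),
})
# ln(h) linear in the Xs -> simulate exponential event times with rate h
h = np.exp(-4.0 + 0.03 * df["age"] - 0.5 * df["treated"])
df["time"] = rng.exponential(1 / h)
df["event"] = 1  # 1 = event observed (no censoring in this toy example)

cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
print(cph.hazard_ratios_)       # exp(b) = hazard ratios per one-unit change
print(cph.concordance_index_)   # Harrell's C (0.5 < C < 1.0)
```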
Cox regression: HRs for patient mortality by donor age (Busuttil et al. 2005)

Donor age   HR           95% CI      p value
1-18        1.00 (ref)   --          --
18-32       1.23         0.88-1.72   0.20
32-48       1.40         1.02-1.92   0.03
48-55       1.51         1.02-2.24   0.04
55-60       2.29         1.48-3.55   <0.001
60+         1.61         1.10-2.37   0.01

Harrell's C = 0.70
Regression coefficient interpretation

Outcome (Y)                            Regression   Interpretation
Continuous                             Linear       b = average change in Y per one-unit increase in X (the rate of change)
Binary (P = proportion)                Logistic     exp(b) = e^b = odds ratio (OR) for a one-unit increase in X
Low positive integers (0, 1, 2, 3...)  Poisson      exp(b) = mean ratio (MR) for a one-unit increase in X
Hazard rate (h = events/time)          Cox          exp(b) = hazard rate ratio (HR) for a one-unit increase in X; S(t) = S0(t)^HR
Multiple Linear Regression Example
Consider predictors of Y = bilirubin (mg/dl) in liver transplant candidates. Two predictors are
X1 = prothrombin time (PT) in seconds and X2 = ALT (alanine aminotransferase) in U/L.
A multiple regression equation (on the log scale) is
Ŷ = (predicted) log bilirubin = -3.96 + 3.47 log PT + 0.21 log ALT
Regression output: parameter estimates

Term        Estimate   SE      t ratio   p value
Intercept   -3.96      0.257   -15.4     <0.001
log PT      3.47       0.214   16.2      <0.001
log ALT     0.21       0.055   3.8       0.0002

Equation: log bili = -3.96 + 3.47 log PT + 0.21 log ALT
Regression: analysis of variance table

Source   df    Sum of squares   Mean square      F
Model    2     37.76            18.88            147.2
Error    363   46.56            0.1283 = SDe^2   --
Total    365   84.32

F = 18.88/0.1283 = 147.2 is the screening F test of the hypothesis that none of the predictors are related to Y.
R2 = 37.76/84.32 = 0.448 = 44.8%
R2 = model sum of squares / total sum of squares
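A quick arithmetic check of these summaries, recomputed directly from the table entries above:

```python
# Recomputing F and R2 from the ANOVA table entries.
ss_model, ss_error, ss_total = 37.76, 46.56, 84.32
df_model, df_error = 2, 363

ms_model = ss_model / df_model   # 18.88
ms_error = ss_error / df_error   # 0.1283 = SDe^2
print(ms_model / ms_error)       # F  ~ 147.2
print(ss_model / ss_total)       # R2 ~ 0.448 = 44.8%
```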
Residual error plot: residual log bilirubin by predicted
[Figure: log bilirubin residual (y axis, -1.0 to 1.5) vs. predicted log bilirubin (x axis, -1.0 to 2.0)]
When the model is valid, this plot should look like a circular cloud if the errors have constant variance.
The example above is a “good” result.
Example of a "good" residual error histogram: the errors have a Gaussian distribution about zero.
Moments - errors
Mean 0.000
Std Dev (SDe) 0.357
Std Err Mean 0.0187
n 366
Quantiles - errors
100.0% maximum 1.711
97.5% 0.834
90.0% 0.454
75.0% quartile 0.177
50.0% median -0.025
25.0% quartile -0.213
10.0% -0.355
2.5% -0.630
0.0% minimum -1.191
Normal Quantile Plot
(Should be approximately a straight line if the residual error data are Gaussian)
[Figure: normal quantile plot of the residual errors (e)]
Interpreting multiple regression coefficients (cont.)
The multiple regression coefficients will not in general be the same as the individual regression coefficient for each variable one at a time, even though the same Y is being modeled.

Variable   Simple regression (one Y, one X)   Multiple (simultaneous) (b1X1 + b2X2)
Log PT     3.560                              3.470
Log ALT    0.310                              0.211

Log bilirubin = -3.70 + 3.56 log PT, R2 = 0.425
Log bilirubin = -0.105 + 0.310 log ALT, R2 = 0.049
Log bilirubin = -3.96 + 3.47 log PT + 0.211 log ALT, R2 = 0.448

The log PT coefficients don't match.
Orthogonality (vs. collinearity)
In general, regression coefficients from simple and multiple regression are not the same: controlling for covariates does not give the same answer as ignoring covariates.
Only when all the X variables have correlation zero with each other (orthogonality) will the simple and multiple regression coefficients be the same.
(Collinearity is when Xs are strongly correlated. It is the "opposite" of orthogonality.)
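A small simulation (not the course's data) makes this concrete: when X1 and X2 are correlated, the simple coefficient for X1 absorbs part of X2's effect, while the multiple regression recovers the true value:

```python
# Simple vs. multiple regression coefficients under collinearity.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 2000
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(scale=0.5, size=n)     # x2 correlated with x1
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

b_simple = sm.OLS(y, sm.add_constant(x1)).fit().params[1]
b_multiple = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit().params[1]
print(b_simple, b_multiple)  # simple ~ 2.7 (absorbs x2's effect), multiple ~ 2.0
```

If x2 were simulated independently of x1 (orthogonality), the two printed coefficients would agree.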
Log PT (X1) vs. log ALT (X2): since their correlation is low, unadjusted and adjusted regression results are similar.
[Figure: scatter plot of log prothrombin time (y axis, 1.00 to 1.50) vs. log ALT (x axis, 1.0 to 3.5)]
r12 = 0.111, R2 = 0.0123
Interaction Effects & subgroups
The model Y = β0 + β1X1 + β2X2 + e implies that the change in Y due to X1 (= β1) is the same (constant) for all values of X2: an ADDITIVE model.
In the model Y = β0 + β1X1 + β2X2 + β3X1X2 + e, the β3 term is an interaction term.
The change in Y for a unit change in X1 is (β1 + β3X2) and is therefore not constant.
A positive β3 is often termed a "synergism"; a negative β3 is often termed an "antagonism".
The model is additive only if β3 = 0.
How to implement? Make a new variable W = X1X2 and include it as a predictor (a sketch follows below).
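A minimal sketch of that step with simulated data (statsmodels' formula interface can also build the product term for you, but the W = X1*X2 construction is shown explicitly here):

```python
# Implementing an interaction: make W = X1*X2 and add it as a predictor.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.3 * x2 + 0.4 * x1 * x2 + rng.normal(size=n)

w = x1 * x2                              # the new interaction variable W
X = sm.add_constant(np.column_stack([x1, x2, w]))
fit = sm.OLS(y, X).fit()
print(fit.params)  # b3 ~ 0.4; the slope for x1 is (b1 + b3*x2), not constant
```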
Interaction example
Response: Y = log HOMA IR (MESA study)
R2 = 0.280, root mean square error = SDe = 0.623, mean response = 0.395, n = 6782

Parameter estimates
Term         Estimate   Std Error   t Ratio   p value
Intercept    -1.388     0.049       -28.16    <.0001
Gender       -0.669     0.085       -7.83     <.0001
BMI          0.061      0.00168     36.45     <.0001
Gender*BMI   0.028      0.00299     9.38      <.0001

Predicted log HOMA IR = -1.39 - 0.669 gender + 0.061 BMI + 0.028 gender * BMI
(gender is coded 0 for female and 1 for male)
Gender x BMI interaction: non-additivity
In females: log HOMA IR = -1.39 + 0.061 BMI
In males: log HOMA IR = -2.06 + 0.089 BMI
[Figure: log HOMA IR (y axis, -0.5 to 2.0) vs. BMI (x axis, 20 to 44), with separate fitted lines for females (F) and males (M)]
Gender x BMI interaction: the relation is different in males vs. females.
[Figure: HOMA IR (y axis, 0.0 to 6.0) vs. BMI (x axis, 20 to 44) on the original (unlogged) scale, with separate curves for F and M]
Hierarchically well formulated (HWF) regression models
HWF rule: to correctly evaluate the X1*X2 interaction, the model must also include X1 and X2.
In general, one must include the lower order terms in order to correctly evaluate the higher order terms.
HWF example: a NON-HWF model
Model: chol = a0 + a1 smokeage, where smokeage = smoke x age

0, 1 (dummy) coding: smoke = 0 or 1, smokeage = smoke x age
Variable    DF   Estimate   Std Error   t        p value
INTERCEPT   1    156.863    3.993       39.286   0.0001
SMOKEAGE    1    0.361      0.182       1.988    0.0567

-1, 1 (effect) coding: smoke = -1 or 1, smokeage = smoke x age
Variable    DF   Estimate   Std Error   t        p value
INTERCEPT   1    162.278    3.103       52.320   0.0001
SMOKEAGE    1    0.055      0.100       0.548    0.5881

The p value changes just because the coding changes!
HWF example (cont.): an HWF model
Model: chol = b0 + b1 smoke + b2 age + b3 smoke x age
For HWF models, significance is the same regardless of coding.

0, 1 (dummy) coding: smoke = 0 or 1, smokeage = smoke x age
Variable    DF   Estimate   Std Error   t         p value
INTERCEPT   1    100.221    1.110       90.304    0.0001
SMOKE       1    3.812      1.570       2.429     0.0224
AGE         1    2.010      0.036       56.297    0.0001
SMOKEAGE    1    -0.009     0.050       -0.178    0.8599

-1, 1 (effect) coding: smoke = -1 or 1, smokeage = smoke x age
Variable    DF   Estimate   Std Error   t         p value
INTERCEPT   1    102.127    0.785       130.138   0.0001
SMOKE       1    1.906      0.785       2.429     0.0224
AGE         1    2.005      0.025       79.437    0.0001
SMOKEAGE    1    -0.004     0.025       -0.178    0.8599
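A simulation along the lines of the cholesterol example (all values invented) reproduces the phenomenon: the non-HWF interaction p value depends on the coding of smoke, while the HWF interaction p value does not:

```python
# Non-HWF vs. HWF models under dummy (0,1) and effect (-1,1) coding of smoke.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 400
age = rng.uniform(30, 70, n)
smoke01 = rng.integers(0, 2, n)      # dummy coding: 0 or 1
smoke11 = 2 * smoke01 - 1            # effect coding: -1 or 1
chol = 100 + 4 * smoke01 + 2 * age + rng.normal(scale=10, size=n)

for smoke in (smoke01, smoke11):
    smokeage = smoke * age
    # Non-HWF: the product term without the smoke and age main effects.
    non_hwf = sm.OLS(chol, sm.add_constant(smokeage)).fit()
    # HWF: main effects plus the product term.
    hwf = sm.OLS(chol, sm.add_constant(
        np.column_stack([smoke, age, smokeage]))).fit()
    print(non_hwf.pvalues[-1], hwf.pvalues[-1])
# The non-HWF interaction p value shifts with the coding;
# the HWF interaction p value is identical under both codings.
```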
Regression assumptions
Regression can simultaneously evaluate all factors and thus reduce confounding, but one must check two critical assumptions:
1. If X is continuous/interval, check whether the relation of X and Y is linear on some scale (otherwise polychotomize X).
2. Check whether the effects of X1, X2, X3, ... are additive by adding interaction terms (e.g., X4 = X1 x X2). The effects are not additive if the interactions are significant.
3. Also, in linear regression, we prefer the residual errors (e) to have a Gaussian distribution with a constant variance that is independent of Y. But additivity and linearity are more important, since lack of additivity or linearity leads to bias and is more misleading.
Nonlinear Regression
Log(Bilirubin)= -3.96 + 3.47 log(PT) + 0.211 log(ALT)
is a nonlinear model in terms of PT and ALT
but is a linear model in terms of log PT, log ALT and
the regression coefficients b0=-3.96, b1=3.47 and b2=0.211.
Consider a model of the form: Ŷ = drug concentration = b1 * 10^(b2 x)
This is nonlinear in b2 but can be made linear with a transformation: log10(concentration) = log10(b1) + b2 x
What about: drug concentration = b0 + b1 * 10^(b2 x)?
This model is nonlinear in b2 and cannot be transformed to a linear one.
Nonlinear regression software is needed to estimate b0, b1, and b2.
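A minimal sketch of such software in action, using scipy's curve_fit on simulated concentrations (the parameter values are invented):

```python
# Nonlinear least squares for concentration = b0 + b1 * 10**(b2 * x).
import numpy as np
from scipy.optimize import curve_fit

def conc(x, b0, b1, b2):
    return b0 + b1 * 10 ** (b2 * x)

rng = np.random.default_rng(7)
x = np.linspace(0, 10, 50)
y = conc(x, 0.5, 4.0, -0.15) + rng.normal(scale=0.05, size=x.size)

(b0, b1, b2), _ = curve_fit(conc, x, y, p0=[0.0, 1.0, -0.1])
print(b0, b1, b2)  # estimates near the true 0.5, 4.0, -0.15
```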
Nonlinear Example: compartmental drug models
A model of how a drug (or any chemical) is metabolized by an organism. Y1 = concentration in serum, Y2 = concentration in organ, x = time.
dY1/dx = -b1 Y1
dY2/dx = b1 Y1 - b2 Y2
b1 > b2 > 0
Solutions: Y1 = constant * e^(-b1 x)
Y2 = (b1/(b1-b2)) [e^(-b2 x) - e^(-b1 x)]  <- model
Y2 takes on its maximum value when x = ln(b1/b2)/(b1-b2). Y2 is zero when x = 0 or when x is very large. The constants b1 and b2 are rates, in units of 1/x (i.e., 1/time).
[Figure: Y1 (serum) and Y2 (organ) concentration curves vs. time t]
Nonlinear equation: Y = (b1/(b1-b2)) [e^(-b2 x) - e^(-b1 x)]
Fitted: Y = [0.0967/(0.0967-0.0506)] * [exp(-0.0506 t) - exp(-0.0967 t)]; at the peak, t = 14, Ŷ = 0.49.
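A few lines verify the quoted peak location and height from the formula above:

```python
# Peak of Y2 = (b1/(b1-b2)) * [exp(-b2*t) - exp(-b1*t)].
import numpy as np

b1, b2 = 0.0967, 0.0506
t_peak = np.log(b1 / b2) / (b1 - b2)   # x = ln(b1/b2)/(b1-b2)
y_peak = (b1 / (b1 - b2)) * (np.exp(-b2 * t_peak) - np.exp(-b1 * t_peak))
print(t_peak, y_peak)  # ~14.0 and ~0.49, matching the values quoted above
```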
Residual diagnostics & "model criticism"
Assumptions of linear regression:
1. Linear relation between Y and each X except for random "noise" (but one can transform X).
2. The effect of each X is additive (but one can make interaction terms).
3. Errors (e) have constant variance and come from a Gaussian distribution.
4. All observations come from the same population.
5. All observations are independent (usually ok).
A plot of Ŷ versus e, called a residual error (diagnostic) plot, can help verify whether these assumptions are met; a minimal sketch follows.
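This sketch assumes matplotlib is available and reuses the simulated-data pattern from the earlier examples:

```python
# Residual diagnostic plot: fitted values (Y-hat) vs. residuals (e).
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(8)
X = rng.normal(size=(300, 2))
y = 1 + X @ np.array([2.0, -0.5]) + rng.normal(scale=0.5, size=300)
fit = sm.OLS(y, sm.add_constant(X)).fit()

plt.scatter(fit.fittedvalues, fit.resid, s=10)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted (Y-hat)")
plt.ylabel("Residual error (e)")
plt.show()  # a valid model shows a patternless cloud of constant spread
```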
[Figure: example of a "good" residual plot]
[Figure: residual plot diagnostics, examples of "bad" residual plots]
Regression diagnostics
Problem: outliers. Solution: find on the residual plot & eliminate.
Problem: curvilinear (nonlinear) relation. Solution: try a nonlinear transformation (x^2, 1/x, log(x), e^x).
Problem: errors not Gaussian. Solution: robust regression (future class).
Problem: non-constant SDe or Var(e). Solution: weights (future class).
Adjusted means
[Figure: body fat change (%) (y axis, -12.0 to 4.0) vs. dietary fat (g) (x axis, 20 to 70), showing the meditation and sham groups, their unadjusted and adjusted means, and the unadjusted vs. adjusted differences]
Ex: Meditation & change in pct body fat
Overweight persons chose a meditation program or a "sham" (lectures) as part of a weight loss effort. They were NOT randomized.
Change in percent body fat by treatment group (meditation or sham) over three months:

Unadjusted means
Level        n     Mean pct body fat change   SEM     Mean dietary fat (g, before study start)
1-meditate   439   -7.51%                     0.47%   32.7 g
2-sham       704   1.34%                      0.35%   67.1 g

Unadjusted mean difference (sham - meditation) = 8.85%
SE of the difference = SEd = sqrt(0.472^2 + 0.353^2) = 0.59%
t = mean diff/SEd = 8.85%/0.59% = 15.1, p < 0.0001
Overall unweighted mean dietary fat = 49.9 g
Result via “regression”
Y= change in body fat,
X = 1 if sham, 0 if meditation
Y = a + b X + error = a + b sham + error
term estimate SE t p value
a -7.51 0.46 -16.4 < 0.001
b 8.85 0.59 15.2 < 0.001
Ŷ = -7.51 + 8.85 sham
Regression controlling for dietary fat
pct body fat change = Y = a + b1X1 + b2X2 + error
X1= 1 if sham, 0 if meditation
X2 = dietary fat in g
term estimate SE t p value
a -14.5 0.54 -26.6 < 0.001
b1 1.51 0.56 2.7 0.007
b2 0.21 0.011 18.9 <0.001
body fat chg = -14.5 + 1.51 sham + 0.21 diet fat
Adjusted means
Plug into the equation with X1 = 1 (sham) or X1 = 0 (meditation), holding X2 = diet fat at 49.9 g, the overall mean of X2.
Meditation: -14.5 + 1.51(0) + 0.213(49.9) = -3.84%
Sham: -14.5 + 1.51(1) + 0.213(49.9) = -2.33%
Adjusted mean difference = -2.33 - (-3.84) = 1.51%
(The unadjusted difference was 8.85%.)
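The same plug-in arithmetic in a few lines, with coefficients copied from the table above (with these rounded coefficients the means come out a few hundredths off the -3.84%/-2.33% quoted, but the adjusted difference is exactly b1):

```python
# Adjusted means: plug in X1 = 0/1 with X2 held at its overall mean.
a, b1, b2 = -14.5, 1.51, 0.213   # intercept, sham, dietary fat (from the table)
mean_diet_fat = 49.9             # overall mean of X2

adj_meditate = a + b1 * 0 + b2 * mean_diet_fat
adj_sham = a + b1 * 1 + b2 * mean_diet_fat
print(adj_meditate, adj_sham, adj_sham - adj_meditate)
# ~ -3.87, -2.36, and exactly 1.51 (the coefficient of sham)
```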
General procedure
Adjusted means for X1, controlling for (confounders) X2, X3, X4, ...
1. Estimate model regression coefficients
2. Plug in different values for X1, holding all the other Xs constant at their overall means.
This gives the adjusted means and their SEs. What does this assume?