Logistic Regression II


Page 1: Logistic Regression II

Logistic Regression II

Page 2: Logistic Regression II

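Simple 2x2 Table (courtesy Hosmer and Lemeshow)

               Exposure = 1                                                          Exposure = 0
Disease = 1    $P(D \mid E) = \frac{e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}}$        $P(D \mid \sim E) = \frac{e^{\alpha}}{1+e^{\alpha}}$
Disease = 0    $P(\sim D \mid E) = \frac{1}{1+e^{\alpha+\beta_1}}$                    $P(\sim D \mid \sim E) = \frac{1}{1+e^{\alpha}}$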

Page 3: Logistic Regression II

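Odds Ratio for simple 2x2 Table (courtesy Hosmer and Lemeshow)

$$OR = \frac{\dfrac{e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}} \Big/ \dfrac{1}{1+e^{\alpha+\beta_1}}}{\dfrac{e^{\alpha}}{1+e^{\alpha}} \Big/ \dfrac{1}{1+e^{\alpha}}} = \frac{e^{\alpha+\beta_1}}{e^{\alpha}} = e^{\beta_1}$$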
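The cancellation above can be checked numerically. The following is a small Python sketch (the slides themselves use SAS); the values of `alpha` and `beta1` are arbitrary illustrations, not estimates from any data set.

```python
import math

def cell_probabilities(alpha, beta1):
    """Return P(D|E) and P(D|~E) under the logistic model."""
    p_exposed = math.exp(alpha + beta1) / (1 + math.exp(alpha + beta1))
    p_unexposed = math.exp(alpha) / (1 + math.exp(alpha))
    return p_exposed, p_unexposed

def odds_ratio(alpha, beta1):
    """Odds of disease in the exposed divided by odds in the unexposed."""
    p1, p0 = cell_probabilities(alpha, beta1)
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

# Arbitrary illustrative values (assumptions, not fitted estimates)
alpha, beta1 = -0.8, 1.5
print(odds_ratio(alpha, beta1), math.exp(beta1))  # the two agree up to rounding error
```

Whatever value is chosen for `alpha`, the ratio depends only on `beta1`, which is why the odds ratio is read directly off the slope coefficient.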

Page 4: Logistic Regression II

Example 1: CHD and Age (2x2)

(from Hosmer and Lemeshow)

               ≥55 yrs    <55 yrs
CHD Present       21         22
CHD Absent         6         51


Page 6: Logistic Regression II

The Logit Model

$$\log\left(\frac{P(D)}{1-P(D)}\right) = \alpha + \beta_1 X_1$$

$X_1$ = 1 if exposed (older)
$X_1$ = 0 if unexposed (younger)
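As a minimal Python sketch of the transformation used in the logit model (the function names `logit` and `inv_logit` are illustrative choices, not taken from the slides): the logit maps a probability to log odds, and its inverse maps a linear predictor back to a probability.

```python
import math

def logit(p):
    """Log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Probability corresponding to log odds x."""
    return 1 / (1 + math.exp(-x))

p = 0.25
x = logit(p)             # log odds for p = 0.25
print(x, inv_logit(x))   # inv_logit undoes logit
```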

Page 7: Logistic Regression II

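The Likelihood

$$L(\alpha,\beta_1) = \left(\frac{e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}}\right)^{21} \times \left(\frac{1}{1+e^{\alpha+\beta_1}}\right)^{6} \times \left(\frac{e^{\alpha}}{1+e^{\alpha}}\right)^{22} \times \left(\frac{1}{1+e^{\alpha}}\right)^{51}$$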

Page 8: Logistic Regression II

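The Log Likelihood

$$L(\alpha,\beta_1) = \left(\frac{e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}}\right)^{21} \times \left(\frac{1}{1+e^{\alpha+\beta_1}}\right)^{6} \times \left(\frac{e^{\alpha}}{1+e^{\alpha}}\right)^{22} \times \left(\frac{1}{1+e^{\alpha}}\right)^{51}$$

$$\log L(\alpha,\beta_1) = 21(\alpha+\beta_1) - 21\log(1+e^{\alpha+\beta_1}) + 0 - 6\log(1+e^{\alpha+\beta_1}) + 22\alpha - 22\log(1+e^{\alpha}) + 0 - 51\log(1+e^{\alpha})$$

recall:

$$\log\frac{e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}} = \log e^{\alpha+\beta_1} - \log(1+e^{\alpha+\beta_1}) = (\alpha+\beta_1) - \log(1+e^{\alpha+\beta_1})$$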
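The expansion of the log likelihood can be sanity-checked by evaluating it two ways. This Python sketch uses the cell counts from the CHD-by-age table; the trial values of alpha and beta1 and the function names are arbitrary assumptions for illustration.

```python
import math

# Cell counts from the CHD-by-age table on the earlier slide
CASES_OLD, NONCASES_OLD = 21, 6        # >= 55 yrs (exposed)
CASES_YOUNG, NONCASES_YOUNG = 22, 51   # < 55 yrs (unexposed)

def log_lik_expanded(alpha, beta1):
    """Expanded form: 21(a+b) - 27 log(1+e^(a+b)) + 22a - 73 log(1+e^a)."""
    n_old = CASES_OLD + NONCASES_OLD
    n_young = CASES_YOUNG + NONCASES_YOUNG
    return (CASES_OLD * (alpha + beta1)
            - n_old * math.log(1 + math.exp(alpha + beta1))
            + CASES_YOUNG * alpha
            - n_young * math.log(1 + math.exp(alpha)))

def log_lik_direct(alpha, beta1):
    """Log of the product-form likelihood, taken cell by cell."""
    p_old = math.exp(alpha + beta1) / (1 + math.exp(alpha + beta1))
    p_young = math.exp(alpha) / (1 + math.exp(alpha))
    return (CASES_OLD * math.log(p_old) + NONCASES_OLD * math.log(1 - p_old)
            + CASES_YOUNG * math.log(p_young) + NONCASES_YOUNG * math.log(1 - p_young))

# Arbitrary trial values (assumptions for illustration only)
print(log_lik_expanded(-1.0, 2.0), log_lik_direct(-1.0, 2.0))
```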

Page 9: Logistic Regression II

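Derivative(s) of the log likelihood

$$\log L(\alpha,\beta_1) = 21(\alpha+\beta_1) - 21\log(1+e^{\alpha+\beta_1}) + 0 - 6\log(1+e^{\alpha+\beta_1}) + 22\alpha - 22\log(1+e^{\alpha}) + 0 - 51\log(1+e^{\alpha})$$

$$\frac{d[\log L]}{d\beta_1} = 21 - \frac{21e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}} - \frac{6e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}}$$

$$\frac{d[\log L]}{d\alpha} = \frac{d[\log L]}{d\beta_1} + 22 - \frac{22e^{\alpha}}{1+e^{\alpha}} - \frac{51e^{\alpha}}{1+e^{\alpha}}$$

The first term of the derivative with respect to $\alpha$ is zero at the maximum, so only the terms in $\alpha$ alone remain when solving for $\alpha$.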

Page 10: Logistic Regression II

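Maximize $\alpha$

Set the terms of $d[\log L]/d\alpha$ that involve $\alpha$ alone to zero (the remaining terms equal $d[\log L]/d\beta_1$, which is zero at the maximum):

$$22 - \frac{22e^{\alpha}}{1+e^{\alpha}} - \frac{51e^{\alpha}}{1+e^{\alpha}} = 0$$

$$22 = \frac{73e^{\alpha}}{1+e^{\alpha}}$$

$$22(1+e^{\alpha}) = 73e^{\alpha}$$

$$22 = 51e^{\alpha}$$

$$e^{\alpha} = \frac{22}{51}$$

= Odds of disease in the unexposed (<55)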

Page 11: Logistic Regression II

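Maximize $\beta_1$

$$21 - \frac{27e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}} = 0$$

$$21(1+e^{\alpha+\beta_1}) = 27e^{\alpha+\beta_1}$$

$$21 = 6e^{\alpha+\beta_1}$$

$$e^{\alpha+\beta_1} = \frac{21}{6}$$

$$e^{\beta_1} = \frac{21/6}{e^{\alpha}} = \frac{21/6}{22/51} = \frac{21 \times 51}{6 \times 22} = OR \approx 8.1$$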
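The closed-form solutions can be verified by plugging them back into the derivatives of the log likelihood. This is a Python sketch, not the SAS used in the slides; the variable names `a`, `b`, `c`, `d` for the four cell counts are an assumption for illustration.

```python
import math

a, b = 21, 22   # CHD present: >=55 yrs, <55 yrs
c, d = 6, 51    # CHD absent:  >=55 yrs, <55 yrs

alpha_hat = math.log(b / d)               # log odds of disease in the unexposed
beta1_hat = math.log((a / c) / (b / d))   # log odds ratio

def score(alpha, beta1):
    """Derivatives of the log likelihood with respect to alpha and beta1."""
    p_exposed = 1 / (1 + math.exp(-(alpha + beta1)))
    p_unexposed = 1 / (1 + math.exp(-alpha))
    d_beta1 = a - (a + c) * p_exposed
    d_alpha = d_beta1 + b - (b + d) * p_unexposed
    return d_alpha, d_beta1

print(alpha_hat, beta1_hat, math.exp(beta1_hat))  # the OR is about 8.11
print(score(alpha_hat, beta1_hat))                # both derivatives are ~0 at the MLE
```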

Page 12: Logistic Regression II

Hypothesis Testing H0: $\beta$ = 0

Null value of beta is 0 (no association)

1. The Wald test:

$$Z = \frac{\hat{\beta} - 0}{\text{asymptotic standard error}(\hat{\beta})}$$

2. The Likelihood Ratio test:

$$-2\ln\frac{L(\text{reduced})}{L(\text{full})} = -2\ln(L(\text{reduced})) - [-2\ln(L(\text{full}))] \sim \chi^2_p$$

Reduced=reduced model with k parameters; Full=full model with k+p parameters

Page 13: Logistic Regression II

Hypothesis Testing H0: $\beta$ = 0

1. What is the Wald Test here?

$$Z = \frac{\ln\left(\frac{21 \times 51}{22 \times 6}\right)}{\sqrt{\frac{1}{51}+\frac{1}{6}+\frac{1}{21}+\frac{1}{22}}} = 3.96$$

2. What is the Likelihood Ratio test here?
– Full model = includes age variable
– Reduced model = includes only intercept

Maximum likelihood for reduced model ought to be $(.43)^{43} \times (.57)^{57}$

(43 cases/57 controls)…does MLE yield this?…
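The Wald statistic on this slide can be reproduced from the four cell counts alone. The sketch below is in Python rather than SAS; it assumes the usual large-sample standard error of a log odds ratio (square root of the sum of reciprocal cell counts), which is what the denominator in the formula above shows.

```python
import math

a, b, c, d = 21, 22, 6, 51   # CHD present (>=55, <55), CHD absent (>=55, <55)

beta1_hat = math.log((a * d) / (b * c))        # log odds ratio
se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # asymptotic standard error
z = beta1_hat / se

# Two-sided p-value from the standard normal distribution
p_value = math.erfc(abs(z) / math.sqrt(2))

print(round(z, 2))   # 3.96
print(p_value)
```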

Page 14: Logistic Regression II

The Reduced Model

$$\log\left(\frac{P(D)}{1-P(D)}\right) = \alpha$$

Page 15: Logistic Regression II

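Likelihood value for reduced model

$$L(\alpha) = \left(\frac{e^{\alpha}}{1+e^{\alpha}}\right)^{43} \times \left(\frac{1}{1+e^{\alpha}}\right)^{57}$$

$$\log L(\alpha) = 43\log(e^{\alpha}) - 43\log(1+e^{\alpha}) - 57\log(1+e^{\alpha})$$

$$\frac{d[\log L]}{d\alpha} = 43 - \frac{100e^{\alpha}}{1+e^{\alpha}} = 0$$

$$43 + 43e^{\alpha} = 100e^{\alpha}$$

$$43 = 57e^{\alpha}$$

$$e^{\alpha} = \frac{43}{57} = .75$$

= marginal odds of CHD!

$$\alpha = \ln(.75) = -.28$$

$$L(\alpha = -.28) = \left(\frac{.75}{1.75}\right)^{43} \times \left(\frac{1}{1.75}\right)^{57} = (.43)^{43} \times (.57)^{57} = 2.1 \times 10^{-30}$$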

Page 16: Logistic Regression II

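Likelihood value of full model

$$L(\alpha,\beta_1) = \left(\frac{21/6}{1+21/6}\right)^{21} \times \left(\frac{1}{1+21/6}\right)^{6} \times \left(\frac{22/51}{1+22/51}\right)^{22} \times \left(\frac{1}{1+22/51}\right)^{51}$$

$$= \left(\frac{3.5}{4.5}\right)^{21} \times \left(\frac{1}{4.5}\right)^{6} \times \left(\frac{.43}{1.43}\right)^{22} \times \left(\frac{1}{1.43}\right)^{51} = 2.43 \times 10^{-26}$$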

Page 17: Logistic Regression II

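Finally the LR…

$$-2\ln\frac{L(\text{reduced})}{L(\text{full})} = -2\ln(2.1 \times 10^{-30}) - [-2\ln(2.43 \times 10^{-26})] = 136.7 - 117.96 = 18.7$$

The likelihood ratio chi-square is 18.7 on 1 degree of freedom, versus the squared Wald statistic $(3.96)^2 = 15.7$.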
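The likelihood ratio statistic can be recomputed from the cell counts without going through the very small likelihood values. This Python sketch relies on the fact that, for these two models, the fitted probabilities equal the observed proportions; the helper name `binom_loglik` is an illustrative choice.

```python
import math

def binom_loglik(cases, noncases):
    """Maximized log likelihood for one group, using its observed proportion."""
    p = cases / (cases + noncases)
    return cases * math.log(p) + noncases * math.log(1 - p)

loglik_reduced = binom_loglik(43, 57)                      # intercept only
loglik_full = binom_loglik(21, 6) + binom_loglik(22, 51)   # intercept + age

lr = -2 * loglik_reduced - (-2 * loglik_full)

# Upper-tail probability of a chi-square with 1 degree of freedom
p_value = math.erfc(math.sqrt(lr / 2))

print(round(-2 * loglik_reduced, 2), round(-2 * loglik_full, 2), round(lr, 1))
print(p_value)
```

Working on the log scale avoids the underflow risk that comes with raising probabilities to large powers.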

Page 18: Logistic Regression II

Example 2: >2 exposure levels *(dummy coding)

CHD status   White   Black   Hispanic   Other
Present         5      20       15        10
Absent         20      10       10        10

(From Hosmer and Lemeshow)

Page 19: Logistic Regression II

SAS CODE

data race;
  /* number = cell count for each chd-by-race combination */
  input chd race_2 race_3 race_4 number;
  datalines;
0 0 0 0 20
1 0 0 0 5
0 1 0 0 10
1 1 0 0 20
0 0 1 0 10
1 0 1 0 15
0 0 0 1 10
1 0 0 1 10
;
run;

proc logistic data=race descending;
  weight number;
  model chd = race_2 race_3 race_4;
run;

Note the use of “dummy variables.”

“Baseline” category is white here.

Page 20: Logistic Regression II

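What’s the likelihood here?

$$L(\boldsymbol{\beta}) = \left(\frac{e^{\beta_{white}}}{1+e^{\beta_{white}}}\right)^{5} \times \left(\frac{1}{1+e^{\beta_{white}}}\right)^{20} \times \left(\frac{e^{\beta_{white}+\beta_{black}}}{1+e^{\beta_{white}+\beta_{black}}}\right)^{20} \times \left(\frac{1}{1+e^{\beta_{white}+\beta_{black}}}\right)^{10}$$

$$\times \left(\frac{e^{\beta_{white}+\beta_{hisp}}}{1+e^{\beta_{white}+\beta_{hisp}}}\right)^{15} \times \left(\frac{1}{1+e^{\beta_{white}+\beta_{hisp}}}\right)^{10} \times \left(\frac{e^{\beta_{white}+\beta_{other}}}{1+e^{\beta_{white}+\beta_{other}}}\right)^{10} \times \left(\frac{1}{1+e^{\beta_{white}+\beta_{other}}}\right)^{10}$$

In this case there is more than one unknown beta (regression coefficient), so the symbol $\boldsymbol{\beta}$ represents a vector of beta coefficients.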

Page 21: Logistic Regression II

SAS OUTPUT – model fit

Criterion    Intercept Only    Intercept and Covariates
AIC             140.629              132.587
SC              140.709              132.905
-2 Log L        138.629              124.587

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      14.0420      3      0.0028
Score                 13.3333      3      0.0040
Wald                  11.7715      3      0.0082
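The two "-2 Log L" values and the likelihood ratio chi-square can be reproduced by hand. This Python sketch assumes that, with one dummy variable per non-baseline group, the fitted probability in each race group equals that group's observed proportion; the dictionary `counts` simply restates the Example 2 table.

```python
import math

# (CHD present, CHD absent) by group, from the Example 2 table
counts = {"white": (5, 20), "black": (20, 10), "hispanic": (15, 10), "other": (10, 10)}

def binom_loglik(cases, noncases):
    """Maximized log likelihood for one group, using its observed proportion."""
    p = cases / (cases + noncases)
    return cases * math.log(p) + noncases * math.log(1 - p)

total_cases = sum(cases for cases, _ in counts.values())
total_noncases = sum(noncases for _, noncases in counts.values())

neg2ll_null = -2 * binom_loglik(total_cases, total_noncases)             # intercept only
neg2ll_full = -2 * sum(binom_loglik(c, n) for c, n in counts.values())   # with race dummies
lr_chisq = neg2ll_null - neg2ll_full

print(round(neg2ll_null, 3), round(neg2ll_full, 3), round(lr_chisq, 3))
```

The three printed numbers correspond to the "-2 Log L" row and the Likelihood Ratio chi-square in the output above.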

Page 22: Logistic Regression II

SAS OUTPUT – regression coefficients

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1    -1.3863        0.5000             7.6871           0.0056
race_2        1     2.0794        0.6325            10.8100           0.0010
race_3        1     1.7917        0.6455             7.7048           0.0055
race_4        1     1.3863        0.6708             4.2706           0.0388

Page 23: Logistic Regression II

SAS output – OR estimates

The LOGISTIC Procedure

Odds Ratio Estimates

Effect    Point Estimate    95% Wald Confidence Limits
race_2        8.000            2.316     27.633
race_3        6.000            1.693     21.261
race_4        4.000            1.074     14.895

Interpretation:

Odds of CHD are 8 times as high for black vs. white

Odds of CHD are 6 times as high for Hispanic vs. white

Odds of CHD are 4 times as high for other vs. white
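The odds ratios and their Wald limits are the exponentiated coefficients and exponentiated Wald limits from the regression output. A Python sketch, using estimates and standard errors copied from the SAS output and assuming the usual 1.96 multiplier for a 95% interval (the function name is illustrative):

```python
import math

# (estimate, standard error) copied from the SAS regression output
coefs = {
    "race_2": (2.0794, 0.6325),
    "race_3": (1.7917, 0.6455),
    "race_4": (1.3863, 0.6708),
}

def odds_ratio_ci(estimate, se, z=1.96):
    """Point estimate and Wald confidence limits on the odds-ratio scale."""
    return (math.exp(estimate),
            math.exp(estimate - z * se),
            math.exp(estimate + z * se))

for name, (est, se) in coefs.items():
    point, lower, upper = odds_ratio_ci(est, se)
    print(f"{name}: OR={point:.2f}  95% CI=({lower:.2f}, {upper:.2f})")
```

Small differences from the SAS limits in the last decimal place are expected because the inputs here are already rounded.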

Page 24: Logistic Regression II

Example 3: Prostate Cancer Study (same data as from lab 3)

Question: Does PSA level predict tumor penetration into the prostatic capsule (yes/no)? (this is a bad outcome, meaning tumor has spread).

Is this association confounded by race?

Does race modify this association (interaction)?

Page 25: Logistic Regression II

1. What’s the relationship between PSA (continuous variable) and capsule penetration (binary)?

Page 26: Logistic Regression II

Capsule (yes/no) vs. PSA (mg/ml)

[Figure: psa vs. capsule. capsule on the y-axis (0.0 to 1.0), psa on the x-axis (0 to 140).]

Page 27: Logistic Regression II

Mean PSA per quintile vs. proportion capsule=yes

S-shaped?

[Figure: proportion with capsule=yes on the y-axis (0.18 to 0.70), PSA (mg/ml) on the x-axis (0 to 50).]

Page 28: Logistic Regression II

Logit plot of psa predicting capsule, by quintiles: linear in the logit?

Page 29: Logistic Regression II

Logit plot of psa predicting capsule, by quartile: linear in the logit?

Page 30: Logistic Regression II

Logit plot of psa predicting capsule, by decile: linear in the logit?

Page 31: Logistic Regression II

model: capsule = psa

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      49.1277      1      <.0001
Score                 41.7430      1      <.0001
Wald                  29.4230      1      <.0001

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1    -1.1137        0.1616            47.5168           <.0001
psa           1     0.0502        0.00925           29.4230           <.0001

Page 32: Logistic Regression II

Model: capsule = psa race

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1    -0.4992        0.4581             1.1878           0.2758
psa           1     0.0512        0.00949           29.0371           <.0001
race          1    -0.5788        0.4187             1.9111           0.1668

No indication of confounding by race, since the regression coefficient for psa is essentially unchanged in magnitude (0.0502 without race in the model vs. 0.0512 with race).

Page 33: Logistic Regression II

Model: capsule = psa race psa*race

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1    -1.2858        0.6247             4.2360           0.0396
psa           1     0.0608        0.0280            11.6952           0.0006
race          1     0.0954        0.5421             0.0310           0.8603
psa*race      1    -0.0349        0.0193             3.2822           0.0700

Suggestive evidence of effect modification by race (p=.07).

Page 34: Logistic Regression II

STRATIFIED BY RACE:

---------------------------- race=0 ----------------------------

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1    -1.1904        0.1793            44.0820           <.0001
psa           1     0.0608        0.0117            26.9250           <.0001

---------------------------- race=1 ----------------------------

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1    -1.0950        0.5116             4.5812           0.0323
psa           1     0.0259        0.0153             2.8570           0.0910

Page 35: Logistic Regression II

How to calculate ORs from model with interaction term

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1    -1.2858        0.6247             4.2360           0.0396
psa           1     0.0608        0.0280            11.6952           0.0006
race          1     0.0954        0.5421             0.0310           0.8603
psa*race      1    -0.0349        0.0193             3.2822           0.0700

Increased odds for every 5 mg/ml increase in PSA:

If white (race=0): $e^{(5 \times .0608)} = 1.36$

If black (race=1): $e^{(5 \times (.0608 - .0349))} = 1.14$


Page 37: Logistic Regression II

ORs for increasing psa at different levels of race.

OR for a 5 mg/ml increase in psa level among white men:

$$OR = \frac{e^{.0608(5) + .0954(0) - .0349(5 \times 0)}}{e^{.0608(0) + .0954(0) - .0349(0 \times 0)}} = e^{.0608 \times 5} = 1.36$$

OR for a 10 mg/ml increase in psa level among white men:

$$OR = \frac{e^{.0608(10) + .0954(0) - .0349(10 \times 0)}}{e^{.0608(0) + .0954(0) - .0349(0 \times 0)}} = e^{.0608 \times 10} = 1.84$$

OR for a 5 mg/ml increase in psa level among black men:

$$OR = \frac{e^{.0608(5) + .0954(1) - .0349(5 \times 1)}}{e^{.0608(0) + .0954(1) - .0349(0 \times 1)}} = e^{.0608 \times 5 - .0349 \times 5} = 1.14$$

OR for a 10 mg/ml increase in psa level among black men:

$$OR = \frac{e^{.0608(10) + .0954(1) - .0349(10 \times 1)}}{e^{.0608(0) + .0954(1) - .0349(0 \times 1)}} = e^{10(.0608 - .0349)} = 1.30$$
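The same odds ratios can be computed as a ratio of odds at two psa values. This Python sketch hard-codes the coefficients from the interaction model output; the function names and the `start` parameter are illustrative assumptions.

```python
import math

# Coefficients copied from the model capsule = psa race psa*race
INTERCEPT, B_PSA, B_RACE, B_PSA_RACE = -1.2858, 0.0608, 0.0954, -0.0349

def log_odds(psa, race):
    """Linear predictor; race is 0 (white) or 1 (black)."""
    return INTERCEPT + B_PSA * psa + B_RACE * race + B_PSA_RACE * psa * race

def or_for_psa_increase(delta, race, start=0.0):
    """OR comparing psa = start + delta with psa = start, within one race group."""
    return math.exp(log_odds(start + delta, race) - log_odds(start, race))

for race, label in ((0, "white"), (1, "black")):
    for delta in (5, 10):
        print(label, delta, round(or_for_psa_increase(delta, race), 2))
```

Because the model is linear in psa on the logit scale, the result does not depend on the starting psa value, only on the size of the increase and the race group.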


Page 39: Logistic Regression II

OR for being black (vs. white), at different levels of psa.

OR for being black (vs. white) among men with psa = 0 mg/ml:

$$OR = \frac{e^{.0608(0) + .0954(1) - .0349(1 \times 0)}}{e^{.0608(0) + .0954(0) - .0349(0 \times 0)}} = e^{.0954(1)} = 1.10$$

OR for being black (vs. white) among men with psa = 1 mg/ml:

$$OR = \frac{e^{.0608(1) + .0954(1) - .0349(1 \times 1)}}{e^{.0608(1) + .0954(0) - .0349(0 \times 1)}} = e^{.0954(1) - .0349(1)} = 1.06$$

OR for being black (vs. white) among men with psa = 50 mg/ml:

$$OR = \frac{e^{.0608(50) + .0954(1) - .0349(1 \times 50)}}{e^{.0608(50) + .0954(0) - .0349(0 \times 50)}} = e^{.0954(1) - .0349(1 \times 50)} = 0.19$$

OR for being black (vs. white) among men with psa = 100 mg/ml:

$$OR = \frac{e^{.0608(100) + .0954(1) - .0349(1 \times 100)}}{e^{.0608(100) + .0954(0) - .0349(0 \times 100)}} = e^{.0954(1) - .0349(1 \times 100)} = 0.034$$
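Because of the interaction term, the race odds ratio is a function of psa. A short Python sketch using the coefficients from the interaction model (the function name is an illustrative choice):

```python
import math

# Coefficients copied from the model capsule = psa race psa*race
B_RACE, B_PSA_RACE = 0.0954, -0.0349

def or_black_vs_white(psa):
    """Odds ratio for race=1 vs. race=0 at a fixed psa level."""
    return math.exp(B_RACE + B_PSA_RACE * psa)

for psa in (0, 1, 50, 100):
    print(psa, round(or_black_vs_white(psa), 3))

# psa level at which the odds ratio equals 1 (log OR = 0)
print(B_RACE / -B_PSA_RACE)
```

The intercept and the psa main effect cancel in the ratio, so only the race and interaction coefficients are needed.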

Page 40: Logistic Regression II

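Predictions

The model:

$$\text{logit}(\text{capsule}=1) = -1.2858 + .0608(psa) + .0954(race) - .0349(psa \times race)$$

$$\ln\left(\frac{P(\text{capsule}=1)}{1-P(\text{capsule}=1)}\right) = -1.2858 + .0608(psa) + .0954(race) - .0349(psa \times race)$$

$$\frac{P(\text{capsule}=1)}{1-P(\text{capsule}=1)} = e^{-1.2858 + .0608(psa) + .0954(race) - .0349(psa \times race)}$$

$$P(\text{capsule}=1) = \frac{e^{-1.2858 + .0608(psa) + .0954(race) - .0349(psa \times race)}}{1 + e^{-1.2858 + .0608(psa) + .0954(race) - .0349(psa \times race)}}$$

What’s the predicted probability for a white man with psa level of 10 mg/ml?

$$P(\text{capsule}=1 \mid \text{white}, psa=10) = \frac{e^{-1.2858 + .0608(10) + .0954(0) - .0349(0)}}{1 + e^{-1.2858 + .0608(10) + .0954(0) - .0349(0)}} = \frac{e^{-1.2858 + .0608(10)}}{1 + e^{-1.2858 + .0608(10)}} = \frac{.51}{1.51} = .34$$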

Page 41: Logistic Regression II

Predictions

The model:

$$\text{logit}(\text{capsule}=1) = -1.2858 + .0608(psa) + .0954(race) - .0349(psa \times race)$$

What’s the predicted probability for a black man with psa level of 10 mg/ml?

$$P(\text{capsule}=1) = \frac{e^{-1.2858 + .0608(psa) + .0954(race) - .0349(psa \times race)}}{1 + e^{-1.2858 + .0608(psa) + .0954(race) - .0349(psa \times race)}}$$

$$P(\text{capsule}=1 \mid \text{black}, psa=10) = \frac{e^{-1.2858 + .0608(10) + .0954(1) - .0349(10)}}{1 + e^{-1.2858 + .0608(10) + .0954(1) - .0349(10)}} = \frac{.39}{1.39} = .28$$

Page 42: Logistic Regression II

Predictions

The model:

$$\text{logit}(\text{capsule}=1) = -1.2858 + .0608(psa) + .0954(race) - .0349(psa \times race)$$

What’s the predicted probability for a white man with psa level of 0 mg/ml (reference group)?

$$P(\text{capsule}=1) = \frac{e^{-1.2858 + .0608(psa) + .0954(race) - .0349(psa \times race)}}{1 + e^{-1.2858 + .0608(psa) + .0954(race) - .0349(psa \times race)}}$$

$$P(\text{capsule}=1 \mid \text{white}, psa=0) = \frac{e^{-1.2858}}{1 + e^{-1.2858}} = \frac{.28}{1.28} = .22$$

Page 43: Logistic Regression II

Predictions

The model:

$$\text{logit}(\text{capsule}=1) = -1.2858 + .0608(psa) + .0954(race) - .0349(psa \times race)$$

What’s the predicted probability for a black man with psa level of 0 mg/ml?

$$P(\text{capsule}=1) = \frac{e^{-1.2858 + .0608(psa) + .0954(race) - .0349(psa \times race)}}{1 + e^{-1.2858 + .0608(psa) + .0954(race) - .0349(psa \times race)}}$$

$$P(\text{capsule}=1 \mid \text{black}, psa=0) = \frac{e^{-1.2858 + .0954(1)}}{1 + e^{-1.2858 + .0954(1)}} = \frac{.30}{1.30} = .23$$
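The four predictions worked through on the Predictions slides can be reproduced with one small function. This is a Python sketch with the interaction-model coefficients hard-coded; the function name is an illustrative choice.

```python
import math

# Coefficients copied from the model capsule = psa race psa*race
INTERCEPT, B_PSA, B_RACE, B_PSA_RACE = -1.2858, 0.0608, 0.0954, -0.0349

def predicted_probability(psa, race):
    """P(capsule=1) for a given psa level and race (0 = white, 1 = black)."""
    eta = INTERCEPT + B_PSA * psa + B_RACE * race + B_PSA_RACE * psa * race
    return math.exp(eta) / (1 + math.exp(eta))

for label, psa, race in (("white, psa=10", 10, 0), ("black, psa=10", 10, 1),
                         ("white, psa=0", 0, 0), ("black, psa=0", 0, 1)):
    print(label, round(predicted_probability(psa, race), 2))
```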

Page 44: Logistic Regression II

Diagnostics: Residuals

What’s a residual in the context of logistic regression?

Residual=observed-predicted

For logistic regression:

residual= 1 – predicted probability

OR residual = 0 – predicted probability

Page 45: Logistic Regression II

Diagnostics: Residuals

What’s the residual for a white man with psa level of 0 mg/ml who has capsule penetration?

Residual = 1 − .22 = .78

What’s the residual for a white man with psa level of 0 mg/ml who does not have capsule penetration?

Residual = 0 − .22 = −.22
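A minimal Python sketch of the observed-minus-predicted residual for the white, psa = 0 case, with the predicted probability computed from the interaction-model intercept (the names below are illustrative, not from the slides):

```python
import math

INTERCEPT = -1.2858   # interaction-model intercept: white man, psa = 0

def raw_residual(observed, predicted):
    """Observed outcome (0 or 1) minus predicted probability."""
    return observed - predicted

predicted = math.exp(INTERCEPT) / (1 + math.exp(INTERCEPT))   # about .22

print(round(raw_residual(1, predicted), 2))   # has capsule penetration
print(round(raw_residual(0, predicted), 2))   # no capsule penetration
```

For any one subject the two possible residuals differ by exactly 1, since the outcome can only be 0 or 1.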

Page 46: Logistic Regression II

In SAS…recall model with psa and gleason…

proc logistic data = hrp261.psa;
  model capsule (event="1") = psa gleason;
  /* save predicted probabilities, their confidence limits, and deviance residuals */
  output out=MyOutdata l=MyLowerCI p=Mypredicted u=MyUpperCI resdev=Myresiduals;
run;

/* replace "predictor" with the variable to plot against, e.g. psa */
proc gplot data = MyOutdata;
  plot Myresiduals*predictor;
run;
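The RESDEV= option above requests deviance residuals rather than the raw observed-minus-predicted residuals. As a hedged Python sketch of the standard definition for a binary outcome (the predicted probability of .22 below is a made-up illustration, since the psa-plus-gleason model's coefficients are not shown in these slides):

```python
import math

def deviance_residual(observed, predicted):
    """Signed square root of the subject's contribution to the deviance."""
    if observed == 1:
        return math.sqrt(-2 * math.log(predicted))
    return -math.sqrt(-2 * math.log(1 - predicted))

# Hypothetical predicted probability, for illustration only
p = 0.22
print(deviance_residual(1, p))   # large positive: event occurred despite low predicted risk
print(deviance_residual(0, p))   # small negative: no event, as the model mostly expected
```

Squaring and summing these residuals over all subjects gives the model's -2 log likelihood for ungrouped binary data.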

Page 47: Logistic Regression II

Residual*psa

[Figure: Deviance Residual on the y-axis (−3 to 3), psa on the x-axis (0 to 140).]

Page 48: Logistic Regression II

Estimated prob*gleason

[Figure: Estimated Probability on the y-axis (0.0 to 1.0), gleason on the x-axis (0 to 9).]