Logistic Regression II
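Simple 2x2 Table (courtesy Hosmer and Lemeshow)

               Exposure = 1                                                     Exposure = 0
Disease = 1    $P(D|E) = \frac{e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}}$          $P(D|{\sim}E) = \frac{e^{\alpha}}{1+e^{\alpha}}$
Disease = 0    $P({\sim}D|E) = \frac{1}{1+e^{\alpha+\beta_1}}$                     $P({\sim}D|{\sim}E) = \frac{1}{1+e^{\alpha}}$

Odds Ratio for simple 2x2 Table (courtesy Hosmer and Lemeshow)

$$OR = \frac{\dfrac{e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}} \Big/ \dfrac{1}{1+e^{\alpha+\beta_1}}}{\dfrac{e^{\alpha}}{1+e^{\alpha}} \Big/ \dfrac{1}{1+e^{\alpha}}} = \frac{e^{\alpha+\beta_1}}{e^{\alpha}} = e^{\beta_1}$$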
Example 1: CHD and Age (2x2)
(from Hosmer and Lemeshow)

               >=55 yrs   <55 yrs
CHD Present       21         22
CHD Absent         6         51
The Logit Model

$$\log\left(\frac{P(D)}{1-P(D)}\right) = \alpha + \beta_1 X_1$$

$X_1$ = 1 if exposed (older)
$X_1$ = 0 if unexposed (younger)
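The Likelihood

$$L(\alpha,\beta_1) = \left(\frac{e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}}\right)^{21} \times \left(\frac{1}{1+e^{\alpha+\beta_1}}\right)^{6} \times \left(\frac{e^{\alpha}}{1+e^{\alpha}}\right)^{22} \times \left(\frac{1}{1+e^{\alpha}}\right)^{51}$$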
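The Log Likelihood

$$\log L(\alpha,\beta_1) = 21(\alpha+\beta_1) - 21\log(1+e^{\alpha+\beta_1}) + 0 - 6\log(1+e^{\alpha+\beta_1}) + 22\alpha - 22\log(1+e^{\alpha}) + 0 - 51\log(1+e^{\alpha})$$

recall:

$$\log e^{\alpha+\beta_1} = \log(e^{\alpha}e^{\beta_1}) = \log e^{\alpha} + \log e^{\beta_1} = \alpha + \beta_1$$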
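Derivative(s) of the log likelihood

$$\frac{d[\log L(\alpha,\beta_1)]}{d\beta_1} = 21 - \frac{21e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}} - \frac{6e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}}$$

$$\frac{d[\log L(\alpha,\beta_1)]}{d\alpha} = 22 - \frac{22e^{\alpha}}{1+e^{\alpha}} - \frac{51e^{\alpha}}{1+e^{\alpha}}$$

Note: the full derivative with respect to $\alpha$ also contains the exposed-group terms $21 - \frac{27e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}}$. These are exactly the $\beta_1$ derivative, which equals zero at the maximum, so only the unexposed-group terms are shown.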
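Maximize $\alpha$

$$22 - \frac{22e^{\alpha}}{1+e^{\alpha}} - \frac{51e^{\alpha}}{1+e^{\alpha}} = 0$$

$$22 = \frac{73e^{\alpha}}{1+e^{\alpha}}$$

$$22(1+e^{\alpha}) = 73e^{\alpha}$$

$$22 = 51e^{\alpha}$$

$$e^{\alpha} = \frac{22}{51}$$

= Odds of disease in the unexposed (<55)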
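Maximize $\beta_1$

$$21 - \frac{27e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}} = 0$$

$$21(1+e^{\alpha+\beta_1}) = 27e^{\alpha+\beta_1}$$

$$21 = 6e^{\alpha+\beta_1}$$

$$e^{\alpha+\beta_1} = \frac{21}{6}$$

$$e^{\beta_1} = \frac{21/6}{e^{\alpha}} = \frac{21/6}{22/51} = \frac{21 \times 51}{6 \times 22} = OR$$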
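The two maximization steps above can be checked numerically. The following is a minimal Python sketch, not part of the original slides: it plugs the closed-form estimates back into the derivatives of the log likelihood and confirms that both are zero. The variable and function names are illustrative assumptions.

```python
import math

# Cell counts from the CHD-by-age table (exposed = >=55 yrs, unexposed = <55 yrs).
cases_exposed, controls_exposed = 21, 6
cases_unexposed, controls_unexposed = 22, 51

# Closed-form maximum likelihood estimates derived above.
alpha_hat = math.log(cases_unexposed / controls_unexposed)          # log odds in the unexposed
beta1_hat = math.log(cases_exposed / controls_exposed) - alpha_hat  # log odds ratio
odds_ratio = math.exp(beta1_hat)

def score(alpha, beta1):
    """Partial derivatives of the log likelihood with respect to (alpha, beta1)."""
    p_exposed = 1 / (1 + math.exp(-(alpha + beta1)))
    p_unexposed = 1 / (1 + math.exp(-alpha))
    d_beta1 = cases_exposed - (cases_exposed + controls_exposed) * p_exposed
    d_alpha = d_beta1 + cases_unexposed - (cases_unexposed + controls_unexposed) * p_unexposed
    return d_alpha, d_beta1

print("alpha_hat =", round(alpha_hat, 4))
print("beta1_hat =", round(beta1_hat, 4))
print("OR =", round(odds_ratio, 2))
print("score at the MLE =", score(alpha_hat, beta1_hat))
```

Both derivatives come out at zero (up to floating-point error), and the odds ratio matches the cross-product ratio of the table.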
Hypothesis Testing $H_0: \beta = 0$

1. The Wald test:

$$Z = \frac{\hat{\beta} - 0}{\text{asymptotic standard error}(\hat{\beta})}$$

Null value of beta is 0 (no association)

2. The Likelihood Ratio test:

$$-2\ln\frac{L(\text{reduced})}{L(\text{full})} = -2\ln L(\text{reduced}) - [-2\ln L(\text{full})] \sim \chi^2_p$$

Reduced = reduced model with k parameters; Full = full model with k+p parameters
Hypothesis Testing $H_0: \beta = 0$

1. What is the Wald Test here?

$$Z = \frac{\ln\left(\frac{21 \times 51}{6 \times 22}\right)}{\sqrt{\frac{1}{21} + \frac{1}{6} + \frac{1}{22} + \frac{1}{51}}} = 3.96$$

2. What is the Likelihood Ratio test here?
– Full model = includes age variable
– Reduced model = includes only intercept

Maximum likelihood for reduced model ought to be $(.43)^{43} \times (.57)^{57}$
(43 cases/57 controls)…does MLE yield this?…
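The Reduced Model

$$\log\left(\frac{P(D)}{1-P(D)}\right) = \alpha$$

$$L(\alpha) = \left(\frac{e^{\alpha}}{1+e^{\alpha}}\right)^{43} \times \left(\frac{1}{1+e^{\alpha}}\right)^{57}$$

$$\log L(\alpha) = 43\alpha - 43\log(1+e^{\alpha}) - 57\log(1+e^{\alpha})$$

$$\frac{d\log L(\alpha)}{d\alpha} = 43 - \frac{100e^{\alpha}}{1+e^{\alpha}} = 0$$

$$43 + 43e^{\alpha} = 100e^{\alpha}$$

$$e^{\alpha} = \frac{43}{57} = .75$$

$$\alpha = \ln(.75) \approx -.28$$

$e^{\alpha}$ = marginal odds of CHD!

Likelihood value for reduced model

$$L(\alpha = -.28) = \left(\frac{.75}{1.75}\right)^{43} \times \left(\frac{1}{1.75}\right)^{57} = (.43)^{43} \times (.57)^{57} = 2.1 \times 10^{-30}$$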
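Likelihood value of full model

$$L(\alpha,\beta_1) = \left(\frac{\frac{21}{6}}{1+\frac{21}{6}}\right)^{21} \times \left(\frac{1}{1+\frac{21}{6}}\right)^{6} \times \left(\frac{\frac{22}{51}}{1+\frac{22}{51}}\right)^{22} \times \left(\frac{1}{1+\frac{22}{51}}\right)^{51}$$

$$= \left(\frac{3.5}{4.5}\right)^{21} \times \left(\frac{1}{4.5}\right)^{6} \times \left(\frac{.43}{1.43}\right)^{22} \times \left(\frac{1}{1.43}\right)^{51} = 2.43 \times 10^{-26}$$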
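Finally the LR…

$$-2\ln\frac{L(\text{reduced})}{L(\text{full})} = -2\ln(2.1 \times 10^{-30}) - [-2\ln(2.43 \times 10^{-26})] = 136.7 - 117.96 = 18.7$$

Compare the likelihood ratio $\chi^2_1 = 18.7$ with the squared Wald statistic, $(3.96)^2$.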
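The Wald and likelihood ratio statistics for Example 1 can be reproduced directly from the four cell counts. This is a minimal Python sketch under the assumption that each fitted probability equals the observed proportion in its group (which is what the closed-form MLEs give); the helper name `binom_loglik` is illustrative and not from the slides.

```python
import math

# Cell counts: exposed cases, unexposed cases, exposed controls, unexposed controls.
a, b, c, d = 21, 22, 6, 51

def binom_loglik(cases, controls):
    """Log likelihood of one group, evaluated at its observed proportion."""
    n = cases + controls
    return cases * math.log(cases / n) + controls * math.log(controls / n)

# Wald test: log odds ratio divided by its asymptotic standard error.
beta1_hat = math.log((a * d) / (b * c))
se_beta1 = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z_wald = beta1_hat / se_beta1

# Likelihood ratio test: reduced model (intercept only) vs. full model (intercept + age).
loglik_reduced = binom_loglik(a + b, c + d)
loglik_full = binom_loglik(a, c) + binom_loglik(b, d)
lr_stat = -2 * (loglik_reduced - loglik_full)

# For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
p_lr = math.erfc(math.sqrt(lr_stat / 2))

print("Wald Z =", round(z_wald, 2))
print("-2 log L (reduced) =", round(-2 * loglik_reduced, 1))
print("-2 log L (full) =", round(-2 * loglik_full, 1))
print("LR chi-square =", round(lr_stat, 1))
print("LR p-value =", p_lr)
```

Working on the log scale avoids computing the very small likelihood values (on the order of 10^-30) directly.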
Example 2: >2 exposure levels* (*dummy coding)

CHD status   White   Black   Hispanic   Other
Present         5      20       15        10
Absent         20      10       10        10

(From Hosmer and Lemeshow)

SAS CODE

data race;
  input chd race_2 race_3 race_4 number;
  datalines;
0 0 0 0 20
1 0 0 0 5
0 1 0 0 10
1 1 0 0 20
0 0 1 0 10
1 0 1 0 15
0 0 0 1 10
1 0 0 1 10
end;
run;

proc logistic data=race descending;
  weight number;
  model chd = race_2 race_3 race_4;
run;
Note the use of “dummy variables.”
“Baseline” category is white here.
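What’s the likelihood here?

$$L(\boldsymbol{\beta}) = \left(\frac{e^{\beta_{white}}}{1+e^{\beta_{white}}}\right)^{5} \times \left(\frac{1}{1+e^{\beta_{white}}}\right)^{20} \times \left(\frac{e^{\beta_{white}+\beta_{black}}}{1+e^{\beta_{white}+\beta_{black}}}\right)^{20} \times \left(\frac{1}{1+e^{\beta_{white}+\beta_{black}}}\right)^{10}$$

$$\times \left(\frac{e^{\beta_{white}+\beta_{hisp}}}{1+e^{\beta_{white}+\beta_{hisp}}}\right)^{15} \times \left(\frac{1}{1+e^{\beta_{white}+\beta_{hisp}}}\right)^{10} \times \left(\frac{e^{\beta_{white}+\beta_{other}}}{1+e^{\beta_{white}+\beta_{other}}}\right)^{10} \times \left(\frac{1}{1+e^{\beta_{white}+\beta_{other}}}\right)^{10}$$

In this case there is more than one unknown beta (regression coefficient), so the symbol $\boldsymbol{\beta}$ represents a vector of beta coefficients.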
SAS OUTPUT – model fit

               Intercept    Intercept and
Criterion      Only         Covariates
AIC            140.629      132.587
SC             140.709      132.905
-2 Log L       138.629      124.587

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       14.0420     3        0.0028
Score                  13.3333     3        0.0040
Wald                   11.7715     3        0.0082

SAS OUTPUT – regression coefficients

Analysis of Maximum Likelihood Estimates

                               Standard    Wald
Parameter    DF    Estimate    Error       Chi-Square    Pr > ChiSq
Intercept     1     -1.3863    0.5000         7.6871        0.0056
race_2        1      2.0794    0.6325        10.8100        0.0010
race_3        1      1.7917    0.6455         7.7048        0.0055
race_4        1      1.3863    0.6708         4.2706        0.0388

SAS output – OR estimates

The LOGISTIC Procedure
Odds Ratio Estimates

            Point       95% Wald
Effect      Estimate    Confidence Limits
race_2      8.000       2.316    27.633
race_3      6.000       1.693    21.261
race_4      4.000       1.074    14.895
Interpretation:
8x increase in odds of CHD for black vs. white
6x increase in odds of CHD for hispanic vs. white
4x increase in odds of CHD for other vs. white
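With dummy coding and white as the baseline, each coefficient is the log of the ratio between a group's odds and the baseline odds, so the estimates can be recovered from the table by hand. The following Python sketch is an illustration rather than a replacement for `proc logistic`; it assumes the usual large-sample standard error for a log odds ratio and uses 1.96 for the 95% limits.

```python
import math

# (CHD present, CHD absent) counts from the Example 2 table.
counts = {
    "white": (5, 20),
    "black": (20, 10),
    "hispanic": (15, 10),
    "other": (10, 10),
}

ref_cases, ref_controls = counts["white"]
intercept = math.log(ref_cases / ref_controls)  # log odds of CHD in the baseline group

results = {}
for group, (cases, controls) in counts.items():
    if group == "white":
        continue
    beta = math.log(cases / controls) - intercept  # log odds ratio vs. white
    se = math.sqrt(1 / cases + 1 / controls + 1 / ref_cases + 1 / ref_controls)
    results[group] = {
        "beta": beta,
        "se": se,
        "or": math.exp(beta),
        "ci": (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)),
    }

print("intercept =", round(intercept, 4))
for group, r in results.items():
    print(group, round(r["beta"], 4), round(r["se"], 4), round(r["or"], 3),
          tuple(round(x, 3) for x in r["ci"]))
```

The odds ratios of 8, 6, and 4 follow directly from the group odds (2, 1.5, and 1) divided by the baseline odds of 0.25.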
Example 3: Prostate Cancer Study (same data as from lab 3)
Question: Does PSA level predict tumor penetration into the prostatic capsule (yes/no)? (this is a bad outcome, meaning tumor has spread).
Is this association confounded by race?
Does race modify this association (interaction)?
1. What’s the relationship between PSA (continuous variable) and capsule penetration (binary)?

Capsule (yes/no) vs. PSA (mg/ml)
[Figure: psa vs. capsule; y-axis: capsule (0.0 to 1.0), x-axis: psa (0 to 140)]

Mean PSA per quintile vs. proportion capsule=yes: S-shaped?
[Figure: y-axis: proportion with capsule=yes (0.18 to 0.70), x-axis: PSA (mg/ml) (0 to 50)]

[Figure: logit plot of psa predicting capsule, by quintiles: linear in the logit?]
[Figure: logit plot of psa predicting capsule, by quartile: linear in the logit?]
[Figure: logit plot of psa predicting capsule, by decile: linear in the logit?]
Model: capsule = psa

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       49.1277     1        <.0001
Score                  41.7430     1        <.0001
Wald                   29.4230     1        <.0001

Analysis of Maximum Likelihood Estimates

                               Standard    Wald
Parameter    DF    Estimate    Error       Chi-Square    Pr > ChiSq
Intercept     1     -1.1137    0.1616        47.5168        <.0001
psa           1      0.0502    0.00925       29.4230        <.0001

Model: capsule = psa race

Analysis of Maximum Likelihood Estimates

                               Standard    Wald
Parameter    DF    Estimate    Error       Chi-Square    Pr > ChiSq
Intercept     1     -0.4992    0.4581         1.1878        0.2758
psa           1      0.0512    0.00949       29.0371        <.0001
race          1     -0.5788    0.4187         1.9111        0.1668
No indication of confounding by race, since the psa regression coefficient barely changes in magnitude when race is added to the model (0.0502 vs. 0.0512).
Model: capsule = psa race psa*race

                               Standard    Wald
Parameter    DF    Estimate    Error       Chi-Square    Pr > ChiSq
Intercept     1     -1.2858    0.6247         4.2360        0.0396
psa           1      0.0608    0.0280        11.6952        0.0006
race          1      0.0954    0.5421         0.0310        0.8603
psa*race      1     -0.0349    0.0193         3.2822        0.0700
Evidence of effect modification by race (p=.07).
STRATIFIED BY RACE:

---------------------------- race=0 ----------------------------

                               Standard    Wald
Parameter    DF    Estimate    Error       Chi-Square    Pr > ChiSq
Intercept     1     -1.1904    0.1793        44.0820        <.0001
psa           1      0.0608    0.0117        26.9250        <.0001

---------------------------- race=1 ----------------------------

Analysis of Maximum Likelihood Estimates

                               Standard    Wald
Parameter    DF    Estimate    Error       Chi-Square    Pr > ChiSq
Intercept     1     -1.0950    0.5116         4.5812        0.0323
psa           1      0.0259    0.0153         2.8570        0.0910
How to calculate ORs from model with interaction term

                               Standard    Wald
Parameter    DF    Estimate    Error       Chi-Square    Pr > ChiSq
Intercept     1     -1.2858    0.6247         4.2360        0.0396
psa           1      0.0608    0.0280        11.6952        0.0006
race          1      0.0954    0.5421         0.0310        0.8603
psa*race      1     -0.0349    0.0193         3.2822        0.0700

Increased odds for every 5 mg/ml increase in PSA:

If white (race=0): $e^{(5 \times .0608)} = 1.36$

If black (race=1): $e^{(5 \times (.0608 - .0349))} = 1.14$
ORs for increasing psa at different levels of race.

(The intercept appears in both numerator and denominator and cancels, so it is omitted.)

OR for a 5 mg/ml increase in psa level among white men:

$$OR = \frac{e^{.0608(5) + .0954(0) - .0349(5 \times 0)}}{e^{.0608(0) + .0954(0) - .0349(0 \times 0)}} = e^{.0608 \times 5} = 1.36$$

OR for a 10 mg/ml increase in psa level among white men:

$$OR = \frac{e^{.0608(10) + .0954(0) - .0349(10 \times 0)}}{e^{.0608(0) + .0954(0) - .0349(0 \times 0)}} = e^{.0608 \times 10} = 1.84$$

OR for a 5 mg/ml increase in psa level among black men:

$$OR = \frac{e^{.0608(5) + .0954(1) - .0349(5 \times 1)}}{e^{.0608(0) + .0954(1) - .0349(0 \times 1)}} = e^{.0608 \times 5 - .0349 \times 5} = 1.14$$

OR for a 10 mg/ml increase in psa level among black men:

$$OR = \frac{e^{.0608(10) + .0954(1) - .0349(10 \times 1)}}{e^{.0608(0) + .0954(1) - .0349(0 \times 1)}} = e^{10(.0608 - .0349)} = 1.30$$
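The odds ratios for an increase in psa at each level of race reduce to one expression: exponentiate the increase times (psa coefficient + interaction coefficient × race). A minimal Python sketch, using the coefficient estimates from the interaction model; the function name is an illustrative assumption.

```python
import math

# Estimates from the model capsule = psa race psa*race.
B_PSA = 0.0608
B_INTERACTION = -0.0349

def or_for_psa_increase(delta_psa, race):
    """Odds ratio for a delta_psa increase in psa, within one level of race (0 or 1)."""
    return math.exp(delta_psa * (B_PSA + B_INTERACTION * race))

for race in (0, 1):
    for delta_psa in (5, 10):
        print("race =", race, "increase =", delta_psa,
              "OR =", round(or_for_psa_increase(delta_psa, race), 2))
```

The race main effect does not appear because race is held fixed within each comparison.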
OR for being black (vs. white), at different levels of psa.

OR for being black (vs. white) among men with psa = 0 mg/ml:

$$OR = \frac{e^{.0608(0) + .0954(1) - .0349(1 \times 0)}}{e^{.0608(0) + .0954(0) - .0349(0 \times 0)}} = e^{.0954(1)} = 1.10$$

OR for being black (vs. white) among men with psa = 1 mg/ml:

$$OR = \frac{e^{.0608(1) + .0954(1) - .0349(1 \times 1)}}{e^{.0608(1) + .0954(0) - .0349(0 \times 1)}} = e^{.0954(1) - .0349(1)} = 1.06$$

OR for being black (vs. white) among men with psa = 50 mg/ml:

$$OR = \frac{e^{.0608(50) + .0954(1) - .0349(1 \times 50)}}{e^{.0608(50) + .0954(0) - .0349(0 \times 50)}} = e^{.0954(1) - .0349(1 \times 50)} = 0.19$$

OR for being black (vs. white) among men with psa = 100 mg/ml:

$$OR = \frac{e^{.0608(100) + .0954(1) - .0349(1 \times 100)}}{e^{.0608(100) + .0954(0) - .0349(0 \times 100)}} = e^{.0954(1) - .0349(1 \times 100)} = 0.034$$
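Because of the interaction term, the odds ratio for race is a function of psa: the psa main effect cancels and only the race and interaction coefficients remain. A short Python sketch of that calculation, using the estimates from the interaction model; the function name is an illustrative assumption.

```python
import math

# Estimates from the model capsule = psa race psa*race.
B_RACE = 0.0954
B_INTERACTION = -0.0349

def or_black_vs_white(psa):
    """Odds ratio for race=1 vs. race=0 at a fixed psa level."""
    return math.exp(B_RACE + B_INTERACTION * psa)

for psa in (0, 1, 50, 100):
    print("psa =", psa, "OR =", round(or_black_vs_white(psa), 3))

# psa level at which the two groups have equal odds (OR = 1).
psa_equal_odds = -B_RACE / B_INTERACTION
print("OR = 1 at psa =", round(psa_equal_odds, 2))
```

The odds ratio is above 1 only at very low psa values and falls steadily as psa rises.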
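Predictions

The model:

$$\text{logit}(\text{capsule}=1) = -1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})$$

$$\ln\left(\frac{P(\text{capsule}=1)}{1-P(\text{capsule}=1)}\right) = -1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})$$

$$\frac{P(\text{capsule}=1)}{1-P(\text{capsule}=1)} = e^{-1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})}$$

$$P(\text{capsule}=1) = \frac{e^{-1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})}}{1 + e^{-1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})}}$$

What’s the predicted probability for a white man with psa level of 10 mg/ml?

$$P(\text{capsule}=1 \mid \text{white}, \text{psa}=10) = \frac{e^{-1.2858 + .0608(10) + .0954(0) - .0349(0)}}{1 + e^{-1.2858 + .0608(10) + .0954(0) - .0349(0)}} = \frac{e^{-1.2858 + .0608(10)}}{1 + e^{-1.2858 + .0608(10)}} = \frac{.51}{1.51} = .34$$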
Predictions

The model:

$$\text{logit}(\text{capsule}=1) = -1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})$$

What’s the predicted probability for a black man with psa level of 10 mg/ml?

$$P(\text{capsule}=1 \mid \text{black}, \text{psa}=10) = \frac{e^{-1.2858 + .0608(10) + .0954(1) - .0349(10)}}{1 + e^{-1.2858 + .0608(10) + .0954(1) - .0349(10)}} = \frac{.39}{1.39} = .28$$
Predictions

The model:

$$\text{logit}(\text{capsule}=1) = -1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})$$

What’s the predicted probability for a white man with psa level of 0 mg/ml (reference group)?

$$P(\text{capsule}=1 \mid \text{white}, \text{psa}=0) = \frac{e^{-1.2858}}{1 + e^{-1.2858}} = \frac{.28}{1.28} = .22$$
Predictions

The model:

$$\text{logit}(\text{capsule}=1) = -1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})$$

What’s the predicted probability for a black man with psa level of 0 mg/ml?

$$P(\text{capsule}=1 \mid \text{black}, \text{psa}=0) = \frac{e^{-1.2858 + .0954(1)}}{1 + e^{-1.2858 + .0954(1)}} = \frac{.30}{1.30} = .23$$
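The four predictions above all apply the same inverse-logit transformation to the fitted linear predictor. A minimal Python sketch, using the coefficient estimates from the interaction model and assuming race is coded 0 = white, 1 = black as in the slides; the function name is illustrative.

```python
import math

# Estimates from the model capsule = psa race psa*race.
INTERCEPT = -1.2858
B_PSA = 0.0608
B_RACE = 0.0954
B_INTERACTION = -0.0349

def predicted_probability(psa, race):
    """Predicted P(capsule = 1) for a given psa level and race (0 = white, 1 = black)."""
    logit = INTERCEPT + B_PSA * psa + B_RACE * race + B_INTERACTION * psa * race
    return 1 / (1 + math.exp(-logit))  # equivalent to e^logit / (1 + e^logit)

for label, psa, race in [("white, psa=10", 10, 0), ("black, psa=10", 10, 1),
                         ("white, psa=0", 0, 0), ("black, psa=0", 0, 1)]:
    print(label, round(predicted_probability(psa, race), 2))
```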
Diagnostics: Residuals

What’s a residual in the context of logistic regression?
Residual = observed – predicted
For logistic regression:
residual = 1 – predicted probability
OR residual = 0 – predicted probability

What’s the residual for a white man with psa level of 0 mg/ml who has capsule penetration?

Residual = 1 – .22 = .78

What’s the residual for a white man with psa level of 0 mg/ml who does not have capsule penetration?

Residual = 0 – .22 = –.22
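The raw residuals above can be computed in one line. The Python sketch below also computes the deviance residual, a rescaled residual that software commonly reports for logistic models; the formula used is the standard one for a binary outcome, and the function names are illustrative assumptions rather than anything from the slides.

```python
import math

def raw_residual(observed, predicted):
    """Observed outcome (0 or 1) minus predicted probability."""
    return observed - predicted

def deviance_residual(observed, predicted):
    """Signed square root of this observation's contribution to -2 log L."""
    loglik = observed * math.log(predicted) + (1 - observed) * math.log(1 - predicted)
    sign = 1 if observed > predicted else -1
    return sign * math.sqrt(-2 * loglik)

p = 0.22  # predicted probability for a white man with psa = 0
for y in (1, 0):
    print("observed =", y,
          "raw =", round(raw_residual(y, p), 2),
          "deviance =", round(deviance_residual(y, p), 3))
```

Observations with the outcome are always above zero and those without are always below, which is why residual plots for binary data show two separate bands.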
In SAS…recall model with psa and gleason…

proc logistic data = hrp261.psa;
  model capsule (event="1") = psa gleason;
  /* save predicted probabilities, their confidence limits, and deviance residuals */
  output out=MyOutdata l=MyLowerCI
    p=Mypredicted u=MyUpperCI resdev=Myresiduals;
run;

/* substitute a model variable (e.g. psa or gleason) for "predictor" */
proc gplot data = MyOutdata;
  plot Myresiduals*predictor;
run;
Residual*psa
[Figure: y-axis: Deviance Residual (-3 to 3), x-axis: psa (0 to 140)]
Estimated prob*gleason
[Figure: y-axis: Estimated Probability (0.0 to 1.0), x-axis: gleason (0 to 9)]