Logistic Regression. Aims When and Why do we Use Logistic Regression? – Binary – Multinomial...
Uploaded by betty-lynch
Logistic Regression
Aims
• When and Why do we Use Logistic Regression?
  – Binary
  – Multinomial
• Theory Behind Logistic Regression
  – Assessing the Model
  – Assessing predictors
  – Things that can go Wrong
• Interpreting Logistic Regression
When And Why
To predict a dichotomous (binary) outcome variable from one or more categorical or continuous predictor variables.
Model
One predictor:

$$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_{1i})}}$$

Several predictors:

$$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_n X_{ni})}}$$
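A minimal sketch of the model equation above for a single case, assuming the coefficients b0…bn are already known; the function name and the coefficient values in the usage line are hypothetical. The two branches are the same formula rearranged so that `math.exp` is never called on a large positive number.

```python
import math

def logistic_probability(b, x):
    """P(Y) = 1 / (1 + e^-(b0 + b1*X1 + ... + bn*Xn)) for one case.

    b: coefficients [b0, b1, ..., bn]; x: that case's predictor values [X1, ..., Xn].
    """
    if len(b) != len(x) + 1:
        raise ValueError("need an intercept plus one coefficient per predictor")
    linear = b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))
    # Same formula in two algebraically equal forms, chosen to avoid overflow.
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)

# Hypothetical coefficients: b0 = -1.0, b1 = 0.5 and X1 = 2.0 give b0 + b1*X1 = 0.
print(logistic_probability([-1.0, 0.5], [2.0]))  # 0.5
```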
Assessing the Model
• The log-likelihood statistic
  – Analogous to the residual sum of squares in multiple regression.
  – It is an indicator of how much unexplained information there is after the model has been fitted.
  – Large absolute values indicate poorly fitting statistical models.
$$\text{log-likelihood} = \sum_{i=1}^{N} \Big[ Y_i \ln\big(P(Y_i)\big) + (1 - Y_i) \ln\big(1 - P(Y_i)\big) \Big]$$
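The sum above can be computed directly from observed 0/1 outcomes and the model's predicted probabilities. This is a sketch with hypothetical data; for a 0/1 outcome only one of the two terms is non-zero, so the code adds just that term (which also avoids evaluating ln(0) when a prediction is exactly 0 or 1 and matches the outcome).

```python
import math

def log_likelihood(y, p):
    """Sum of Y_i ln P(Y_i) + (1 - Y_i) ln(1 - P(Y_i)) over all cases.

    y: observed outcomes coded 0/1; p: predicted probabilities P(Y_i = 1).
    """
    if len(y) != len(p):
        raise ValueError("y and p must have the same length")
    total = 0.0
    for yi, pi in zip(y, p):
        # Only one of the two terms is non-zero for a 0/1 outcome.
        total += math.log(pi) if yi == 1 else math.log(1.0 - pi)
    return total

# Hypothetical outcomes and predicted probabilities.
y = [1, 0, 1, 1]
p = [0.8, 0.3, 0.6, 0.9]
ll = log_likelihood(y, p)
print(ll, -2 * ll)  # the second value is the -2LL figure reported in the output
```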
Assessing Changes in Models
It’s possible to calculate a log-likelihood for different models and to compare these models by looking at the difference between their log-likelihoods.
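$$\chi^2 = 2\big[LL(\text{New}) - LL(\text{Baseline})\big]$$

$$df = k_{\text{new}} - k_{\text{baseline}}$$

where k is the number of parameters in each model.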
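A sketch of the model comparison above in pure Python. The log-likelihood values are hypothetical, and the chi-square upper-tail probability uses the standard closed forms for integer degrees of freedom (a finite sum for even df, an erfc term plus a finite sum for odd df) so that no statistics library is assumed.

```python
import math

def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with a positive integer number of df."""
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{j=0}^{df/2 - 1} y**j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{j=1}^{(df-1)/2} y**(j - 1/2) / Gamma(j + 1/2)
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y + (j - 0.5) * math.log(y) - math.lgamma(j + 0.5))
    return total

def likelihood_ratio_test(ll_new, ll_baseline, k_new, k_baseline):
    """chi2 = 2[LL(new) - LL(baseline)] on k_new - k_baseline df."""
    chi2 = 2.0 * (ll_new - ll_baseline)
    df = k_new - k_baseline
    return chi2, df, chi2_sf(chi2, df)

# Hypothetical log-likelihoods: adding one predictor raises LL from -20 to -15.
chi2, df, p = likelihood_ratio_test(-15.0, -20.0, k_new=2, k_baseline=1)
print(chi2, df, p)
```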
Assessing Predictors: The Wald Statistic
Similar to the t-statistic in regression, it tests the null hypothesis that b = 0. It is biased when b is large, so it is better to look at likelihood-ratio statistics.
$$\text{Wald} = \frac{b}{SE_b}$$
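A minimal sketch of the Wald test for one coefficient with hypothetical values. SPSS reports the square of b/SE_b in its Wald column and compares it with a chi-square distribution on 1 df; that is equivalent to the two-sided normal test on b/SE_b, which is why the same p-value comes out either way.

```python
import math

def wald(b, se):
    """Return (z, z**2, two-sided p) for H0: b = 0."""
    z = b / se
    w = z * z                                   # chi-square(1) form of the statistic
    p = math.erfc(abs(z) / math.sqrt(2.0))      # two-sided normal p-value
    return z, w, p

# Hypothetical coefficient and standard error.
z, w, p = wald(0.5, 0.25)
print(z, w, round(p, 4))
```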
Assessing Predictors: The Odds Ratio or Exp(b)
Indicates the change in odds resulting from a unit change in the predictor.
OR > 1: as the predictor increases, the probability of the outcome occurring increases.
OR < 1: as the predictor increases, the probability of the outcome occurring decreases.
$$\text{Exp}(b) = \frac{\text{Odds after a unit change in the predictor}}{\text{Odds before a unit change in the predictor}}$$
Assessing the model
To calculate the change in odds that results from a unit change in the predictor, we must first calculate the odds of becoming pregnant given that a condom wasn’t used using these equations. We then calculate the odds of becoming pregnant given that a condom was used. Finally, we calculate the proportionate change in these two odds.
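The three steps just described can be sketched as follows. The coefficients are hypothetical (they are not the values from the pregnancy example), and condom use is assumed to be coded 0 = not used, 1 = used. The ratio of the two odds comes out equal to Exp(b1), which is why SPSS can report the odds ratio directly as Exp(B).

```python
import math

def probability(b0, b1, x):
    """P(Y) for a single predictor x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p):
    """Odds = P(event) / P(no event)."""
    return p / (1.0 - p)

# Hypothetical coefficients; condom coded 0 = not used, 1 = used.
b0, b1 = -1.0, -1.5

odds_not_used = odds(probability(b0, b1, 0))  # step 1: odds when no condom was used
odds_used = odds(probability(b0, b1, 1))      # step 2: odds when a condom was used
odds_ratio = odds_used / odds_not_used        # step 3: proportionate change in odds

print(odds_ratio, math.exp(b1))  # the two values agree up to floating-point rounding
```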
Model Assessment
Odds Ratio
Methods of Regression
• Forced Entry: all variables are entered simultaneously.
• Hierarchical: variables are entered in blocks.
  – Blocks should be based on past research or on the theory being tested. Good method.
• Stepwise: variables are entered on the basis of statistical criteria (i.e., their relative contribution to predicting the outcome).
  – Should be used only for exploratory analysis.
Things That Can go Wrong
• Linearity
• Independence of errors
• Multicollinearity
• Overdispersion
Output: Initial Model
Dependent Variable Encoding

| Original Value | Internal Value |
|---|---|
| recalled | 0 |
| not recalled | 1 |

Categorical Variables Codings

| Variable | Category | Frequency | Parameter coding (1) |
|---|---|---|---|
| gender | female | 9 | 1.000 |
| | male | 14 | .000 |
Output: Step 1
Classification Plot
Summary
The overall fit of the final model is shown by the −2 log-likelihood statistic.
If the significance of the chi-square statistic is less than .05, then the model is a significant fit of the data.
Check the table labelled Variables in the equation to see which variables significantly predict the outcome.
Use the odds ratio, Exp(B), for interpretation.
– If OR > 1, then as the predictor increases, the odds of the outcome occurring increase.
– If OR < 1, then as the predictor increases, the odds of the outcome occurring decrease.
– The confidence interval of the OR should not cross 1!
Check the table labelled Variables not in the equation to see which variables did not significantly predict the outcome.
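The "should not cross 1" check can be sketched with the usual large-sample (Wald) interval for the odds ratio, exp(b ± 1.96 × SE). The coefficients and standard errors below are hypothetical, and the function names are illustrative only.

```python
import math

def odds_ratio_ci(b, se, z=1.96):
    """Exp(B) with an approximate 95% Wald interval exp(b +/- z * SE)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

def ci_excludes_one(lower, upper):
    """True when the whole interval lies on one side of OR = 1."""
    return upper < 1.0 or lower > 1.0

# Hypothetical coefficients and standard errors, for illustration only.
for b, se in [(0.8, 0.3), (0.2, 0.3)]:
    or_, lo, hi = odds_ratio_ci(b, se)
    print(f"b={b}: OR={or_:.3f}, 95% CI [{lo:.3f}, {hi:.3f}], excludes 1: {ci_excludes_one(lo, hi)}")
```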
Reporting the Analysis
Multinomial logistic regression
• Logistic regression to predict membership of more than two categories.
• It (basically) works in the same way as binary logistic regression.
• The analysis breaks the outcome variable down into a series of comparisons between two categories.
– E.g., if you have three outcome categories (A, B and C), then the analysis will consist of two comparisons that you choose:
• Compare everything against your first category (e.g. A vs. B and A vs. C),
• Or your last category (e.g. A vs. C and B vs. C),
• Or a custom category (e.g. B vs. A and B vs. C).
• The important parts of the analysis and output are much the same as we have just seen for binary logistic regression.
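One way to see how the separate two-category comparisons fit together: each non-baseline category k gets its own linear predictor (its log-odds relative to the baseline), and the probabilities of all categories follow from those. This is a sketch of the standard baseline-category formulation with hypothetical linear-predictor values and A as the comparison category; choosing the last category instead just means passing C as the baseline.

```python
import math

def baseline_category_probs(etas, baseline):
    """Category probabilities from baseline-category logits.

    etas maps each non-baseline category to its linear predictor
    (b0 + b1*X1 + ...) from the comparison against `baseline`.
    """
    denom = 1.0 + sum(math.exp(e) for e in etas.values())
    probs = {cat: math.exp(e) / denom for cat, e in etas.items()}
    probs[baseline] = 1.0 / denom
    return probs

# Hypothetical linear predictors for one case, with A as the baseline category.
probs = baseline_category_probs({"B": 0.5, "C": -0.2}, baseline="A")
print(probs)
print(math.log(probs["B"] / probs["A"]))  # recovers the B-vs-A logit (0.5, up to rounding)
```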
I may not be Fred Flintstone …
• How successful are chat-up lines?
• The chat-up lines used by 348 men and 672 women in a night-club were recorded.
• Outcome:
  – Whether the chat-up line resulted in one of the following three events:
    • The person got no response or the recipient walked away,
    • The person obtained the recipient's phone number,
    • The person left the night-club with the recipient.
• Predictors:
  – The content of the chat-up lines was rated for:
    • Funniness (0 = not funny at all, 10 = the funniest thing that I have ever heard)
    • Sexuality (0 = no sexual content at all, 10 = very sexually direct)
    • Moral values (0 = the chat-up line does not reflect good characteristics, 10 = the chat-up line is very indicative of good characteristics).
  – Gender of recipient
Output
Interpretation
• Good_Mate: Whether the chat-up line showed signs of good moral fibre significantly predicted whether you got a phone number or no response/walked away, b = 0.13, Wald χ2(1) = 6.02, p < .05.
• Funny: Whether the chat-up line was funny did not significantly predict whether you got a phone number or no response, b = 0.14, Wald χ2(1) = 1.60, p > .05.
• Gender: The gender of the person being chatted up significantly predicted whether they gave out their phone number or gave no response, b = −1.65, Wald χ2(1) = 4.27, p < .05.
• Sex: The sexual content of the chat-up line significantly predicted whether you got a phone number or no response/walked away, b = 0.28, Wald χ2(1) = 9.59, p < .01.
• Funny×Gender: The success of funny chat-up lines depended on whether they were delivered to a man or a woman because in interaction these variables predicted whether or not you got a phone number, b = 0.49, Wald χ2(1) = 12.37, p < .001.
• Sex×Gender: The success of chat-up lines with sexual content depended on whether they were delivered to a man or a woman because in interaction these variables predicted whether or not you got a phone number, b = −0.35, Wald χ2(1) = 10.82, p < .01.
Interpretation
• Good_Mate: Whether the chat-up line showed signs of good moral fibre did not significantly predict whether you went home with the date or got a slap in the face, b = 0.13, Wald χ2(1) = 2.42, p > .05.
• Funny: Whether the chat-up line was funny significantly predicted whether you went home with the date or no response, b = 0.32, Wald χ2(1) = 6.46, p < .05.
• Gender: The gender of the person being chatted up significantly predicted whether they went home with the person or gave no response, b = −5.63, Wald χ2(1) = 17.93, p < .001.
• Sex: The sexual content of the chat-up line significantly predicted whether you went home with the date or got a slap in the face, b = 0.42, Wald χ2(1) = 11.68, p < .01.
• Funny×Gender: The success of funny chat-up lines depended on whether they were delivered to a man or a woman because in interaction these variables predicted whether or not you went home with the date, b = 1.17, Wald χ2(1) = 34.63, p < .001.
• Sex×Gender: The success of chat-up lines with sexual content depended on whether they were delivered to a man or a woman because in interaction these variables predicted whether or not you went home with the date, b = −0.48, Wald χ2(1) = 8.51, p < .01.
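As a consistency check on the two interpretation slides, each reported Wald χ2(1) can be converted back to a p-value (for 1 df, P(X > w) = erfc(√(w/2))) and compared with the stated threshold, alongside Exp(b) for interpretation. The b and Wald values are the ones printed above; because they are rounded, the recomputed p-values are approximate. The helper names are hypothetical.

```python
import math

def chi2_1df_p(w):
    """Upper-tail p-value of a chi-square(1) statistic: P(X > w) = erfc(sqrt(w / 2))."""
    return math.erfc(math.sqrt(w / 2.0))

# (predictor, b, Wald chi-square(1), reported comparison, alpha) as stated on the slides.
reported = [
    ("Good_Mate (phone)", 0.13, 6.02, "<", 0.05),
    ("Funny (phone)", 0.14, 1.60, ">", 0.05),
    ("Gender (phone)", -1.65, 4.27, "<", 0.05),
    ("Sex (phone)", 0.28, 9.59, "<", 0.01),
    ("Funny×Gender (phone)", 0.49, 12.37, "<", 0.001),
    ("Sex×Gender (phone)", -0.35, 10.82, "<", 0.01),
    ("Good_Mate (home)", 0.13, 2.42, ">", 0.05),
    ("Funny (home)", 0.32, 6.46, "<", 0.05),
    ("Gender (home)", -5.63, 17.93, "<", 0.001),
    ("Sex (home)", 0.42, 11.68, "<", 0.01),
    ("Funny×Gender (home)", 1.17, 34.63, "<", 0.001),
    ("Sex×Gender (home)", -0.48, 8.51, "<", 0.01),
]

def consistent(p, op, alpha):
    return p < alpha if op == "<" else p > alpha

results = []
for name, b, w, op, alpha in reported:
    p = chi2_1df_p(w)
    results.append((name, consistent(p, op, alpha)))
    print(f"{name:22s} Exp(b) = {math.exp(b):7.3f}  p = {p:.5f}  reported p {op} {alpha}")
```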
Reporting the Results
Multiple Logistic Regression
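$$E(Y \mid X) = P(Y = 1 \mid X) = \pi(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p}}$$

The relationship between π(X) and X is S-shaped.

The logit (log-odds) transformation (link function):

$$g(x) = \ln\left[\frac{\pi(x)}{1 - \pi(x)}\right] = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$$

It has many of the desirable properties of the linear regression model, while relaxing some of the assumptions.

Maximum likelihood (ML): model parameters are estimated by iteration.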
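The point that ML parameters are estimated by iteration can be illustrated with one common iterative scheme, Newton-Raphson (for the logistic model this is the same as iteratively reweighted least squares). This is a pure-Python sketch with hypothetical function names, not the implementation used by SPSS; it does not handle complete separation, where the estimates diverge.

```python
import math

def solve(A, v):
    """Solve A x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, max_iter=25, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X: rows of predictor values (no intercept column); y: 0/1 outcomes.
    Returns [b0, b1, ..., bp].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend the intercept
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k                    # score vector X'(y - p)
        info = [[0.0] * k for _ in range(k)]  # information matrix X'WX
        for r, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * r[a]
                for c in range(k):
                    info[a][c] += w * r[a] * r[c]
        step = solve(info, grad)            # Newton step: (X'WX)^-1 X'(y - p)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Hypothetical data: one binary predictor, 4 cases at each level.
X = [[0], [0], [0], [0], [1], [1], [1], [1]]
y = [1, 0, 0, 0, 1, 1, 1, 0]
print(fit_logistic(X, y))
```

For a single binary predictor the ML estimates have a closed form (b0 is the log-odds at X = 0 and b1 is the log odds ratio), which makes this toy example easy to check by hand.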
Assumptions for Logistic Regression
• The independent variables are linear in the logit. It is also possible to add explicit interaction and power terms, as in OLS regression.
• The dependent variable need not be normally distributed (it is assumed to be distributed within the range of the exponential family of distributions, such as normal, Poisson, binomial, gamma).
• The dependent variable need not be homoscedastic for each level of the independents; that is, there is no homogeneity of variance assumption.
• Normally distributed error terms are not assumed.
• The independent variables may be binary, categorical, or continuous.
Applications
• Identify risk factors: test H0: β = 0 for each candidate risk factor while controlling for confounders and other important determinants of the event.
• Classification: predict the outcome for a new observation with a particular constellation of risk factors (a form of discriminant analysis).
Design Variables (coding)
In SPSS, designate a factor as Categorical to get k − 1 indicator (design) variables for a k-level factor:

| RACE | D1 | D2 |
|---|---|---|
| White | 0 | 0 |
| Black | 1 | 0 |
| Other | 0 | 1 |
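A sketch of the k − 1 indicator coding in the table, assuming the first listed level is the reference category (White in the table, which gets 0 on every design variable). In SPSS the reference category can be set to the first or the last level, so check which one the output actually used. The function name is hypothetical.

```python
def design_variables(values, levels):
    """k - 1 indicator columns for a k-level factor; levels[0] is the reference."""
    unknown = set(values) - set(levels)
    if unknown:
        raise ValueError(f"values not in levels: {sorted(unknown)}")
    others = levels[1:]  # one design variable (D1, D2, ...) per non-reference level
    return [[1 if v == level else 0 for level in others] for v in values]

race_levels = ["White", "Black", "Other"]
print(design_variables(["White", "Black", "Other"], race_levels))
```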
Interpretation of the parameters
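If p is the probability of an event and O is the odds for that event, then

$$O = \frac{p}{1 - p} = \frac{\text{probability of event}}{\text{probability of no event}}$$

… the link function in logistic regression gives the log-odds:

$$g(x) = \ln\left[\frac{\pi(x)}{1 - \pi(x)}\right] = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$$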
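…and the odds ratio, OR, is

| | Y = 1 | Y = 0 |
|---|---|---|
| X = 1 | $\pi(1) = \dfrac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}$ | $1 - \pi(1) = \dfrac{1}{1 + e^{\beta_0 + \beta_1}}$ |
| X = 0 | $\pi(0) = \dfrac{e^{\beta_0}}{1 + e^{\beta_0}}$ | $1 - \pi(0) = \dfrac{1}{1 + e^{\beta_0}}$ |

$$OR = \frac{\pi(1)\,[1 - \pi(0)]}{\pi(0)\,[1 - \pi(1)]} = \text{(tedious algebra)} = e^{\beta_1}$$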
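The "tedious algebra" is short if each odds is simplified on its own first, because the two cells in each row of the 2 × 2 table share the same denominator:

$$
\begin{aligned}
\frac{\pi(1)}{1 - \pi(1)} &= \frac{e^{\beta_0 + \beta_1} / (1 + e^{\beta_0 + \beta_1})}{1 / (1 + e^{\beta_0 + \beta_1})} = e^{\beta_0 + \beta_1}, \\
\frac{\pi(0)}{1 - \pi(0)} &= \frac{e^{\beta_0} / (1 + e^{\beta_0})}{1 / (1 + e^{\beta_0})} = e^{\beta_0}, \\
OR &= \frac{\pi(1) / [1 - \pi(1)]}{\pi(0) / [1 - \pi(0)]} = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1}.
\end{aligned}
$$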
Definitions and Annotated SPSS output for Logistic Regression
http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#assumpt
Virtually any sin that can be committed with least squares regression can be committed with logistic regression. These include stepwise procedures and arriving at a final model by looking at the data. All of the warnings and recommendations made for least squares regression apply to logistic regression as well ...
Gerard Dallal
• Assessing the Model Fit
There are several R²-like measures; they are not goodness-of-fit tests but rather attempt to measure the strength of association.
Cox and Snell's R-Square is an attempt to imitate the interpretation of multiple R-Square based on the likelihood, but its maximum can be (and usually is) less than 1.0, making it difficult to interpret. It is part of SPSS output.
Nagelkerke's R-Square is a further modification of the Cox and Snell coefficient to assure that it can vary from 0 to 1. That is, Nagelkerke's R2 divides Cox and Snell's R2 by its maximum in order to achieve a measure that ranges from 0 to 1. Therefore Nagelkerke's R-Square will normally be higher than the Cox and Snell measure. It is part of SPSS output and is the most-reported of the R-squared estimates. See Nagelkerke (1991).
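Both measures can be computed from the log-likelihoods of the intercept-only (null) model and the fitted model. This sketch uses the standard definitions, Cox and Snell R² = 1 − exp[2(LL_null − LL_model)/n] with maximum 1 − exp(2·LL_null/n), and Nagelkerke's R² as the ratio of the two; the log-likelihood values in the usage line are hypothetical.

```python
import math

def cox_snell_r2(ll_null, ll_model, n):
    """1 - (L_null / L_model)^(2/n), written in terms of log-likelihoods."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)

def nagelkerke_r2(ll_null, ll_model, n):
    """Cox and Snell R^2 divided by its maximum attainable value."""
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_r2

# Hypothetical log-likelihoods for n = 20 cases.
print(cox_snell_r2(-10.0, -5.0, 20), nagelkerke_r2(-10.0, -5.0, 20))
```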
Hosmer and Lemeshow's Goodness-of-Fit Test tests the null hypothesis that the data were generated by the fitted model:
1. divide subjects into deciles based on predicted probabilities
2. compute a chi-square from observed and expected frequencies
3. compute a probability (p) value from the chi-square distribution with 8 degrees of freedom to test the fit of the logistic model
If the Hosmer and Lemeshow Goodness-of-Fit test statistic has p ≤ .05, we reject the null hypothesis that there is no difference between the observed and model-predicted values of the dependent variable. (This means the model predicts values significantly different from the observed values.)
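A pure-Python sketch of the three steps listed above, assuming equal-sized groups formed after sorting cases by predicted probability. SPSS's handling of ties and group boundaries may differ, so results need not match its output exactly. The group contribution uses the usual form (O − E)² / [n_g·p̄(1 − p̄)]; a group whose mean predicted probability is exactly 0 or 1 would divide by zero.

```python
import math

def chi2_sf_even(x, df):
    """Upper-tail chi-square probability for an even number of df."""
    y = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= y / j
        total += term
    return math.exp(-y) * total

def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, df, p-value) using equal-sized groups of sorted predicted risk."""
    if groups < 4 or groups % 2:
        raise ValueError("this sketch needs an even number of groups >= 4")
    pairs = sorted(zip(p, y))                  # step 1: order cases by predicted probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(pi for pi, _ in chunk)  # expected events in the group
        observed = sum(yi for _, yi in chunk)  # observed events in the group
        mean_p = expected / n_g
        # step 2: chi-square contribution of this group
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    df = groups - 2                            # step 3: 8 df for the usual 10 groups
    return stat, df, chi2_sf_even(stat, df)
```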
[Figure: Observed vs. Predicted, plotting observed against expected counts (both axes 0 to 20).]
This particular model performs better when the event rate is low.
• Check for Linearity in the Logit
Box-Tidwell Transformation (Test): add to the logistic model interaction terms that are the cross-product of each independent variable and its natural logarithm [(X)ln(X)]. If these terms are significant, then there is nonlinearity in the logit. This method is not sensitive to small nonlinearities.
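A small sketch of building the Box-Tidwell term for one predictor. The term X·ln(X) is only defined for X > 0, so this version refuses non-positive values; shifting the variable by a constant first is a common workaround, mentioned here as an assumption rather than something the slides prescribe. The new column would then be added to the model alongside X and its coefficient tested.

```python
import math

def box_tidwell_terms(x):
    """Return X * ln(X) for each value of a strictly positive predictor."""
    if any(v <= 0 for v in x):
        raise ValueError("X * ln(X) needs X > 0; shift the variable first")
    return [v * math.log(v) for v in x]

# Hypothetical predictor values.
print(box_tidwell_terms([1.0, 2.0, 10.0]))
```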
Orthogonal polynomial contrasts, an option in SPSS, may be used. This option treats each independent as a categorical variable and computes logit (effect) coefficients for each category, testing for linear, quadratic, cubic, or higher-order effects. The logit should not change over the contrasts. This method is not appropriate when the independent has a large number of values, inflating the standard errors of the contrasts.
Residual Plots
Plot the Cook’s distance against
Several other plots suggested in Hosmer & Lemeshow (p. 177) involve further manipulation of the statistics produced by SPSS.
External validation:
– a new sample
– a hold-out sample
Cross validation (classification):
– n-fold (leave one out)
– V-fold (divide the data into V subsets)
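A sketch of generating V-fold train/test splits; leave-one-out is the special case V = n. The random shuffle with a fixed seed and the round-robin assignment to folds are illustrative choices, not a prescribed procedure. The model would be fitted on each `train` set and its classifications checked on the matching `test` set.

```python
import random

def v_fold_indices(n, v, seed=0):
    """Yield (train, test) index lists for V-fold cross validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::v] for i in range(v)]  # V subsets of (nearly) equal size
    for i in range(v):
        test = sorted(folds[i])
        train = sorted(j for k, fold in enumerate(folds) if k != i for j in fold)
        yield train, test

for train, test in v_fold_indices(10, 5):
    print(test)
```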
Pitfalls
1. Multiple comparisons (data-driven model / data dredging)
2. Overfitting
   - Complex models fit to a small dataset give a good fit in THIS dataset but do not generalize: you're modeling the random error.
   - Have at least 10 events per independent variable.
   - Validation: use new data to check predictive ability and calibration, or a hold-out sample.
   - Look for sensitivity to a single observation (residuals).
3. Violating the assumptions: more serious in prediction models than in association models.
4. There are many strategies: don't try them all.
   - Choose one based on the structure of the question.
   - Draw primary conclusions based on that one.
   - Examine robustness to other strategies.
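A quick check of the "at least 10 events per independent variable" rule of thumb. This sketch counts the less frequent outcome as the "events", which is a common convention but an assumption here; the outcome counts in the usage line are hypothetical.

```python
def events_per_variable(y, n_predictors):
    """Events per candidate predictor, counting the rarer outcome as the events."""
    events = sum(y)
    rarer = min(events, len(y) - events)
    return rarer / n_predictors

y = [1] * 30 + [0] * 70        # hypothetical: 30 events, 70 non-events
epv = events_per_variable(y, 3)
print(epv, epv >= 10)
```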
CASE STUDY
1. Develop a strategy for analyzing Hosmer & Lemeshow's low birth weight data using LOW as the dependent variable
2. Try ANCOVA for the same data with BWT (birth weight in grams) as the dependent variable
LBW.SAV is on the S drive under GCRC data analysis
References
Hosmer, D. W. and Lemeshow, S. (2000) Applied Logistic Regression, 2nd ed., John Wiley & Sons, New York, NY.
Harrell, F. E., Lee, K. L. and Mark, D. B. (1996) "Multivariable Prognostic Models: Issues in Developing Models, Evaluating Assumptions and Adequacy, and Measuring and Reducing Errors", Statistics in Medicine, 15, 361-387.
Nagelkerke, N. J. D. (1991) "A note on a general definition of the coefficient of determination", Biometrika, 78(3), 691-692. Covers the two measures of R-square for logistic regression which are found in SPSS output.
Agresti, A. (1990) Categorical Data Analysis, John Wiley & Sons, New York, NY.