What you've always wanted to know about logistic regression analysis, but were afraid to ask...
![Slide 1](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/1.jpg)
What you've always wanted to know about logistic regression analysis, but were afraid to ask...
February 1, 2010
Gerrit Rooks, Sociology of Innovation
Innovation Sciences & Industrial Engineering
Phone: 5509
email: [email protected]
![Slide 2](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/2.jpg)
This Lecture
• Why logistic regression analysis?
• The logistic regression model
• Estimation
• Goodness of fit
• An example
![Slide 3](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/3.jpg)
What's the difference between 'normal' regression and logistic regression?
Regression analysis:
– Relate one or more independent (predictor) variables to a dependent (outcome) variable
![Slide 4](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/4.jpg)
What's the difference between 'normal' regression and logistic regression?
• Often you will be confronted with outcome variables that are dichotomous:
– success vs. failure
– employed vs. unemployed
– promoted or not
– sick or healthy
– pass or fail an exam
![Slide 5](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/5.jpg)
Example: relationship between hours studied for an exam and success

| Hours studied | # Failed exam | # Passed exam | Total # students | Prob. pass exam |
|---|---|---|---|---|
| 28 | 4 | 2 | 6 | .33 |
| 29 | 3 | 2 | 5 | .40 |
| 30 | 2 | 7 | 9 | .78 |
| 31 | 2 | 7 | 9 | .78 |
| 32 | 4 | 16 | 20 | .80 |
| 33 | 1 | 14 | 15 | .93 |
![Slide 6](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/6.jpg)
Linear regression analysis: why is this wrong?
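One way to see the problem: fit a straight line to the pass proportions from the table on slide 5 and extrapolate it a few hours in either direction. The minimal Python sketch below (standard library only) does exactly that; fitting the six grouped proportions with unweighted least squares is a simplifying assumption made for illustration.

```python
# Counts from the slide-5 table: hours studied -> (failed, passed)
counts = {28: (4, 2), 29: (3, 2), 30: (2, 7), 31: (2, 7), 32: (4, 16), 33: (1, 14)}

hours = sorted(counts)
# Proportion passed = passed / total (the "Prob. pass exam" column)
props = [counts[h][1] / sum(counts[h]) for h in hours]

# Unweighted ordinary least squares: slope = Sxy / Sxx
n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(props) / n
sxx = sum((x - mean_x) ** 2 for x in hours)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, props))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

def predict(h):
    """Straight-line 'probability' of passing after h hours of study."""
    return intercept + slope * h

for h in (24, 30, 36):
    print(h, round(predict(h), 2))   # prints -0.11, 0.61 and 1.33
```

The fitted line implies a negative probability of passing at 24 hours and a probability above 1 at 36 hours, neither of which is possible; a model for a dichotomous outcome has to keep its predictions between 0 and 1.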
![Slide 7](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/7.jpg)
Logistic regression: the better alternative
![Slide 8](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/8.jpg)
![Slide 9](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/9.jpg)
The logistic regression equation: predicting probabilities

$$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_1)}}$$

• P(Y) is the predicted probability (always between 0 and 1)
• b0 + b1X1 is the linear part, similar to regression analysis
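The equation is easy to evaluate directly. The minimal Python sketch below is our own helper (the name `logistic_probability` is hypothetical); it computes P(Y) for one predictor and shows that the output stays strictly between 0 and 1 however large or small the linear part becomes. For negative linear parts it uses the algebraically equivalent form e^z / (1 + e^z) to avoid overflow in `math.exp`.

```python
import math

def logistic_probability(b0, b1, x1):
    """P(Y) = 1 / (1 + e^-(b0 + b1*x1)) for a single predictor X1."""
    z = b0 + b1 * x1  # linear part, as in ordinary regression
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Same value, rearranged so that exp() never receives a large positive argument
    e = math.exp(z)
    return e / (1.0 + e)

# Whatever the linear part is, the result stays between 0 and 1
for z in (-10, -1, 0, 1, 10):
    print(z, logistic_probability(z, 0.0, 0.0))
```

A linear part of 0 gives a probability of exactly .5; large positive values push P(Y) towards 1 and large negative values towards 0, without ever crossing those bounds.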
![Slide 10](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/10.jpg)
The logistic function: sometimes authors rearrange the model

$$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_1)}} = \frac{e^{b_0 + b_1 X_1}}{1 + e^{b_0 + b_1 X_1}}$$

or also

$$\ln\left(\frac{P(Y)}{1 - P(Y)}\right) = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$$
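A quick numerical check of the rearrangement, as a minimal Python sketch: the two expressions for P(Y) agree, and taking the log-odds ln(P / (1 - P)) of either one returns the linear part. Here `z` stands for b0 + b1X1 + ...; the function names are our own.

```python
import math

def p_form1(z):
    """P(Y) written as 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def p_form2(z):
    """P(Y) written as e^z / (1 + e^z)."""
    return math.exp(z) / (1.0 + math.exp(z))

def logit(p):
    """Log-odds: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

for z in (-3.0, -0.5, 0.0, 2.0):
    p = p_form1(z)
    print(z, p, p_form2(z), logit(p))   # the last column reproduces z
```

This is why logistic regression is often described as a linear model for the log-odds of the outcome.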
![Slide 11](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/11.jpg)
How do we estimate coefficients? Maximum-likelihood estimation
• Parameters are estimated by 'fitting' models, based on the available predictors, to the observed data
• The chosen model is the one that fits the data best, i.e., is closest to the data
• Fit is determined by the so-called log-likelihood statistic
![Slide 12](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/12.jpg)
Maximum likelihood estimation: the log-likelihood statistic

$$LL = \sum_{i=1}^{N} \left\{ Y_i \ln\big(P(Y_i)\big) + (1 - Y_i) \ln\big[1 - P(Y_i)\big] \right\}$$

LL is never positive: large negative values of LL (far from 0) indicate poor fit of the model.
HOWEVER, THIS STATISTIC CANNOT BE USED TO EVALUATE THE FIT OF A SINGLE MODEL
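The formula translates directly into code. This is a minimal Python sketch with a toy pair of outcomes and probabilities (made up for illustration, not taken from the slides). For each case only one of the two terms is non-zero, so the code picks the relevant one, which also avoids evaluating ln(0) when a probability is exactly 0 or 1.

```python
import math

def log_likelihood(outcomes, probabilities):
    """LL = sum_i { Y_i ln P(Y_i) + (1 - Y_i) ln[1 - P(Y_i)] } for 0/1 outcomes."""
    ll = 0.0
    for y, p in zip(outcomes, probabilities):
        # y = 1 contributes ln P(Y); y = 0 contributes ln(1 - P(Y))
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll

# Toy example: two cases, one success and one failure
print(log_likelihood([1, 0], [0.8, 0.3]))   # ln(0.8) + ln(0.7)
print(log_likelihood([1, 0], [0.9, 0.1]))   # better predictions, LL closer to 0
```

Because LL is a sum over cases, its size grows with N, which is one reason a single LL value says little on its own.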
![Slide 13](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/13.jpg)
An example to illustrate maximum likelihood and the log-likelihood statistic
Suppose we know the hours spent studying and the outcome of an exam:

| Study hours | Outcome (1 = passed) |
|---|---|
| 3 | 0 |
| 34 | 1 |
| 17 | 0 |
| 6 | 0 |
| 12 | 0 |
| 15 | 1 |
| 26 | 1 |
| 29 | 1 |
![Slide 14](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/14.jpg)
In ML, different values for the parameters are 'tried'.
Let's look at two possibilities: (1) b0 = 0 and b1 = 0.05; (2) b0 = -6.44 and b1 = 0.39.

$$P(Y) = \frac{1}{1 + e^{-(0 + 0.05 X_1)}} \qquad\qquad P(Y) = \frac{1}{1 + e^{-(-6.44 + 0.39 X_1)}}$$

| Study hours | Outcome | Predicted probability (b0 = 0; b1 = 0.05) | Predicted probability (b0 = -6.44; b1 = 0.39) |
|---|---|---|---|
| 3 | 0 | .53 | .01 |
| 34 | 1 | .85 | .99 |
| 17 | 0 | .71 | .53 |
| 6 | 0 | .57 | .02 |
| 12 | 0 | .65 | .14 |
| 15 | 1 | .68 | .34 |
| 26 | 1 | .79 | .97 |
| 29 | 1 | .81 | .99 |
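The two probability columns can be recomputed from the two equations. A minimal Python sketch using the eight students from slide 13:

```python
import math

# Study hours and exam outcome (1 = passed) from the slide-13 example
hours = [3, 34, 17, 6, 12, 15, 26, 29]
outcome = [0, 1, 0, 0, 0, 1, 1, 1]

def predicted_probability(b0, b1, x):
    """P(Y) = 1 / (1 + e^-(b0 + b1*x))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

model_1 = [predicted_probability(0.0, 0.05, x) for x in hours]
model_2 = [predicted_probability(-6.44, 0.39, x) for x in hours]

for x, y, p1, p2 in zip(hours, outcome, model_1, model_2):
    print(f"{x:>2} {y}  {p1:.2f}  {p2:.2f}")
```

The recomputed values agree with the slide's columns to within about .02 (for example, 17 hours gives about .70 under the first model and .55 under the second); the small differences presumably come from rounding on the slide.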
![Slide 15](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/15.jpg)
We are now able to calculate the log-likelihood statistic:

$$LL = \sum_{i=1}^{N} \left\{ Y_i \ln\big(P(Y_i)\big) + (1 - Y_i) \ln\big[1 - P(Y_i)\big] \right\}$$

| Study hours | Outcome | Predicted probability (b0 = 0; b1 = 0.05) | LL contribution (b0 = 0; b1 = 0.05) |
|---|---|---|---|
| 3 | 0 | .53 | -.75 |
| 34 | 1 | .85 | -.16 |
| 17 | 0 | .71 | -1.24 |
| 6 | 0 | .57 | -.84 |
| 12 | 0 | .65 | -1.05 |
| 15 | 1 | .68 | -.39 |
| 26 | 1 | .79 | -.24 |
| 29 | 1 | .81 | -.21 |
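Summing the per-case contributions gives the LL of each candidate model. A minimal Python sketch for both parameter sets on the slide-13 data:

```python
import math

hours = [3, 34, 17, 6, 12, 15, 26, 29]
outcome = [0, 1, 0, 0, 0, 1, 1, 1]

def predicted_probability(b0, b1, x):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def log_likelihood(b0, b1):
    """Sum of Y ln P + (1 - Y) ln(1 - P) over the eight students."""
    total = 0.0
    for x, y in zip(hours, outcome):
        p = predicted_probability(b0, b1, x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

print(round(log_likelihood(0.0, 0.05), 2))
print(round(log_likelihood(-6.44, 0.39), 2))
```

The first model comes out at about -4.88, as on the slide. For the second model the unrounded probabilities give about -2.04 rather than the slide's -2.07; the gap presumably comes from summing rounded values in the slide's table. Either way the second model is clearly closer to 0.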
![Slide 16](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/16.jpg)
Two models and their log-likelihood statistic

| Outcome | Pr (b0 = 0; b1 = 0.05) | LL (b0 = 0; b1 = 0.05) | Pr (b0 = -6.44; b1 = 0.39) | LL (b0 = -6.44; b1 = 0.39) |
|---|---|---|---|---|
| 0 | .53 | -.75 | .01 | -.01 |
| 1 | .85 | -.16 | .99 | -.01 |
| 0 | .71 | -1.24 | .53 | -.75 |
| 0 | .57 | -.84 | .02 | -.02 |
| 0 | .65 | -1.05 | .14 | -.15 |
| 1 | .68 | -.39 | .34 | -1.08 |
| 1 | .79 | -.24 | .97 | -.03 |
| 1 | .81 | -.21 | .99 | -.01 |
| ∑ | | -4.88 | | -2.07 |

An iterative algorithm tries parameter values and chooses the model with the best fit (LL closest to 0); of these two, the second model fits better.
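The slides do not name the algorithm. A common choice for logistic regression is Newton-Raphson (equivalently, iteratively reweighted least squares); the sketch below is our own minimal two-parameter version in plain Python, not SPSS's implementation. Starting from b0 = b1 = 0, it should end up close to the second model's coefficients.

```python
import math

hours = [3, 34, 17, 6, 12, 15, 26, 29]
outcome = [0, 1, 0, 0, 0, 1, 1, 1]

def log_likelihood(b0, b1):
    total = 0.0
    for x, y in zip(hours, outcome):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def fit_newton_raphson(max_iter=50, tol=1e-8):
    """Maximise LL over (b0, b1) with Newton-Raphson steps."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Gradient (score) and information matrix of LL at the current estimates
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for x, y in zip(hours, outcome):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve [h00 h01; h01 h11] * step = [g0; g1] for the 2x2 case
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1

b0_hat, b1_hat = fit_newton_raphson()
print(round(b0_hat, 2), round(b1_hat, 2), round(log_likelihood(b0_hat, b1_hat), 2))
```

Each step uses the slope and curvature of LL to jump towards the maximum, so only a handful of iterations are needed; SPSS reports a similar stopping rule ("parameter estimates changed by less than .001") in its output.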
![Slide 17](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/17.jpg)
After estimation: how do I determine significance?

$$P(Y) = \frac{1}{1 + e^{-(-6.44 + 0.39 \cdot \text{studyhours})}}$$

• Obviously SPSS does all the work for you
• How to interpret SPSS output
• Two major issues:
1. Overall model fit
– Between-model comparisons
– Pseudo R-square
– Predictive accuracy / classification test
2. Coefficients
– Wald test
– Likelihood ratio test
– Odds ratios
![Slide 18](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/18.jpg)
Model fit: between-model comparison
The log-likelihood ratio test statistic can be used to test the fit of a model:

$$\chi^2 = 2\,[LL(\text{New}) - LL(\text{baseline})]$$

where LL(baseline) is the model fit of the reduced model and LL(New) is the model fit of the full model. The test statistic has a chi-square distribution.
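Applied to the study-hours example, the comparison can be sketched in a few lines of Python. Assumptions: the "new" model uses the slide's rounded estimates (b0 = -6.44, b1 = 0.39) rather than the exact maximum-likelihood values, and the baseline is the constant-only model. With one extra parameter the test has 1 degree of freedom, and for 1 df the chi-square upper tail equals erfc(sqrt(x / 2)), so no statistics library is needed.

```python
import math

hours = [3, 34, 17, 6, 12, 15, 26, 29]
outcome = [0, 1, 0, 0, 0, 1, 1, 1]

def log_likelihood(b0, b1):
    total = 0.0
    for x, y in zip(hours, outcome):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Baseline (reduced) model: constant only. Its ML estimate sets P(Y) to the
# observed share of passes, i.e. b0 = ln(share / (1 - share)).
share = sum(outcome) / len(outcome)
ll_baseline = log_likelihood(math.log(share / (1 - share)), 0.0)

# New (full) model: the slide's estimates for the study-hours model
ll_new = log_likelihood(-6.44, 0.39)

chi_square = 2 * (ll_new - ll_baseline)
# Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi_square / 2))
print(round(chi_square, 2), round(p_value, 4))
```

Here chi-square comes out at about 7.0 on 1 df (p below .01), so adding study hours improves the fit significantly even with only eight students.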
![Slide 19](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/19.jpg)
Model fit
The log-likelihood ratio test statistic can be used to test the fit of a model:

$$\chi^2 = 2\,[LL(\text{New}) - LL(\text{baseline})]$$

Model fit of the full model:

$$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_1)}}$$

Model fit of the reduced model:

$$P(Y) = \frac{1}{1 + e^{-b_0}}$$
![Slide 20](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/20.jpg)
Between-model comparison
• Estimate a null model
– Baseline model
• Estimate an improved model
– This model contains more variables
• Assess the difference in -2LL between the models
– This difference follows a chi-square distribution
– Degrees of freedom = # estimated parameters in proposed model - # estimated parameters in null model

$$\chi^2 = 2\,[LL(\text{New}) - LL(\text{baseline})]$$

Model fit of the full model:

$$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_1 + b_2 X_2)}}$$

Model fit of the reduced model:

$$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_1)}}$$
![Slide 21](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/21.jpg)
Overall model fit: R and R²
R² in multiple regression is a measure of the variance explained by the model:

$$R^2 = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} = \frac{\text{SS due to regression}}{\text{Total SS}}$$
![Slide 22](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/22.jpg)
Overall model fit: pseudo R²
Just like in multiple regression, logit R² ranges from 0.0 to 1.0
– Cox and Snell: cannot theoretically reach 1
– Nagelkerke: adjusted so that it can reach 1

$$R^2_{\text{LOGIT}} = \frac{-2LL(\text{Original}) - \big(-2LL(\text{Model})\big)}{-2LL(\text{Original})}$$

where LL(Original) is the log-likelihood of the model before any predictors were entered and LL(Model) is the log-likelihood of the model that you want to test. The numerator is the model chi-square, so R²_LOGIT is the proportional reduction in -2LL.
NOTE: R² in logistic regression tends to be (even) smaller than in multiple regression.
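As a check, this minimal Python sketch plugs in the penalty-kick figures from the SPSS output later in this lecture: -2LL of the model is 48.662, the model chi-square is 54.977 and N = 75, so -2LL(Original) = 48.662 + 54.977. The Cox & Snell and Nagelkerke formulas used here are the standard definitions; they are not spelled out on the slide.

```python
import math

n = 75                                            # cases in the penalty-kick analysis
minus2ll_model = 48.662                           # -2 Log likelihood, Block 1
chi_square = 54.977                               # model chi-square, Block 1
minus2ll_original = minus2ll_model + chi_square   # constant-only model

# R2_LOGIT as defined above: proportional reduction in -2LL
r2_logit = chi_square / minus2ll_original

# Cox & Snell: 1 - exp(-chi2 / n); its maximum possible value is below 1
r2_cs = 1 - math.exp(-chi_square / n)

# Nagelkerke: Cox & Snell divided by its maximum possible value
r2_max = 1 - math.exp(-minus2ll_original / n)
r2_n = r2_cs / r2_max

print(round(r2_logit, 3), round(r2_cs, 3), round(r2_n, 3))
```

The Cox & Snell and Nagelkerke values match the .520 and .694 in the SPSS Model Summary, and R²_LOGIT comes out at about .53. Because the Nagelkerke version divides by a maximum below 1, it is always at least as large as Cox & Snell.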
![Slide 23](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/23.jpg)
What is a small or large R and R²? Strength of correlation:

| Strength | R |
|---|---|
| Small | 0.10 to 0.29 |
| Medium | 0.30 to 0.49 |
| Large | 0.50 to 1.00 |
![Slide 24](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/24.jpg)
Overall model fit: classification table
How well does the model predict outcomes? (SPSS output)

Classification Table (Step 1)^a

| Observed: Result of Penalty Kick | Predicted: Missed Penalty | Predicted: Scored Penalty | Percentage Correct |
|---|---|---|---|
| Missed Penalty | 30 | 5 | 85.7 |
| Scored Penalty | 7 | 33 | 82.5 |
| Overall Percentage | | | 84.0 |

a. The cut value is .500

The cut value of .500 means that if our model predicts that a player will score with a probability above .50 (for example .51), the prediction is a score; a predicted probability below .50 is a miss.
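The percentages in the table are simple ratios of the cell counts. A minimal Python sketch using the four counts from the classification table above:

```python
# Step 1 classification table: observed outcome -> (predicted missed, predicted scored)
observed_missed = (30, 5)
observed_scored = (7, 33)

pct_missed = 100 * observed_missed[0] / sum(observed_missed)   # misses predicted as misses
pct_scored = 100 * observed_scored[1] / sum(observed_scored)   # scores predicted as scores
correct = observed_missed[0] + observed_scored[1]              # diagonal of the table
total = sum(observed_missed) + sum(observed_scored)
pct_overall = 100 * correct / total

print(f"{pct_missed:.1f} {pct_scored:.1f} {pct_overall:.1f}")  # 85.7 82.5 84.0
```

The overall percentage is the share of all 75 penalties on the diagonal (63 of 75), not the average of the two row percentages.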
![Slide 25](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/25.jpg)
Testing significance of coefficients: the Wald statistic (not really good)
• In linear regression analysis, the ratio of an estimate to its standard error (the t statistic) is used to test significance
• In logistic regression something similar exists: the Wald statistic
• However, when b is large, the standard error tends to become inflated, so the Wald statistic is underestimated and Type II errors become more likely

$$\text{Wald} = \frac{b}{SE_b}$$

where b is the estimate and SE_b is the standard error of the estimate. In linear regression the corresponding ratio follows a t-distribution; in logistic regression it is compared to a standard normal distribution, and SPSS reports its square, which follows a chi-square distribution with 1 df.
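A minimal Python sketch that applies the formula to the B and S.E. values printed in the SPSS output later in this lecture (the penalty-kick model). Because those inputs are rounded to three decimals, the squared ratios only approximate SPSS's Wald column (for example 8.73 rather than 8.609 for previous).

```python
import math

# B and S.E. from the SPSS "Variables in the Equation" table (Step 1)
coefficients = {
    "previous": (0.065, 0.022),
    "pswq": (-0.230, 0.080),
    "Constant": (1.280, 1.670),
}

for name, (b, se) in coefficients.items():
    z = b / se          # Wald ratio as on this slide
    wald = z ** 2       # the squared version that SPSS prints
    # Two-sided p-value from the standard normal; equals erfc(sqrt(wald / 2))
    p = math.erfc(abs(z) / math.sqrt(2))
    print(f"{name:<9} z = {z:6.2f}  Wald = {wald:5.2f}  p = {p:.3f}")
```

Feeding SPSS's own Wald values into erfc(sqrt(Wald / 2)) reproduces its Sig. column (.003, .004 and .443).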
![Slide 26](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/26.jpg)
Likelihood ratio test: an alternative way to test the significance of a coefficient
To avoid Type II errors for some variables, it is better to use the likelihood ratio test:

$$\chi^2 = 2\,[LL(\text{With}) - LL(\text{Without})]$$

Model with the variable:

$$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_1)}}$$

Model without the variable:

$$P(Y) = \frac{1}{1 + e^{-b_0}}$$
![Slide 27](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/27.jpg)
Before we go to the example: a recap
• Logistic regression
– dichotomous outcome
– logistic function
– log-likelihood / maximum likelihood
• Model fit
– likelihood ratio test (compare LL of models)
– Pseudo R-square
– Classification table
– Wald test
![Slide 28](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/28.jpg)
Illustration with SPSS
• Penalty kicks data, variables:
– Scored: outcome variable (0 = penalty missed, 1 = penalty scored)
– Pswq: degree to which a player worries
– Previous: percentage of penalties scored by a particular player in their career
![Slide 29](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/29.jpg)
SPSS OUTPUT Logistic Regression

Case Processing Summary

| Unweighted Cases^a | | N | Percent |
|---|---|---|---|
| Selected Cases | Included in Analysis | 75 | 100.0 |
| | Missing Cases | 0 | .0 |
| | Total | 75 | 100.0 |
| Unselected Cases | | 0 | .0 |
| Total | | 75 | 100.0 |

a. If weight is in effect, see classification table for the total number of cases.

Dependent Variable Encoding

| Original Value | Internal Value |
|---|---|
| Missed Penalty | 0 |
| Scored Penalty | 1 |

This output tells you the number of observations and missing cases.
![Slide 30](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/30.jpg)
Block 0: Beginning Block. These tables are based on the empty model, i.e., only the constant is in the model:

$$P(Y) = \frac{1}{1 + e^{-b_0}}$$

Classification Table^a,b

| Observed: Result of Penalty Kick | Predicted: Missed Penalty | Predicted: Scored Penalty | Percentage Correct |
|---|---|---|---|
| Missed Penalty | 0 | 35 | .0 |
| Scored Penalty | 0 | 40 | 100.0 |
| Overall Percentage | | | 53.3 |

a. Constant is included in the model.
b. The cut value is .500

Variables in the Equation

| Step 0 | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|
| Constant | .134 | .231 | .333 | 1 | .564 | 1.143 |

Variables not in the Equation

| Step 0 | | Score | df | Sig. |
|---|---|---|---|---|
| Variables | previous | 34.109 | 1 | .000 |
| | pswq | 34.193 | 1 | .000 |
| Overall Statistics | | 41.558 | 2 | .000 |

These variables will be entered in the model later on.
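Every number in the Block 0 output follows from the two outcome counts (35 missed, 40 scored). In this minimal Python sketch, the standard-error formula sqrt(1/40 + 1/35) is the standard result for a constant-only logistic model; it is our addition, not something shown on the slide.

```python
import math

missed, scored = 35, 40          # observed outcomes in the penalty-kick data
n = missed + scored

b0 = math.log(scored / missed)             # ML estimate of the constant: log-odds of scoring
se = math.sqrt(1 / scored + 1 / missed)    # standard error of a log-odds from two counts
wald = (b0 / se) ** 2
odds = math.exp(b0)                        # Exp(B): odds of scoring = 40 / 35

# With only a constant, every player gets P = 40/75 > .5, so everyone is
# predicted to score and the accuracy is the share of scored penalties.
p_hat = 1 / (1 + math.exp(-b0))
accuracy = 100 * scored / n

print(f"B = {b0:.3f}, S.E. = {se:.3f}, Wald = {wald:.3f}, Exp(B) = {odds:.3f}")
print(f"P(score) = {p_hat:.3f}, percentage correct = {accuracy:.1f}")
```

This prints B = 0.134, S.E. = 0.231, Wald = 0.333 and Exp(B) = 1.143, and a percentage correct of 53.3, matching the Block 0 tables.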
![Slide 31](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/31.jpg)
Block 1: Method = Enter

Omnibus Tests of Model Coefficients

| Step 1 | Chi-square | df | Sig. |
|---|---|---|---|
| Step | 54.977 | 2 | .000 |
| Block | 54.977 | 2 | .000 |
| Model | 54.977 | 2 | .000 |

The model chi-square is the test statistic:

$$\chi^2 = 2\,[LL(\text{New}) - LL(\text{baseline})]$$

The Block row is useful to check the significance of individual coefficients (see Field).

Model Summary

| Step | -2 Log likelihood | Cox & Snell R Square | Nagelkerke R Square |
|---|---|---|---|
| 1 | 48.662^a | .520 | .694 |

a. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.

The -2 Log likelihood is -2LL of the new model; dividing it by -2 gives LL(New).
Note: Nagelkerke's R² is larger than Cox & Snell's.
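The model chi-square can be reproduced from the Block 0 counts. This minimal Python sketch computes -2LL of the constant-only model from the 35/40 split and subtracts the printed -2LL of the new model. The model has three estimated parameters against one in the baseline, so df = 2, and for 2 df the chi-square upper tail is exactly exp(-x / 2).

```python
import math

missed, scored = 35, 40
n = missed + scored

# LL of the constant-only (baseline) model: every case gets P = 40/75
ll_baseline = scored * math.log(scored / n) + missed * math.log(missed / n)
minus2ll_baseline = -2 * ll_baseline

minus2ll_new = 48.662                           # from the Model Summary table
chi_square = minus2ll_baseline - minus2ll_new   # equals 2[LL(New) - LL(baseline)]

# df = 3 estimated parameters - 1 = 2; for 2 df the upper tail is exp(-x / 2)
p_value = math.exp(-chi_square / 2)
print(f"-2LL baseline = {minus2ll_baseline:.3f}, chi-square = {chi_square:.3f}, p = {p_value:.2e}")
```

The baseline -2LL comes out at about 103.64, and the chi-square agrees with SPSS's 54.977 up to the rounding of the printed -2LL; the p-value is far below .001.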
![Slide 32](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/32.jpg)
Block 1: Method = Enter (continued)

Variables in the Equation

| Step 1^a | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|
| previous | .065 | .022 | 8.609 | 1 | .003 | 1.067 |
| pswq | -.230 | .080 | 8.309 | 1 | .004 | .794 |
| Constant | 1.280 | 1.670 | .588 | 1 | .443 | 3.598 |

a. Variable(s) entered on step 1: previous, pswq.

B: estimates; S.E.: standard error of the estimates; Sig.: significance based on the Wald statistic; Exp(B): change in odds.

Classification Table^a

| Observed: Result of Penalty Kick | Predicted: Missed Penalty | Predicted: Scored Penalty | Percentage Correct |
|---|---|---|---|
| Missed Penalty | 30 | 5 | 85.7 |
| Scored Penalty | 7 | 33 | 82.5 |
| Overall Percentage | | | 84.0 |

a. The cut value is .500

Predictive accuracy has improved (was 53%).
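Exp(B) is the factor by which the odds of scoring change when a predictor increases by one unit, holding the other predictor constant. A minimal Python sketch using the B column above:

```python
import math

# B column of the Step 1 "Variables in the Equation" table (predictors only)
estimates = {"previous": 0.065, "pswq": -0.230}

for name, b in estimates.items():
    odds_ratio = math.exp(b)               # Exp(B)
    pct_change = 100 * (odds_ratio - 1)    # % change in the odds per one-unit increase
    print(f"{name:<9} Exp(B) = {odds_ratio:.3f}  change in odds = {pct_change:+.1f}%")
```

Each additional percentage point of previous penalty success multiplies the odds of scoring by about 1.07 (a rise of roughly 7%), while each additional point on the worry scale multiplies them by about 0.79 (a drop of about 20%). With the rounded B the pswq ratio prints as 0.795; SPSS shows .794, presumably because it uses the unrounded estimate. The constant is left out because its Exp(B) is the odds when both predictors are 0, not a change in odds.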
![Slide 33](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/33.jpg)
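How is the classification table constructed?

$$P(Y_{\text{pred}}) = \frac{1}{1 + e^{-(1.28 + 0.065 \cdot \text{previous} - 0.230 \cdot \text{pswq})}}$$

In the Step 1 classification table, the off-diagonal cells are the wrong predictions: 5 missed penalties predicted as scored and 7 scored penalties predicted as missed.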
![Slide 34](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/34.jpg)
How is the classification table constructed?

$$P(Y_{\text{pred}}) = \frac{1}{1 + e^{-(1.28 + 0.065 \cdot \text{previous} - 0.230 \cdot \text{pswq})}}$$

| pswq | previous | scored | Predicted prob. |
|---|---|---|---|
| 18 | 56 | 1 | .68 |
| 17 | 35 | 1 | .41 |
| 20 | 45 | 0 | .40 |
| 10 | 42 | 0 | .85 |
![Slide 35](https://reader034.fdocuments.in/reader034/viewer/2022051515/55162ecc550346c6758b4c93/html5/thumbnails/35.jpg)
How is the classification table constructed?

| pswq | previous | scored | Predicted prob. | Predicted |
|---|---|---|---|---|
| 18 | 56 | 1 | .68 | 1 |
| 17 | 35 | 1 | .41 | 0 |
| 20 | 45 | 0 | .40 | 0 |
| 10 | 42 | 0 | .85 | 1 |
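The construction can be sketched in Python for the four players in the table: compute each predicted probability from the Step 1 equation, apply the .500 cut value, and compare with the observed outcome. Counting these comparisons over all 75 players fills the four cells of the classification table. The coefficients are the rounded values from the output, so the first player comes out at about .69 rather than the slide's .68.

```python
import math

def predicted_probability(previous, pswq):
    """P(Y_pred) with the Step 1 estimates: 1.28 + 0.065*previous - 0.230*pswq."""
    z = 1.28 + 0.065 * previous - 0.230 * pswq
    return 1 / (1 + math.exp(-z))

# (pswq, previous, observed scored) for the four players in the table
players = [(18, 56, 1), (17, 35, 1), (20, 45, 0), (10, 42, 0)]
cut_value = 0.5

predictions = []
for pswq, previous, scored in players:
    p = predicted_probability(previous, pswq)
    predicted = 1 if p > cut_value else 0    # above the cut value -> predicted score
    predictions.append(predicted)
    status = "correct" if predicted == scored else "wrong prediction"
    print(f"pswq={pswq} previous={previous} P={p:.2f} predicted={predicted} ({status})")
```

The second player (scored, but predicted to miss) and the fourth player (missed, but predicted to score) are the kind of cases that end up in the off-diagonal cells of the table.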