AMMBR III
Uploaded by thornton-cedric
DESCRIPTION OF DATA
Variable  Description                                Codes/Values                      Name
1         Identification Code                        ID Number                         ID
2         Birth Number                               1-4                               BIRTH
3         Smoking Status During Pregnancy            0 = No, 1 = Yes                   SMOKE
4         Race                                       1 = White, 2 = Black, 3 = Other   RACE
5         Age of Mother                              Years                             AGE
6         Weight of Mother at Last Menstrual Period  Pounds                            LWT
7         Birth Weight                               Grams                             BWT
8         Low Birth Weight                           1 = BWT <=2500g, 0 = BWT >2500g   LOW
SUMMARY OF THE DATA
. summ

    Variable       Obs        Mean    Std. Dev.    Min     Max
    id             488    93.56148    53.91331       1     188
    birth          488    1.872951    .8283019       1       4
    smoking        488    .3995902    .4903167       0       1
    race           488    1.852459    .9123576       1       3
    agemother      488    26.44057    5.825363      14      48
    weightmother   488      142.75    32.43726      80     272
    birthweight    488    2841.971    688.3148     798    5025
    lowweight      488    .3094262    .4627315       0       1
LOGISTICS OF LOGISTIC REGRESSION
1. Estimate the coefficients
2. Assess model fit
3. Interpret coefficients
4. Check regression assumptions
EMPTY MODEL
. logit lowweight

Iteration 0: log likelihood = -301.89672
Iteration 1: log likelihood = -301.89672

Logistic regression                 Number of obs = 488
                                    LR chi2(0)    = -0.00
                                    Prob > chi2   = .
Log likelihood = -301.89672         Pseudo R2     = -0.0000

    lowweight       Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
    _cons       -.8028031   .0979279    -8.20   0.000   -.9947383  -.6108679
$$\Pr(Y \mid X) = \frac{1}{1 + e^{-(-.80)}} = .31$$
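As a quick check on the empty model, the intercept can be pushed through the inverse logit to recover the predicted probability of a low-weight birth. The sketch below uses only the `_cons` value from the output above; the function name `inv_logit` is an illustrative choice, not something from the slides.

```python
import math

def inv_logit(xb):
    """Convert a log-odds value into a probability."""
    return 1.0 / (1.0 + math.exp(-xb))

# Intercept of the empty model, taken from the logit output.
cons = -0.8028031

p_low = inv_logit(cons)
print(round(p_low, 4))  # 0.3094
```

The result matches the mean of lowweight in the summary table (.3094262), as it should when the model contains only a constant.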
CLASSIFICATION TABLE EMPTY MODEL
. estat class

Logistic model for lowweight

                  -------- True --------
    Classified        D        ~D      Total
        +             0         0          0
        -           151       337        488
      Total         151       337        488

Classified + if predicted Pr(D) >= .5
True D defined as lowweight != 0

    Sensitivity                      Pr( +| D)     0.00%
    Specificity                      Pr( -|~D)   100.00%
    Positive predictive value        Pr( D| +)        .%
    Negative predictive value        Pr(~D| -)    69.06%

    False + rate for true ~D         Pr( +|~D)     0.00%
    False - rate for true D          Pr( -| D)   100.00%
    False + rate for classified +    Pr(~D| +)        .%
    False - rate for classified -    Pr( D| -)    30.94%

    Correctly classified                          69.06%
FULL MODEL
. logit lowweight smoking agemother weightmother

Iteration 0: log likelihood = -301.89672
Iteration 1: log likelihood = -288.88873
Iteration 2: log likelihood = -288.76222
Iteration 3: log likelihood = -288.76218
Iteration 4: log likelihood = -288.76218

Logistic regression                 Number of obs = 488
                                    LR chi2(3)    = 26.27
                                    Prob > chi2   = 0.0000
Log likelihood = -288.76218         Pseudo R2     = 0.0435

    lowweight          Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
    smoking         .8097503   .2022273     4.00   0.000     .413392   1.206109
    agemother       .0452296   .0184831     2.45   0.014    .0090033   .0814558
    weightmother   -.0086232   .0035144    -2.45   0.014   -.0155113  -.0017351
    _cons          -1.139015   .5844386    -1.95   0.051   -2.284493   .0064639
MODEL FIT: LIKELIHOOD RATIO TEST
. logit lowweight smoking agemother weightmother

Iteration 0: log likelihood = -301.89672
Iteration 1: log likelihood = -288.88873
Iteration 2: log likelihood = -288.76222
Iteration 3: log likelihood = -288.76218
Iteration 4: log likelihood = -288.76218

Logistic regression                 Number of obs = 488
                                    LR chi2(3)    = 26.27
                                    Prob > chi2   = 0.0000
Log likelihood = -288.76218         Pseudo R2     = 0.0435
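The likelihood ratio statistic in the header can be reproduced by hand from the two log likelihoods in the iteration log (iteration 0 is the empty model, the last iteration is the full model). The sketch below is a check under that reading; the helper `chi2_sf_df3` is an illustrative name and uses the closed-form upper tail of a chi-square distribution with 3 degrees of freedom. The pseudo R2 line assumes the McFadden definition, 1 - LL(full)/LL(empty), which reproduces the reported 0.0435.

```python
import math

# Log likelihoods from the iteration log.
ll_null = -301.89672   # iteration 0: constant-only model
ll_full = -288.76218   # final iteration: smoking, agemother, weightmother

# Likelihood ratio statistic: twice the gain in log likelihood.
lr_chi2 = 2.0 * (ll_full - ll_null)

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

p_value = chi2_sf_df3(lr_chi2)

# McFadden pseudo R2 (assumed definition).
pseudo_r2 = 1.0 - ll_full / ll_null

print(round(lr_chi2, 2), round(pseudo_r2, 4))  # 26.27 0.0435
print(p_value < 0.0001)                        # True
```

The three predictors jointly improve on the empty model, although the pseudo R2 shows that the improvement is small.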
CLASSIFICATION TABLE FULL MODEL
. estat class

Logistic model for lowweight

                  -------- True --------
    Classified        D        ~D      Total
        +            15        13         28
        -           136       324        460
      Total         151       337        488

Classified + if predicted Pr(D) >= .5
True D defined as lowweight != 0

    Sensitivity                      Pr( +| D)     9.93%
    Specificity                      Pr( -|~D)    96.14%
    Positive predictive value        Pr( D| +)    53.57%
    Negative predictive value        Pr(~D| -)    70.43%

    False + rate for true ~D         Pr( +|~D)     3.86%
    False - rate for true D          Pr( -| D)    90.07%
    False + rate for classified +    Pr(~D| +)    46.43%
    False - rate for classified -    Pr( D| -)    29.57%

    Correctly classified                          69.47%
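Every percentage in the classification output follows from the four cell counts of the table. The sketch below recomputes the main ones from the full-model counts; the variable names (`tp`, `fp`, `fn`, `tn`) are illustrative labels for the cells, not Stata terminology.

```python
# Cell counts from the full-model classification table.
tp = 15    # classified +, true D
fp = 13    # classified +, true ~D
fn = 136   # classified -, true D
tn = 324   # classified -, true ~D

total = tp + fp + fn + tn

sensitivity = tp / (tp + fn)   # Pr(+ | D)
specificity = tn / (tn + fp)   # Pr(- | ~D)
ppv = tp / (tp + fp)           # Pr(D | +)
npv = tn / (tn + fn)           # Pr(~D | -)
accuracy = (tp + tn) / total   # correctly classified

for name, value in [("Sensitivity", sensitivity), ("Specificity", specificity),
                    ("PPV", ppv), ("NPV", npv), ("Correctly classified", accuracy)]:
    print(f"{name}: {100 * value:.2f}%")
```

Overall accuracy rises only from 69.06% to 69.47% relative to the empty model, because at the .5 cutoff the full model still classifies just 28 of the 488 births as low weight.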
HOSMER & LEMESHOW TEST
. estat gof, group(10) table

Logistic model for lowweight, goodness-of-fit test

(Table collapsed on quantiles of estimated probabilities)

    Group     Prob   Obs_1   Exp_1   Obs_0   Exp_0   Total
        1   0.1929       8     7.8      41    41.2      49
        2   0.2190      11    10.1      38    38.9      49
        3   0.2391      16    11.2      33    37.8      49
        4   0.2597       6    12.2      43    36.8      49
        5   0.2826      13    13.0      35    35.0      48
        6   0.3161      12    14.5      37    34.5      49
        7   0.3659      17    16.6      32    32.4      49
        8   0.4160      22    19.1      27    29.9      49
        9   0.4745      25    22.0      24    27.0      49
       10   0.5951      21    24.6      27    23.4      48

    number of observations  = 488
    number of groups        = 10
    Hosmer-Lemeshow chi2(8) = 10.13
    Prob > chi2             = 0.2559
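The Hosmer-Lemeshow statistic is the sum of (observed - expected)^2 / expected over both outcome columns of the grouped table. The sketch below recomputes it from the printed table; because the expected counts there are rounded to one decimal, the result only approximates Stata's 10.13. The helper `chi2_sf_even_df` is an illustrative name and uses the closed-form upper tail that holds for even degrees of freedom.

```python
import math

# (Obs_1, Exp_1, Obs_0, Exp_0) for groups 1..10, copied from the table.
groups = [
    (8, 7.8, 41, 41.2), (11, 10.1, 38, 38.9), (16, 11.2, 33, 37.8),
    (6, 12.2, 43, 36.8), (13, 13.0, 35, 35.0), (12, 14.5, 37, 34.5),
    (17, 16.6, 32, 32.4), (22, 19.1, 27, 29.9), (25, 22.0, 24, 27.0),
    (21, 24.6, 27, 23.4),
]

# Sum the chi-square contributions of both outcome columns in every group.
hl_chi2 = sum((o1 - e1) ** 2 / e1 + (o0 - e0) ** 2 / e0
              for o1, e1, o0, e0 in groups)

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with an even number of df."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

df = len(groups) - 2           # number of groups minus 2
p_value = chi2_sf_even_df(hl_chi2, df)

print(round(hl_chi2, 1), round(p_value, 2))
```

A large p-value here means the observed and expected counts do not differ significantly, so the test gives no evidence of poor fit.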
SIGNIFICANCE AND DIRECTION
. logit lowweight smoking agemother weightmother

    lowweight          Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
    smoking         .8097503   .2022273     4.00   0.000     .413392   1.206109
    agemother       .0452296   .0184831     2.45   0.014    .0090033   .0814558
    weightmother   -.0086232   .0035144    -2.45   0.014   -.0155113  -.0017351
    _cons          -1.139015   .5844386    -1.95   0.051   -2.284493   .0064639
MAGNITUDE
. logistic lowweight smoking agemother weightmother

Logistic regression                 Number of obs = 488
                                    LR chi2(3)    = 26.27
                                    Prob > chi2   = 0.0000
Log likelihood = -288.76218         Pseudo R2     = 0.0435

    lowweight      Odds Ratio   Std. Err.       z   P>|z|   [95% Conf. Interval]
    smoking          2.247347   .4544749     4.00   0.000    1.511938    3.34046
    agemother        1.046268   .0193383     2.45   0.014    1.009044   1.084865
    weightmother     .9914139   .0034842    -2.45   0.014    .9846084   .9982664
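(Exponentiated coefficient_i - 1.0) * 100 gives the percentage change in the odds. For smoking: (2.247 - 1.0) * 100 = 125 -> a smoker has about 125% higher odds of having a low-birth-weight baby than a non-smoker, holding the mother's age and weight constant.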
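The odds ratios are simply the exponentiated logit coefficients, so the percentage interpretation can be reproduced for all three predictors. The sketch below uses the coefficients from the logit output; the dictionary names are illustrative.

```python
import math

# Coefficients from the logit output (log-odds scale).
coefs = {
    "smoking": 0.8097503,
    "agemother": 0.0452296,
    "weightmother": -0.0086232,
}

# Exponentiate to get odds ratios, then convert to percentage change in the odds.
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}
pct_change = {name: (ratio - 1.0) * 100 for name, ratio in odds_ratios.items()}

for name in coefs:
    print(f"{name}: OR = {odds_ratios[name]:.4f}, change in odds = {pct_change[name]:+.1f}%")
```

Read this way, each additional year of the mother's age raises the odds of a low-weight birth by about 4.6%, and each additional pound of the mother's weight lowers them by about 0.9%.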
EXAMINING RESIDUALS IN LR
1. Isolate points for which the model fits poorly
2. Isolate influential data points
SAMANTHA'S TIPS
In Stata, after estimating the model, the predict command can be used to calculate residuals and other diagnostics.
Type help logit postestimation for details.
STANDARDIZED RESIDUAL
. tab ZRE if ZRE > 3

    standardized
    Pearson residual      Freq.    Percent      Cum.
    3.181052                  1     100.00    100.00
    Total                     1     100.00

. list if ZRE > 3

          id   birth   smoking   race   agemot~r   weight~r   birthw~t   lowwei~t        ZRE
    355.  135      1         0      2         18        229       1858          1   3.181052
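To see why this observation stands out, its Pearson residual can be approximated from the full-model coefficients and the listed covariate values. The sketch below computes the unstandardized Pearson residual for a single observation; it is a rough check, since the standardized version reported as ZRE also divides by sqrt(1 - leverage), and the leverage is not shown in the slides.

```python
import math

# Full-model coefficients from the logit output.
b_cons, b_smoking, b_age, b_weight = -1.139015, 0.8097503, 0.0452296, -0.0086232

# Covariate values of the flagged observation (id 135).
smoking, agemother, weightmother, lowweight = 0, 18, 229, 1

# Linear predictor and predicted probability of a low-weight birth.
xb = b_cons + b_smoking * smoking + b_age * agemother + b_weight * weightmother
p_hat = 1.0 / (1.0 + math.exp(-xb))

# Pearson residual for one observation: (y - p) / sqrt(p * (1 - p)).
pearson = (lowweight - p_hat) / math.sqrt(p_hat * (1.0 - p_hat))

print(round(p_hat, 3), round(pearson, 2))  # 0.091 3.16
```

The model gives this non-smoking, relatively heavy mother only about a 9% chance of a low-weight birth, yet the baby weighed 1858 grams, which produces the large positive residual.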
MULTI-COLLINEARITY
Field recommends obtaining VIFs by estimating the same model with an OLS regression.
Checking the correlation matrix of the independent variables is often enough.
If you find high correlations (say > .6), then check the VIFs.
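For intuition on how a correlation translates into a VIF, the general formula is VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The sketch below covers only the special case of two predictors, where R_j^2 is the squared pairwise correlation; the function name and the correlation values are hypothetical illustrations, not results from this dataset.

```python
def vif_two_predictors(r):
    """VIF when a model has exactly two predictors with pairwise correlation r."""
    return 1.0 / (1.0 - r ** 2)

# Hypothetical correlations, chosen only to illustrate the relationship.
for r in (0.3, 0.6, 0.9):
    print(f"r = {r}: VIF = {vif_two_predictors(r):.2f}")
```

With more than two predictors a variable can have a high VIF even when every pairwise correlation is modest, which is why the correlation matrix serves as a first screen and the VIFs as the follow-up check.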