AMMBR III

Gerrit Rooks, 22-02-10


DESCRIPTION OF DATA

Variable  Description                                Codes/Values                        Name
1         Identification Code                        ID Number                           ID
2         Birth Number                               1-4                                 BIRTH
3         Smoking Status During Pregnancy            0 = No, 1 = Yes                     SMOKE
4         Race                                       1 = White, 2 = Black, 3 = Other     RACE
5         Age of Mother                              Years                               AGE
6         Weight of Mother at Last Menstrual Period  Pounds                              LWT
7         Birth Weight                               Grams                               BWT
8         Low Birth Weight                           1 = BWT <= 2500g, 0 = BWT > 2500g   LOW

SUMMARY OF THE DATA

. summ

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |       488    93.56148    53.91331          1        188
       birth |       488    1.872951    .8283019          1          4
     smoking |       488    .3995902    .4903167          0          1
        race |       488    1.852459    .9123576          1          3
   agemother |       488    26.44057    5.825363         14         48
weightmother |       488      142.75    32.43726         80        272
 birthweight |       488    2841.971    688.3148        798       5025
   lowweight |       488    .3094262    .4627315          0          1

LOGISTICS OF LOGISTIC REGRESSION

1. Estimate the coefficients

2. Assess model fit

3. Interpret coefficients

4. Check regression assumptions

EMPTY MODEL

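. logit lowweight

Iteration 0:   log likelihood = -301.89672
Iteration 1:   log likelihood = -301.89672

Logistic regression                               Number of obs   =        488
                                                  LR chi2(0)      =      -0.00
                                                  Prob > chi2     =          .
Log likelihood = -301.89672                       Pseudo R2       =    -0.0000

------------------------------------------------------------------------------
   lowweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |  -.8028031   .0979279    -8.20   0.000    -.9947383   -.6108679
------------------------------------------------------------------------------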

$$\Pr(Y \mid X) = \frac{1}{1 + e^{-(-.80)}} = .31$$
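The intercept of the empty model is a log odds, so the conversion to a probability can be checked by hand. A minimal sketch, using Stata's built-in invlogit() function with the coefficient and counts from the output above:

```stata
* Empty model: convert the intercept (a log odds) into a probability.
display invlogit(-.8028031)            // logistic transform of _cons
display 1/(1 + exp(-(-.8028031)))      // same calculation written out

* With no predictors this should equal the sample proportion of
* low-weight births: 151 of 488 (the mean of lowweight in the summary).
display 151/488
```

All three lines should display a value of about .309, which the slide rounds to .31.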

CLASSIFICATION TABLE EMPTY MODEL

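. estat class

Logistic model for lowweight

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |         0             0  |          0
     -     |       151           337  |        488
-----------+--------------------------+-----------
   Total   |       151           337  |        488

Classified + if predicted Pr(D) >= .5
True D defined as lowweight != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)    0.00%
Specificity                     Pr( -|~D)  100.00%
Positive predictive value       Pr( D| +)       .%
Negative predictive value       Pr(~D| -)   69.06%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)    0.00%
False - rate for true D         Pr( -| D)  100.00%
False + rate for classified +   Pr(~D| +)       .%
False - rate for classified -   Pr( D| -)   30.94%
--------------------------------------------------
Correctly classified                        69.06%
--------------------------------------------------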
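Because the empty model gives every birth the same predicted probability (about .31, below the .5 cutoff), every case is classified as negative. The headline percentages then follow from the counts alone, as this small check suggests:

```stata
* Empty model: everyone is classified "-" because predicted Pr(D) < .5.
display 100*337/488    // correctly classified: all true ~D cases (69.06)
display 100*151/488    // false - rate for classified -: the missed D cases (30.94)
```

The 69.06% is therefore the baseline that any model with predictors has to beat.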

FULL MODEL

. logit lowweight smoking agemother weightmother

Iteration 0:   log likelihood = -301.89672
Iteration 1:   log likelihood = -288.88873
Iteration 2:   log likelihood = -288.76222
Iteration 3:   log likelihood = -288.76218
Iteration 4:   log likelihood = -288.76218

Logistic regression                               Number of obs   =        488
                                                  LR chi2(3)      =      26.27
                                                  Prob > chi2     =     0.0000
Log likelihood = -288.76218                       Pseudo R2       =     0.0435

------------------------------------------------------------------------------
   lowweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     smoking |   .8097503   .2022273     4.00   0.000      .413392    1.206109
   agemother |   .0452296   .0184831     2.45   0.014     .0090033    .0814558
weightmother |  -.0086232   .0035144    -2.45   0.014    -.0155113   -.0017351
       _cons |  -1.139015   .5844386    -1.95   0.051    -2.284493    .0064639
------------------------------------------------------------------------------
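To see what the fitted coefficients imply, one can plug values into the logistic function by hand. The sketch below does this for a hypothetical mother; the profile (age 26, weight 143 pounds) is an illustrative assumption chosen near the sample means, not something taken from the slides.

```stata
* Predicted probability of a low-weight birth for a hypothetical mother
* (illustrative values: age 26, weight 143 pounds), using the
* coefficients reported in the output above.

* Smoker (smoking = 1):
display invlogit(-1.139015 + .8097503*1 + .0452296*26 - .0086232*143)

* Non-smoker (smoking = 0):
display invlogit(-1.139015 + .8097503*0 + .0452296*26 - .0086232*143)
```

The two results should come out at roughly .40 and .23.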

LOGISTICS OF LOGISTIC REGRESSION

1. Estimate the coefficients

2. Assess model fit

3. Interpret coefficients

4. Check regression assumptions

MODEL FIT: LIKELIHOOD RATIO TEST

. logit lowweight smoking agemother weightmother

Iteration 0:   log likelihood = -301.89672
Iteration 1:   log likelihood = -288.88873
Iteration 2:   log likelihood = -288.76222
Iteration 3:   log likelihood = -288.76218
Iteration 4:   log likelihood = -288.76218

Logistic regression                               Number of obs   =        488
                                                  LR chi2(3)      =      26.27
                                                  Prob > chi2     =     0.0000
Log likelihood = -288.76218                       Pseudo R2       =     0.0435

------------------------------------------------------------------------------
   lowweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     smoking |   .8097503   .2022273     4.00   0.000      .413392    1.206109
   agemother |   .0452296   .0184831     2.45   0.014     .0090033    .0814558
weightmother |  -.0086232   .0035144    -2.45   0.014    -.0155113   -.0017351
       _cons |  -1.139015   .5844386    -1.95   0.051    -2.284493    .0064639
------------------------------------------------------------------------------
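The LR chi2 in the header compares the log likelihood of the full model (last iteration) with that of the empty model (iteration 0). A minimal sketch of the arithmetic, followed by the same test run through stored estimates; the names empty and full are arbitrary labels, and the second half assumes the birth-weight data are in memory.

```stata
* Likelihood ratio statistic from the two log likelihoods shown above.
display 2*(-288.76218 - (-301.89672))     // LR chi2(3), about 26.27
display chi2tail(3, 26.27)                // p-value, reported as 0.0000
display 1 - (-288.76218)/(-301.89672)     // McFadden pseudo R2, about 0.0435

* The same test via stored estimates.
quietly logit lowweight
estimates store empty
quietly logit lowweight smoking agemother weightmother
estimates store full
lrtest empty full
```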

CLASSIFICATION TABLE FULL MODEL

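. estat class

Logistic model for lowweight

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |        15            13  |         28
     -     |       136           324  |        460
-----------+--------------------------+-----------
   Total   |       151           337  |        488

Classified + if predicted Pr(D) >= .5
True D defined as lowweight != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)    9.93%
Specificity                     Pr( -|~D)   96.14%
Positive predictive value       Pr( D| +)   53.57%
Negative predictive value       Pr(~D| -)   70.43%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)    3.86%
False - rate for true D         Pr( -| D)   90.07%
False + rate for classified +   Pr(~D| +)   46.43%
False - rate for classified -   Pr( D| -)   29.57%
--------------------------------------------------
Correctly classified                        69.47%
--------------------------------------------------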
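The rates in the classification table are simple ratios of the four cell counts. The sketch below recomputes a few of them and then shows how the cutoff could be changed; the value .3 is only an illustration (chosen near the base rate of low-weight births), and the last command assumes the full logit model is the most recent estimation.

```stata
* Rates recomputed from the 2x2 classification table above.
display 100*15/151           // sensitivity: 9.93
display 100*324/337          // specificity: 96.14
display 100*15/28            // positive predictive value: 53.57
display 100*(15 + 324)/488   // correctly classified: 69.47

* Optional: classify with a lower cutoff than the default .5.
estat classification, cutoff(.3)
```

Overall accuracy barely improves on the empty model's 69.06%, mainly because sensitivity at the .5 cutoff is so low.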

HOSMER & LEMESHOW TEST

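. estat gof, group(10) table

Logistic model for lowweight, goodness-of-fit test

  (Table collapsed on quantiles of estimated probabilities)
  +--------------------------------------------------------+
  | Group |   Prob | Obs_1 | Exp_1 | Obs_0 | Exp_0 | Total |
  |-------+--------+-------+-------+-------+-------+-------|
  |     1 | 0.1929 |     8 |   7.8 |    41 |  41.2 |    49 |
  |     2 | 0.2190 |    11 |  10.1 |    38 |  38.9 |    49 |
  |     3 | 0.2391 |    16 |  11.2 |    33 |  37.8 |    49 |
  |     4 | 0.2597 |     6 |  12.2 |    43 |  36.8 |    49 |
  |     5 | 0.2826 |    13 |  13.0 |    35 |  35.0 |    48 |
  |     6 | 0.3161 |    12 |  14.5 |    37 |  34.5 |    49 |
  |     7 | 0.3659 |    17 |  16.6 |    32 |  32.4 |    49 |
  |     8 | 0.4160 |    22 |  19.1 |    27 |  29.9 |    49 |
  |     9 | 0.4745 |    25 |  22.0 |    24 |  27.0 |    49 |
  |    10 | 0.5951 |    21 |  24.6 |    27 |  23.4 |    48 |
  +--------------------------------------------------------+

       number of observations =       488
             number of groups =        10
      Hosmer-Lemeshow chi2(8) =        10.13
                  Prob > chi2 =         0.2559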
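The Hosmer-Lemeshow statistic sums, over the groups, the squared differences between observed and expected counts divided by the expected counts, and refers the total to a chi-squared distribution with (number of groups - 2) degrees of freedom. A small sketch of the p-value and of one group's contribution, using only numbers from the table above:

```stata
* p-value for the Hosmer-Lemeshow statistic: chi2 with 10 - 2 = 8 df.
display chi2tail(8, 10.13)               // reported above as 0.2559

* Contribution of a single group to the statistic, here group 4:
* (Obs_1 - Exp_1)^2/Exp_1 + (Obs_0 - Exp_0)^2/Exp_0
display (6 - 12.2)^2/12.2 + (43 - 36.8)^2/36.8
```

A p-value well above .05, as here, gives no evidence that the model fits poorly; group 4 accounts for a large share of the total.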

LOGISTICS OF LOGISTIC REGRESSION

1. Estimate the coefficients

2. Assess model fit

3. Interpret coefficients

4. Check regression assumptions

SIGNIFICANCE AND DIRECTION

. logit lowweight smoking agemother weightmother

Iteration 0:   log likelihood = -301.89672
Iteration 1:   log likelihood = -288.88873
Iteration 2:   log likelihood = -288.76222
Iteration 3:   log likelihood = -288.76218
Iteration 4:   log likelihood = -288.76218

Logistic regression                               Number of obs   =        488
                                                  LR chi2(3)      =      26.27
                                                  Prob > chi2     =     0.0000
Log likelihood = -288.76218                       Pseudo R2       =     0.0435

------------------------------------------------------------------------------
   lowweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     smoking |   .8097503   .2022273     4.00   0.000      .413392    1.206109
   agemother |   .0452296   .0184831     2.45   0.014     .0090033    .0814558
weightmother |  -.0086232   .0035144    -2.45   0.014    -.0155113   -.0017351
       _cons |  -1.139015   .5844386    -1.95   0.051    -2.284493    .0064639
------------------------------------------------------------------------------
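The significance columns can be reproduced from the coefficient and its standard error: z is their ratio, the p-value comes from the standard normal distribution, and the confidence interval is the coefficient plus or minus about 1.96 standard errors. A sketch for smoking, using the values in the output above:

```stata
* z statistic, two-sided p-value and 95% CI for smoking.
display .8097503/.2022273                        // z, about 4.00
display 2*normal(-abs(.8097503/.2022273))        // P>|z|
display .8097503 - invnormal(.975)*.2022273      // lower bound, about .413
display .8097503 + invnormal(.975)*.2022273      // upper bound, about 1.206
```

The sign of the coefficient gives the direction: positive for smoking and agemother, negative for weightmother.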

MAGNITUDE

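. logistic lowweight smoking agemother weightmother

Logistic regression                               Number of obs   =        488
                                                  LR chi2(3)      =      26.27
                                                  Prob > chi2     =     0.0000
Log likelihood = -288.76218                       Pseudo R2       =     0.0435

------------------------------------------------------------------------------
   lowweight | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     smoking |   2.247347   .4544749     4.00   0.000     1.511938     3.34046
   agemother |   1.046268   .0193383     2.45   0.014     1.009044    1.084865
weightmother |   .9914139   .0034842    -2.45   0.014     .9846084    .9982664
------------------------------------------------------------------------------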

(Exponentiated coefficient_i - 1.0) * 100 = (2.247 - 1.0) * 100 = 125 -> a smoker has about 125% higher odds of having a low-weight baby.
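The odds ratios reported by logistic are the exponentiated logit coefficients, so the percentage change in odds can be checked directly. In the sketch below, the 10-pound difference in mother's weight is an illustrative assumption, not a value from the slides.

```stata
* Odds ratios are exponentiated logit coefficients.
display exp(.8097503)                  // smoking: about 2.247
display 100*(exp(.8097503) - 1)        // percent change in odds: about 125

* For a continuous predictor the change is per unit, so a larger
* difference multiplies the coefficient before exponentiating.
display 100*(exp(10*(-.0086232)) - 1)  // 10 pounds heavier
```

The last line should show a decrease in the odds of roughly 8%.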

EXAMINING RESIDUALS IN LR

1. Isolate points for which the model fits poorly

2. Isolate influential data points

RESIDUAL STATISTICS

SAMANTHAS TIPS

In Stata, after estimating the model, the predict command can be used to calculate residuals and other diagnostic statistics.

Type help logit postestimation for details.
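As a rough guide to what predict offers after logit, the sketch below requests the statistics used on the surrounding slides. It assumes the birth-weight data are in memory; ZRE and cook are the variable names that appear in the slides, while phat, dev and lev are hypothetical names chosen here.

```stata
* Fit the model first; predict uses the most recent estimates.
logit lowweight smoking agemother weightmother

predict phat, pr          // predicted probability Pr(lowweight)
predict ZRE, rstandard    // standardized Pearson residual
predict dev, deviance     // deviance residual
predict lev, hat          // Pregibon leverage
predict cook, dbeta       // Pregibon's dbeta influence statistic
```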

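PREDICTED PROBABILITIES

[Figure: histogram of predicted probabilities. x-axis: Pr(lowweight), .1 to .6; y-axis: Density.]

HISTOGRAM OF STANDARDIZED RESIDUALS

[Figure: histogram of standardized residuals. x-axis: standardized Pearson residual, -1 to 3; y-axis: Density.]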
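Histograms like the two described here could be produced along the following lines. This is a sketch that assumes the full logit model is the most recent estimation command; phat is a hypothetical variable name, and the axis titles simply mirror the labels on the slides.

```stata
* capture lets each predict line be skipped quietly if the variable
* already exists from an earlier step.
capture predict phat, pr
capture predict ZRE, rstandard

histogram phat, xtitle("Pr(lowweight)")
histogram ZRE, xtitle("standardized Pearson residual")
```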

STANDARDIZED RESIDUAL

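. tab ZRE if ZRE > 3

standardize |
  d Pearson |
   residual |      Freq.     Percent        Cum.
------------+-----------------------------------
   3.181052 |          1      100.00      100.00
------------+-----------------------------------
      Total |          1      100.00

. list if ZRE > 3

      +-----------------------------------------------------------------------------------+
      |  id   birth   smoking   race   agemot~r   weight~r   birthw~t   lowwei~t      ZRE |
      |-----------------------------------------------------------------------------------|
 355. | 135       1         0      2         18        229       1858          1 3.181052 |
      +-----------------------------------------------------------------------------------+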
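The condition ZRE > 3 only looks at large positive residuals, and in Stata a missing value also counts as larger than any number. A slightly more defensive variant, offered as a sketch and assuming ZRE was created with predict ZRE, rstandard:

```stata
* abs() also catches large negative residuals; !missing() keeps
* observations with a missing residual out of the listing.
list id smoking agemother weightmother birthweight lowweight ZRE ///
    if abs(ZRE) > 3 & !missing(ZRE)

count if abs(ZRE) > 3 & !missing(ZRE)
```

The cutoff of 3 follows the slides; it is a rule of thumb rather than a formal test.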

INDEX PLOT ST. RESIDUALS

[Figure: index plot of standardized residuals. x-axis: id, 0 to 200; y-axis: standardized Pearson residual, -1 to 3.]

COOKS DISTANCE

. predict cook, dbeta

INDEX PLOT COOKS DISTANCE

[Figure: index plot of the influence statistic. x-axis: id, 0 to 200; y-axis: Pregibon's dbeta, 0 to .15.]
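The two index plots can be approximated with simple scatter plots against id. The sketch assumes ZRE and cook were already created with predict (rstandard and dbeta respectively); listing the five largest values is an arbitrary choice made here for illustration.

```stata
scatter ZRE id, yline(0)     // index plot of standardized residuals
scatter cook id              // index plot of Pregibon's dbeta

* Inspect the most influential observations (changes the sort order
* of the data in memory).
gsort -cook
list id ZRE cook in 1/5
```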

MULTI-COLLINEARITY

Field recommends obtaining VIFs by estimating the same model with an OLS regression.

Checking the correlation matrix of the independent variables is often enough.

If you find high correlations (say >.6), then check VIFs
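The two-step check described above might look as follows in Stata, assuming the birth-weight data are in memory. The OLS regression is fitted only to obtain the VIFs; its coefficients are not meant to be interpreted.

```stata
* Step 1: pairwise correlations among the predictors.
correlate smoking agemother weightmother

* Step 2: if any correlation is high (say > .6), fit the same model
* by OLS and request variance inflation factors.
regress lowweight smoking agemother weightmother
estat vif
```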

FINALLY 2 CAUSES FOR TROUBLE

Incomplete information

Complete separation

INCOMPLETE INFORMATION
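Assuming "incomplete information" is meant in the usual sense of empty or nearly empty cells among combinations of categorical variables, a quick way to look for it is to cross-tabulate the categorical variables with each other and with the outcome. A sketch using the variable names from the summary table (which variables to cross is a choice made here):

```stata
* Empty or very small cells suggest the model has little or no
* information for that combination of categories.
tabulate race smoking
tabulate race lowweight
tabulate smoking lowweight
```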

COMPLETE SEPARATION

COMPLETE SEPARATION
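Complete separation can be seen in a toy example where a predictor splits the outcome perfectly. The data below are invented purely for illustration; note that clear drops whatever data are in memory, so this is best run in a fresh session.

```stata
* Toy data: y is 0 whenever x <= 3 and 1 whenever x >= 4,
* so x separates the outcome completely.
clear
input y x
0 1
0 2
0 3
1 4
1 5
1 6
end

* capture noisily shows Stata's message but lets a do-file continue
* even if the command stops with an error.
capture noisily logit y x
```

Stata is expected to report that the outcome is predicted perfectly instead of returning usable coefficients, because no finite maximum likelihood estimate exists in this situation.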

PRACTICAL

Open ammbr.dta

Analyse entrepreneurship