
BINARY CHOICE MODELS: LOGIT ANALYSIS

The linear probability model may make the nonsense predictions that an event will occur with probability greater than 1 or less than 0.

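[Figure: the linear probability model. Y and p are plotted against X, with the vertical axis marked at 0 and 1. Labels: the line β1 + β2Xi, points A and B, and the distance 1 – β1 – β2Xi.]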

[Figure: F(Z) plotted against Z. Horizontal axis: Z from –8 to 6; vertical axis: 0.00 to 1.00.]

\[ p = F(Z) = \frac{1}{1 + e^{-Z}}, \qquad Z = \beta_1 + \beta_2 X \]

The usual way of avoiding this problem is to hypothesize that the probability is a sigmoid (S-shaped) function of Z, F(Z), where Z is a function of the explanatory variables.


[Figure: the logistic function p = F(Z) plotted against Z.]

Several mathematical functions are sigmoid in character. One is the logistic function shown here. As Z goes to infinity, \(e^{-Z}\) goes to 0 and p goes to 1 (but cannot exceed 1). As Z goes to minus infinity, \(e^{-Z}\) goes to infinity and p goes to 0 (but cannot be below 0).



The model implies that, for values of Z less than –2, the probability of the event occurring is low and insensitive to variations in Z. Likewise, for values greater than 2, the probability is high and insensitive to variations in Z.
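As a minimal sketch of the behaviour just described, the following Python snippet evaluates the logistic function at a few values of Z. The function name `logistic` and the grid of Z values are illustrative assumptions, not part of the original slides.

```python
import math


def logistic(z: float) -> float:
    """Logistic function F(Z) = 1 / (1 + e^(-Z))."""
    return 1.0 / (1.0 + math.exp(-z))


# Evaluate p = F(Z) over the range shown on the horizontal axis of the figure.
for z in (-8, -4, -2, 0, 2, 4, 6):
    print(f"Z = {z:>2}: p = {logistic(z):.4f}")
```

The printed values stay close to 0 for Z below –2 and close to 1 for Z above 2, and change quickly only in between.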



To obtain an expression for the sensitivity, we differentiate F(Z) with respect to Z. The box gives the general rule for differentiating a quotient and applies it to F(Z).


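Rule for differentiating a quotient:

\[ Y = \frac{U}{V} \quad\Rightarrow\quad \frac{dY}{dZ} = \frac{V\,\dfrac{dU}{dZ} - U\,\dfrac{dV}{dZ}}{V^2} \]

Applied to F(Z):

\[ p = F(Z) = \frac{1}{1 + e^{-Z}} \]

\[ U = 1, \quad \frac{dU}{dZ} = 0; \qquad V = 1 + e^{-Z}, \quad \frac{dV}{dZ} = -e^{-Z} \]

\[ \frac{dp}{dZ} = \frac{(1 + e^{-Z}) \times 0 - 1 \times (-e^{-Z})}{(1 + e^{-Z})^2} = \frac{e^{-Z}}{(1 + e^{-Z})^2} \]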

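\[ f(Z) = \frac{dp}{dZ} = \frac{e^{-Z}}{(1 + e^{-Z})^2} \]

[Figure: the marginal function f(Z) plotted against Z. Horizontal axis: Z from –8 to 6; vertical axis: 0 to 0.2.]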


The sensitivity, as measured by the slope, is greatest when Z is 0. The marginal function, f(Z), reaches a maximum at this point.
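The derivative can be checked numerically. The sketch below is illustrative only: the function names and the step size `h` are assumptions. It compares the analytic f(Z) with a central-difference slope of F(Z) and confirms that the slope peaks at Z = 0, where f(0) = 1/(1 + 1)^2 = 0.25.

```python
import math


def logistic(z: float) -> float:
    """F(Z) = 1 / (1 + e^(-Z))."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal(z: float) -> float:
    """f(Z) = dp/dZ = e^(-Z) / (1 + e^(-Z))^2."""
    e_neg_z = math.exp(-z)
    return e_neg_z / (1.0 + e_neg_z) ** 2


def numeric_slope(z: float, h: float = 1e-6) -> float:
    """Central-difference approximation to the slope of F at z."""
    return (logistic(z + h) - logistic(z - h)) / (2.0 * h)


for z in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(f"Z = {z:>4}: f(Z) = {marginal(z):.4f}, numeric slope = {numeric_slope(z):.4f}")
```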


For a nonlinear model of this kind, maximum likelihood estimation is much superior to the use of the least squares principle for estimating the parameters. More details concerning its application are given at the end of this sequence.



We will apply this model to the high school graduation example described in the linear probability model sequence. We will begin by assuming that ASVABC is the only relevant explanatory variable, so Z is a simple function of it.


\[ p = F(Z) = \frac{1}{1 + e^{-Z}}, \qquad Z = \beta_1 + \beta_2\,\mathit{ASVABC} \]


The Stata command is logit, followed by the outcome variable and the explanatory variable(s). Maximum likelihood estimation is an iterative process, so the first part of the output will be like that shown.

. logit GRAD ASVABC

Iteration 0:   log likelihood = -118.67769
Iteration 1:   log likelihood = -104.45292
Iteration 2:   log likelihood = -97.135677
Iteration 3:   log likelihood = -96.887294
Iteration 4:   log likelihood = -96.886017

Logit estimates                                   Number of obs   =        540
                                                  LR chi2(1)      =      43.58
                                                  Prob > chi2     =     0.0000
Log likelihood = -96.886017                       Pseudo R2       =     0.1836

------------------------------------------------------------------------------
        GRAD |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1313626    .022428     5.86   0.000     .0874045    .1753206
       _cons |  -3.240218   .9444844    -3.43   0.001    -5.091373   -1.389063
------------------------------------------------------------------------------

In this case the coefficients of the Z function are as shown.


\[ \hat{Z} = -3.240 + 0.131\,\mathit{ASVABC} \]

\[ p_i = \frac{1}{1 + e^{\,3.240 - 0.131\,\mathit{ASVABC}_i}} \]

Since there is only one explanatory variable, we can draw the probability function and marginal effect function as functions of ASVABC.


[Figure: cumulative effect (scale 0.00 to 1.00) and marginal effect (scale 0 to 0.03) plotted against ASVABC (0 to 100).]

We see that ASVABC has its greatest effect on graduating when it is below 40, that is, in the lower ability range. Any individual with a score above the average (50) is almost certain to graduate.
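The fitted probability function can be tabulated directly from the coefficients in the Stata output. This is a sketch under stated assumptions: the constant names, the helper `prob_grad` and the selection of scores are illustrative and not part of the original slides.

```python
import math

# Coefficients taken from the Stata output for `logit GRAD ASVABC`.
B1 = -3.240218   # _cons
B2 = 0.1313626   # ASVABC


def prob_grad(asvabc: float) -> float:
    """Fitted probability of graduating: p = 1 / (1 + e^(-Z)), Z = B1 + B2 * ASVABC."""
    z = B1 + B2 * asvabc
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative scores spanning the horizontal axis of the figure.
for score in (10, 20, 30, 40, 50, 60, 70):
    print(f"ASVABC = {score:>2}: p = {prob_grad(score):.3f}")
```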


The t statistic indicates that the effect of variations in ASVABC on the probability of graduating from high school is highly significant.



Strictly speaking, the t statistic is valid only for large samples, so the normal distribution is the reference distribution. For this reason the statistic is denoted z in the Stata output. This z has nothing to do with our Z function.
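The z statistics in the output can be reproduced as the coefficient divided by its standard error, with a two-sided p-value from the standard normal distribution. The helper name `z_and_p` below is an assumption for illustration; the numbers are the coefficients and standard errors reported in the Stata output for `logit GRAD ASVABC`.

```python
import math


def z_and_p(coef: float, std_err: float) -> tuple[float, float]:
    """Return the z statistic and its two-sided p-value under N(0, 1)."""
    z = coef / std_err
    # P(|N(0,1)| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


z_asvabc, p_asvabc = z_and_p(0.1313626, 0.022428)
z_cons, p_cons = z_and_p(-3.240218, 0.9444844)

print(f"ASVABC: z = {z_asvabc:.2f}, p = {p_asvabc:.3f}")
print(f"_cons:  z = {z_cons:.2f}, p = {p_cons:.3f}")
```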


The coefficients of the Z function do not have any direct intuitive interpretation.



However, we can use them to quantify the marginal effect of a change in ASVABC on the probability of graduating. We will do this theoretically for the general case where Z is a function of several explanatory variables.


\[ Z = \beta_1 + \beta_2 X_2 + \dots + \beta_k X_k \]

\[ p = F(Z) = \frac{1}{1 + e^{-Z}} \]

Since p is a function of Z, and Z is a function of the X variables, the marginal effect of Xi on p can be written as the product of the marginal effect of Z on p and the marginal effect of Xi on Z.


\[ \frac{\partial p}{\partial X_i} = \frac{dp}{dZ}\,\frac{\partial Z}{\partial X_i} \]

We have already derived an expression for dp/dZ. The marginal effect of Xi on Z is given by its coefficient.



Hence we obtain an expression for the marginal effect of Xi on p.


\[ \frac{\partial p}{\partial X_i} = \frac{dp}{dZ}\,\frac{\partial Z}{\partial X_i} = f(Z)\,\beta_i = \frac{e^{-Z}}{(1 + e^{-Z})^2}\,\beta_i \]

The marginal effect is not constant because it depends on the value of Z, which in turn depends on the values of the explanatory variables. A common procedure is to evaluate it for the sample means of the explanatory variables.



The sample mean of ASVABC in this sample is 51.36.


. sum GRAD ASVABC

    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
        GRAD |     540    .9425926   .2328351          0          1
      ASVABC |     540    51.36271   9.567646   25.45931   66.07963

When evaluated at the mean, Z is equal to 3.507.


\[ Z = \beta_1 + \beta_2 X = -3.240 + 0.131 \times 51.36 = 3.507 \]

\(e^{-Z}\) is 0.030. Hence f(Z) is 0.028.

\[ e^{-Z} = e^{-3.507} = 0.030 \]

\[ f(Z) = \frac{dp}{dZ} = \frac{e^{-Z}}{(1 + e^{-Z})^2} = \frac{0.030}{(1 + 0.030)^2} = 0.028 \]

The marginal effect, evaluated at the mean, is therefore 0.004. This implies that a one-point increase in ASVABC would increase the probability of graduating from high school by 0.4 percentage points.


\[ \frac{\partial p}{\partial X_i} = \frac{dp}{dZ}\,\frac{\partial Z}{\partial X_i} = f(Z)\,\beta_i = 0.028 \times 0.131 = 0.004 \]

In this example, the marginal effect at the mean of ASVABC is very low. The reason is that anyone with an average score is almost certain to graduate anyway. So an increase in the score has little effect.
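The arithmetic for the marginal effect at the mean can be reproduced step by step. This is a sketch, not the author's own code: the variable names are assumptions, and the unrounded coefficients and sample mean are taken from the Stata output shown in this sequence.

```python
import math

B1 = -3.240218           # _cons from the logit output
B2 = 0.1313626           # ASVABC coefficient from the logit output
MEAN_ASVABC = 51.36271   # sample mean from the `sum` output

z = B1 + B2 * MEAN_ASVABC              # Z evaluated at the mean
e_neg_z = math.exp(-z)                 # e^(-Z)
f_z = e_neg_z / (1.0 + e_neg_z) ** 2   # f(Z) = dp/dZ
p = 1.0 / (1.0 + e_neg_z)              # probability of graduating at the mean
marginal_effect = f_z * B2             # dp/dX = f(Z) * b2

print(f"Z = {z:.3f}, e^-Z = {e_neg_z:.3f}, f(Z) = {f_z:.3f}")
print(f"p = {p:.3f}, marginal effect = {marginal_effect:.3f}")
```

The probability at the mean is already close to 1, which is why the marginal effect there is so small.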


[Figure: cumulative effect (scale 0.00 to 1.00) and marginal effect (scale 0 to 0.03) plotted against ASVABC (0 to 100), with the sample mean of 51.36 marked.]

To show that the marginal effect varies, we will also calculate it for ASVABC equal to 30. A one-point increase in ASVABC then increases the probability by 2.9 percentage points.


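\[ Z = \beta_1 + \beta_2 X = -3.240 + 0.131 \times 30 = 0.701 \]

\[ e^{-Z} = e^{-0.701} = 0.496 \]

\[ f(Z) = \frac{dp}{dZ} = \frac{e^{-Z}}{(1 + e^{-Z})^2} = \frac{0.496}{(1 + 0.496)^2} = 0.222 \]

\[ \frac{\partial p}{\partial X_i} = \frac{dp}{dZ}\,\frac{\partial Z}{\partial X_i} = f(Z)\,\beta_i = 0.222 \times 0.131 = 0.029 \]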

An individual with a score of 30 has only a 67 percent probability of graduating, and an increase in the score has a relatively large impact.
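To see how the marginal effect changes with the score, the same calculation can be wrapped in a small function and evaluated at several values. The helper `effect_at` and the list of scores are illustrative assumptions; the coefficients are those reported in the Stata output.

```python
import math

B1 = -3.240218   # _cons
B2 = 0.1313626   # ASVABC coefficient


def effect_at(asvabc: float) -> tuple[float, float]:
    """Return (probability of graduating, marginal effect of ASVABC) at a given score."""
    z = B1 + B2 * asvabc
    e_neg_z = math.exp(-z)
    p = 1.0 / (1.0 + e_neg_z)
    f_z = e_neg_z / (1.0 + e_neg_z) ** 2
    return p, f_z * B2


for score in (30.0, 40.0, 51.36271, 60.0):
    p, effect = effect_at(score)
    print(f"ASVABC = {score:6.2f}: p = {p:.3f}, marginal effect = {effect:.3f}")
```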



. logit GRAD ASVABC SM SF MALE

Iteration 0:   log likelihood = -118.67769
Iteration 1:   log likelihood = -104.73493
Iteration 2:   log likelihood = -97.080528
Iteration 3:   log likelihood = -96.806623
Iteration 4:   log likelihood = -96.804845
Iteration 5:   log likelihood = -96.804844

------------------------------------------------------------------------------
        GRAD |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1329127   .0245718     5.41   0.000     .0847528    .1810726
          SM |   -.023178   .0868122    -0.27   0.789    -.1933267    .1469708
          SF |   .0122663   .0718876     0.17   0.865    -.1286307    .1531634
        MALE |   .1279654   .3989345     0.32   0.748    -.6539318    .9098627
       _cons |  -3.252373   1.065524    -3.05   0.002    -5.340761   -1.163985
------------------------------------------------------------------------------

Here is the output for a model with a somewhat better specification.


. sum GRAD ASVABC SM SF MALE

    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
        GRAD |     540    .9425926   .2328351          0          1
      ASVABC |     540    51.36271   9.567646   25.45931   66.07963
          SM |     540    11.57963   2.816456          0         20
          SF |     540    11.83704    3.53715          0         20
        MALE |     540          .5   .5004636          0          1

We will estimate the marginal effects, putting all the explanatory variables equal to their sample means.


Logit: Marginal Effects

             mean        b    product     f(Z)    f(Z)b
ASVABC      51.36    0.133      6.826    0.028    0.004
SM          11.58   –0.023     –0.269    0.028   –0.001
SF          11.84    0.012      0.146    0.028    0.000
MALE         0.50    0.128      0.064    0.028    0.004
Constant     1.00   –3.252     –3.252
Total                           3.514


The first step is to calculate Z, when the X variables are equal to their sample means.

\[ Z = \beta_1 + \beta_2 X_2 + \dots + \beta_k X_k = 3.514 \]


We then calculate f(Z).

\[ e^{-Z} = e^{-3.514} = 0.030 \]

\[ f(Z) = \frac{e^{-Z}}{(1 + e^{-Z})^2} = 0.028 \]

The estimated marginal effects are f(Z) multiplied by the respective coefficients. We see that the effect of ASVABC is about the same as before. Mother's schooling has negligible effect and father's schooling has no discernible effect at all.



Males have a 0.4 percentage point higher probability of graduating than females. These effects would all have been larger if they had been evaluated at a lower ASVABC score.
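The marginal effects table can be reproduced, up to rounding, from the multiple-variable logit output and the sample means. The dictionary layout and variable names below are assumptions made for illustration; the numbers are those reported in the two Stata outputs.

```python
import math

# Coefficients from `logit GRAD ASVABC SM SF MALE`.
coefs = {"ASVABC": 0.1329127, "SM": -0.023178, "SF": 0.0122663, "MALE": 0.1279654}
constant = -3.252373

# Sample means from `sum GRAD ASVABC SM SF MALE`.
means = {"ASVABC": 51.36271, "SM": 11.57963, "SF": 11.83704, "MALE": 0.5}

# Step 1: Z with every explanatory variable at its sample mean.
z = constant + sum(coefs[name] * means[name] for name in coefs)

# Step 2: f(Z) = e^(-Z) / (1 + e^(-Z))^2.
e_neg_z = math.exp(-z)
f_z = e_neg_z / (1.0 + e_neg_z) ** 2

# Step 3: marginal effect of each variable is f(Z) times its coefficient.
effects = {name: f_z * b for name, b in coefs.items()}

print(f"Z = {z:.3f}, f(Z) = {f_z:.3f}")
for name, effect in effects.items():
    print(f"{name:<7} {effect:7.3f}")
```

Because the script uses unrounded coefficients and means, Z may differ from the table's total in the third decimal place.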


This sequence will conclude with an outline explanation of how the model is fitted using maximum likelihood estimation.


\[ Z = \beta_1 + \beta_2\,\mathit{ASVABC} \]

\[ p = F(Z) = \frac{1}{1 + e^{-Z}} = \frac{1}{1 + e^{-\beta_1 - \beta_2\,\mathit{ASVABC}}} \]

In the case of an individual who graduated, the probability of that outcome is F(Z). We will give subscripts 1, ..., s to the individuals who graduated.


Individuals who graduated: outcome probability is

\[ \frac{1}{1 + e^{-\beta_1 - \beta_2\,\mathit{ASVABC}_i}} \]

In the case of an individual who did not graduate, the probability of that outcome is 1 – F(Z). We will give subscripts s+1, ..., n to these individuals.


Individuals who did not graduate: outcome probability is

\[ 1 - \frac{1}{1 + e^{-\beta_1 - \beta_2\,\mathit{ASVABC}_i}} \]

We choose b1 and b2 so as to maximize the joint probability of the outcomes, that is, F(Z_1) × ... × F(Z_s) × [1 – F(Z_{s+1})] × ... × [1 – F(Z_n)]. There are no closed-form formulae for b1 and b2; they have to be determined iteratively by a trial-and-error process.


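\[ \underbrace{\frac{1}{1 + e^{-b_1 - b_2\,\mathit{ASVABC}_1}} \times \cdots \times \frac{1}{1 + e^{-b_1 - b_2\,\mathit{ASVABC}_s}}}_{\text{did graduate}} \times \underbrace{\left(1 - \frac{1}{1 + e^{-b_1 - b_2\,\mathit{ASVABC}_{s+1}}}\right) \times \cdots \times \left(1 - \frac{1}{1 + e^{-b_1 - b_2\,\mathit{ASVABC}_n}}\right)}_{\text{did not graduate}} \]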
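The iterative search can be sketched in a few lines. This example rests on several assumptions that are not in the slides: it uses Newton-Raphson updates (the slides do not say which algorithm Stata uses), it maximizes the log of the joint probability rather than the product itself, and it runs on synthetic data generated from hypothetical true values `TRUE_B1` and `TRUE_B2`, not on the graduation data set.

```python
import math
import random


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b1, b2, xs, ys):
    """Log of the joint probability: sum of log F(Z) or log[1 - F(Z)]."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic(b1 + b2 * x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def fit_logit(xs, ys, max_iter=25, tol=1e-10):
    """Newton-Raphson for a two-parameter logit; returns (b1, b2, log-likelihood history)."""
    b1, b2 = 0.0, 0.0
    history = [log_likelihood(b1, b2, xs, ys)]
    for _ in range(max_iter):
        g1 = g2 = h11 = h12 = h22 = 0.0
        for x, y in zip(xs, ys):
            p = logistic(b1 + b2 * x)
            w = p * (1.0 - p)
            g1 += y - p              # gradient with respect to b1
            g2 += (y - p) * x        # gradient with respect to b2
            h11 += w                 # information matrix entries
            h12 += w * x
            h22 += w * x * x
        det = h11 * h22 - h12 * h12
        step1 = (h22 * g1 - h12 * g2) / det   # (information matrix)^-1 times gradient
        step2 = (h11 * g2 - h12 * g1) / det
        b1, b2 = b1 + step1, b2 + step2
        history.append(log_likelihood(b1, b2, xs, ys))
        if abs(step1) + abs(step2) < tol:
            break
    return b1, b2, history


# Synthetic data (hypothetical): 500 observations from a known logit model.
rng = random.Random(0)
TRUE_B1, TRUE_B2 = 0.5, 1.0
xs = [rng.uniform(-3.0, 3.0) for _ in range(500)]
ys = [1 if rng.random() < logistic(TRUE_B1 + TRUE_B2 * x) else 0 for x in xs]

b1_hat, b2_hat, history = fit_logit(xs, ys)
for i, ll in enumerate(history):
    print(f"Iteration {i}: log likelihood = {ll:.5f}")
print(f"b1 = {b1_hat:.4f}, b2 = {b2_hat:.4f}")
```

The printed log likelihood values play the same role as the iteration log in the Stata output: the search stops once further steps no longer change the estimates.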
