Discrete Choice Modeling

86
[Part 2] 1/86 Discrete Choice Modeling Binary Choice Models Discrete Choice Modeling William Greene Stern School of Business New York University 0 Introduction 1 Summary 2 Binary Choice 3 Panel Data 4 Bivariate Probit 5 Ordered Choice 6 Count Data 7 Multinomial Choice 8 Nested Logit 9 Heterogeneity 10 Latent Class 11 Mixed Logit 12 Stated Preference 13 Hybrid Choice

description

William Greene Stern School of Business New York University. Discrete Choice Modeling. 0Introduction 1Methodology 2Binary Choice 3Panel Data 4Bivariate Probit 5Ordered Choice 6Count Data 7Multinomial Choice 8Nested Logit 9Heterogeneity 10Latent Class 11Mixed Logit - PowerPoint PPT Presentation

Transcript of Discrete Choice Modeling

Page 1: Discrete Choice Modeling

[Part 2] 1/86

Discrete Choice Modeling

Binary Choice Models

Discrete Choice Modeling

William GreeneStern School of BusinessNew York University

0 Introduction1 Summary2 Binary Choice3 Panel Data4 Bivariate Probit5 Ordered Choice6 Count Data7 Multinomial Choice8 Nested Logit9 Heterogeneity10 Latent Class11 Mixed Logit12 Stated Preference13 Hybrid Choice

Page 2: Discrete Choice Modeling

[Part 2] 2/86

Discrete Choice Modeling

Binary Choice Models

A Random Utility Approach Underlying Preference Scale, U*(choices) Revelation of Preferences:

U*(choices) < 0 Choice “0”

U*(choices) > 0 Choice “1”

Page 3: Discrete Choice Modeling

[Part 2] 3/86

Discrete Choice Modeling

Binary Choice Models

Binary Outcome: Visit Doctor

Page 4: Discrete Choice Modeling

[Part 2] 4/86

Discrete Choice Modeling

Binary Choice Models

A Model for Binary Choice Yes or No decision (Buy/NotBuy, Do/NotDo)

Example, choose to visit physician or not

Model: Net utility of visit at least once

Uvisit = +1Age + 2Income + Sex +

Choose to visit if net utility is positive

Net utility = Uvisit – Unot visit

Data: X = [1,age,income,sex] y = 1 if choose visit, Uvisit > 0, 0 if not.

Random Utility

Page 5: Discrete Choice Modeling

[Part 2] 5/86

Discrete Choice Modeling

Binary Choice Models

Modeling the Binary ChoiceNet Utility = Uvisit – Unot visit

Normalize Unot visit = 0

Net Uvisit = + 1 Age + 2 Income + 3 Sex +

Chooses to visit: Uvisit > 0

+ 1 Age + 2 Income + 3 Sex + > 0

> -[ + 1 Age + 2 Income + 3 Sex ]

Choosing Between the Two Alternatives

Page 6: Discrete Choice Modeling

[Part 2] 6/86

Discrete Choice Modeling

Binary Choice Models

Probability Model for Choice Between Two Alternatives

> -[ + 1Age + 2Income + 3Sex ]

Probability is governed by , the random part of the utility function.

Page 7: Discrete Choice Modeling

[Part 2] 7/86

Discrete Choice Modeling

Binary Choice Models

Application 27,326 Observations

1 to 7 years, panel 7,293 households observed We use the 1994 year, 3,337 household

observations

Page 8: Discrete Choice Modeling

[Part 2] 8/86

Discrete Choice Modeling

Binary Choice Models

Binary Choice Data

Page 9: Discrete Choice Modeling

[Part 2] 9/86

Discrete Choice Modeling

Binary Choice Models

An Econometric Model Choose to visit iff Uvisit > 0

Uvisit = + 1 Age + 2 Income + 3 Sex +

Uvisit > 0 > -( + 1 Age + 2 Income + 3 Sex) < + 1 Age + 2 Income + 3 Sex

Probability model: For any person observed by the analyst, Prob(visit) = Prob[ < + 1 Age + 2 Income + 3 Sex]

Note the relationship between the unobserved and the outcome

Page 10: Discrete Choice Modeling

[Part 2] 10/86

Discrete Choice Modeling

Binary Choice Models

+1Age + 2 Income + 3

Sex

Page 11: Discrete Choice Modeling

[Part 2] 11/86

Discrete Choice Modeling

Binary Choice Models

Modeling Approaches Nonparametric – “relationship”

Minimal Assumptions Minimal Conclusions

Semiparametric – “index function” Stronger assumptions Robust to model misspecification (heteroscedasticity) Still weak conclusions

Parametric – “Probability function and index” Strongest assumptions – complete specification Strongest conclusions Possibly less robust. (Not necessarily)

The linear probability “model”

Page 12: Discrete Choice Modeling

[Part 2] 12/86

Discrete Choice Modeling

Binary Choice Models

Nonparametric Regressions

P(Visit)=f(Income)

P(Visit)=f(Age)

Page 13: Discrete Choice Modeling

[Part 2] 13/86

Discrete Choice Modeling

Binary Choice Models

Klein and Spady SemiparametricNo specific distribution assumed

Note necessary normalizations. Coefficients are relative to FEMALE.

Prob(yi = 1 | xi ) =G(’x) G is estimated by kernel methods

Page 14: Discrete Choice Modeling

[Part 2] 14/86

Discrete Choice Modeling

Binary Choice Models

Fully Parametric Index Function: U* = β’x + ε Observation Mechanism: y = 1[U* > 0] Distribution: ε ~ f(ε); Normal, Logistic, … Maximum Likelihood Estimation:

Max(β) logL = Σi log Prob(Yi = yi|xi)

Page 15: Discrete Choice Modeling

[Part 2] 15/86

Discrete Choice Modeling

Binary Choice Models

Parametric: Logit Model

What do these mean?

Page 16: Discrete Choice Modeling

[Part 2] 16/86

Discrete Choice Modeling

Binary Choice Models

Parametric Model Estimation How to estimate , 1, 2, 3?

The technique of maximum likelihood

Prob[y=1] = Prob[ > -( + 1 Age + 2 Income + 3 Sex)] Prob[y=0] = 1 - Prob[y=1]

Requires a model for the probability

0 1Prob[ 0 | ] Prob[ 1| ]

y yL y y

x x

Page 17: Discrete Choice Modeling

[Part 2] 17/86

Discrete Choice Modeling

Binary Choice Models

Completing the Model: F() The distribution

Normal: PROBIT, natural for behavior Logistic: LOGIT, allows “thicker tails” Gompertz: EXTREME VALUE, asymmetric Others…

Does it matter? Yes, large difference in estimates Not much, quantities of interest are more stable.

Page 18: Discrete Choice Modeling

[Part 2] 18/86

Discrete Choice Modeling

Binary Choice Models

Estimated Binary Choice Models

Log-L(0) = log likelihood for a model that has only a constant term.Ignore the t ratios for now.

Page 19: Discrete Choice Modeling

[Part 2] 19/86

Discrete Choice Modeling

Binary Choice Models

+ 1 (Age+1) + 2 (Income) + 3 Sex

Effect on Predicted Probability of an Increase in Age

(1 is positive)

Page 20: Discrete Choice Modeling

[Part 2] 20/86

Discrete Choice Modeling

Binary Choice Models

Partial Effects in Probability Models Prob[Outcome] = some F(+1Income…) “Partial effect” = F(+1Income…) / ”x” (derivative)

Partial effects are derivatives Result varies with model

Logit: F(+1Income…) /x = Prob * (1-Prob) Probit: F(+1Income…)/x = Normal density Extreme Value: F(+1Income…)/x = Prob * (-log Prob)

Scaling usually erases model differences

Page 21: Discrete Choice Modeling

[Part 2] 21/86

Discrete Choice Modeling

Binary Choice Models

Estimated Partial Effects

Page 22: Discrete Choice Modeling

[Part 2] 22/86

Discrete Choice Modeling

Binary Choice Models

Partial Effect for a Dummy Variable Prob[yi = 1|xi,di] = F(’xi+di) = conditional mean Partial effect of d Prob[yi = 1|xi, di=1] - Prob[yi = 1|xi, di=0]

Probit: ˆ ˆˆ( ) x xid

Page 23: Discrete Choice Modeling

[Part 2] 23/86

Discrete Choice Modeling

Binary Choice Models

Partial Effect – Dummy Variable

Page 24: Discrete Choice Modeling

[Part 2] 24/86

Discrete Choice Modeling

Binary Choice Models

Computing Partial Effects Compute at the data means?

Simple Inference is well defined.

Average the individual effects More appropriate? Asymptotic standard errors are complicated.

Page 25: Discrete Choice Modeling

[Part 2] 25/86

Discrete Choice Modeling

Binary Choice Models

Average Partial Effects

i i

i ii i

i i

n ni ii 1 i 1

i

Probability = P F( ' )P F( ' )Partial Effect = f ( ' ) =

1 1Average Partial Effect = f ( ' )n n

are estimates of =E[ ] under certain assumptions.

xx x d

x x

d x

d

Page 26: Discrete Choice Modeling

[Part 2] 26/86

Discrete Choice Modeling

Binary Choice Models

Average Partial Effects vs. Partial Effects at Data Means

Page 27: Discrete Choice Modeling

[Part 2] 27/86

Discrete Choice Modeling

Binary Choice Models

Practicalities of Nonlinearities

PROBIT ; Lhs=doctor ; Rhs=one,age,agesq,income,female ; Partial effects $

PROBIT ; Lhs=doctor ; Rhs=one,age,age*age,income,female $PARTIALS ; Effects : age $

The software does not know that agesq = age2.

The software knows that age * age is age2.

Page 28: Discrete Choice Modeling

[Part 2] 28/86

Discrete Choice Modeling

Binary Choice Models

Partial Effect for Nonlinear Terms2

1 2 3 4

21 2 3 4 1 2

2

Prob [ Age Age Income Female]Prob [ Age Age Income Female] ( 2 Age)Age

(1.30811 .06487 .0091 .17362 .39666 )

[( .06487 2(.0091) ] Age Age Income FemaleAge

Must be computed at specific values of Age, Income and Female

Page 29: Discrete Choice Modeling

[Part 2] 29/86

Discrete Choice Modeling

Binary Choice Models

Average Partial Effect: Averaged over Sample Incomes and Genders for Specific Values of Age

Page 30: Discrete Choice Modeling

[Part 2] 30/86

Discrete Choice Modeling

Binary Choice Models

The Linear Probability “Model”

)

Prob(y = 1| ) =E[y | ] = 0 * Prob(y = 1| ) +1Prob(y = 1| ) = Prob(y = 1|y = + ε

x β xx x x x

β x

Page 31: Discrete Choice Modeling

[Part 2] 31/86

Discrete Choice Modeling

Binary Choice Models

The Dependent Variable equals zero for 98.9% of the observations. In the sample of 163,474 observations, the LHS variable equals 1 about 1,500 times.

Page 32: Discrete Choice Modeling

[Part 2] 32/86

Discrete Choice Modeling

Binary Choice Models

2SLS for a binary dependent variable.

Page 33: Discrete Choice Modeling

[Part 2] 33/86

Discrete Choice Modeling

Binary Choice Models

)

ˆ ˆ ˆ if y = 1, or 0 if y = 0The standard errors make no sense because the stochastic propertiesof

Prob(y = 1| ) =E[y | ] = 0 * Prob(y = 1| ) +1Prob(y = 1| ) = Prob(y = 1|y = + ε

Residuals : e = y -

x β xx x x x

β x

β x = 1- β x - β x

the "disturbance" are inconsistent with the observed variable.

Page 34: Discrete Choice Modeling

[Part 2] 34/86

Discrete Choice Modeling

Binary Choice Models

)

ˆ ˆ ˆ if y = 1, or 0 if y = 0The standard errors make no sense because the stochastic propertiesof

Prob(y = 1| ) =E[y | ] = 0 * Prob(y = 1| ) +1Prob(y = 1| ) = Prob(y = 1|y = + ε

Residuals : e = y -

x β xx x x x

β x

β x = 1- β x - β x

the "disturbance" are inconsistent with the observed variable.

The variance of y|x equals ) ) (1 )The "disturbances" are heteroscedastic. Users of the LPM always seem to worr

Prob(y = 0 | Prob(y = 1|x x β x β x

y about clustering. They never seem to worry about heteroscedasticity.

Page 35: Discrete Choice Modeling

[Part 2] 35/86

Discrete Choice Modeling

Binary Choice Models

1.7% of the observations are > 20DV = 1(DocVis > 20)

Page 36: Discrete Choice Modeling

[Part 2] 36/86

Discrete Choice Modeling

Binary Choice Models

What does OLS Estimate?

MLE

Average Partial Effects

OLS Coefficients

Page 37: Discrete Choice Modeling

[Part 2] 37/86

Discrete Choice Modeling

Binary Choice Models

Page 38: Discrete Choice Modeling

[Part 2] 38/86

Discrete Choice Modeling

Binary Choice Models

Negative Predicted Probabilities

Page 39: Discrete Choice Modeling

[Part 2] 39/86

Discrete Choice Modeling

Binary Choice Models

Measuring Fit

Page 40: Discrete Choice Modeling

[Part 2] 40/86

Discrete Choice Modeling

Binary Choice Models

How Well Does the Model Fit? There is no R squared.

Least squares for linear models is computed to maximize R2

There are no residuals or sums of squares in a binary choice model The model is not computed to optimize the fit of the model to the

data How can we measure the “fit” of the model to the

data? “Fit measures” computed from the log likelihood

“Pseudo R squared” = 1 – logL/logL0 Also called the “likelihood ratio index” Others… - these do not measure fit.

Direct assessment of the effectiveness of the model at predicting the outcome

Page 41: Discrete Choice Modeling

[Part 2] 41/86

Discrete Choice Modeling

Binary Choice Models

Log Likelihoods logL = ∑i log density (yi|xi,β) For probabilities

Density is a probability Log density is < 0 LogL is < 0

For other models, log density can be positive or negative. For linear regression,

logL=-N/2(1+log2π+log(e’e/N)] Positive if s2 < .058497

Page 42: Discrete Choice Modeling

[Part 2] 42/86

Discrete Choice Modeling

Binary Choice Models

Likelihood Ratio Index 1

log (1 ) log[1 ( )] log ( )

1. Suppose the model predicted ( ) 1 whenever y=1 and ( ) 0 whenever y=0. Then, logL = 0. [ ( ) cannot equal 0 or 1 at any finite .]2. S

x x

xx

x

Ni i i ii

i

i

i

L y F y F

FF

F

0

0 0 01

0 0 1 0

0

uppose the model always predicted the same value, F( )

LogL = (1 ) log[1 F( )] log F( )

= log[1 F( )] log F( ) < 0

log LRI = 1 - . Since logL >log

N

i iiy y

N N

LL 0 logL 0 LRI < 1.

Page 43: Discrete Choice Modeling

[Part 2] 43/86

Discrete Choice Modeling

Binary Choice Models

The Likelihood Ratio Index Bounded by 0 and 1-ε Rises when the model is expanded Values between 0 and 1 have no meaning Can be strikingly low. Should not be used to compare models

Use logL Use information criteria to compare nonnested models

Page 44: Discrete Choice Modeling

[Part 2] 44/86

Discrete Choice Modeling

Binary Choice Models

Fit Measures Based on LogL----------------------------------------------------------------------Binary Logit Model for Binary ChoiceDependent variable DOCTORLog likelihood function -2085.92452 Full model LogLRestricted log likelihood -2169.26982 Constant term only LogL0Chi squared [ 5 d.f.] 166.69058Significance level .00000McFadden Pseudo R-squared .0384209 1 – LogL/logL0Estimation based on N = 3377, K = 6Information Criteria: Normalization=1/N Normalized UnnormalizedAIC 1.23892 4183.84905 -2LogL + 2KFin.Smpl.AIC 1.23893 4183.87398 -2LogL + 2K + 2K(K+1)/(N-K-1)Bayes IC 1.24981 4220.59751 -2LogL + KlnNHannan Quinn 1.24282 4196.98802 -2LogL + 2Kln(lnN)--------+-------------------------------------------------------------Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X--------+------------------------------------------------------------- |Characteristics in numerator of Prob[Y = 1]Constant| 1.86428*** .67793 2.750 .0060 AGE| -.10209*** .03056 -3.341 .0008 42.6266 AGESQ| .00154*** .00034 4.556 .0000 1951.22 INCOME| .51206 .74600 .686 .4925 .44476 AGE_INC| -.01843 .01691 -1.090 .2756 19.0288 FEMALE| .65366*** .07588 8.615 .0000 .46343--------+-------------------------------------------------------------

Page 45: Discrete Choice Modeling

[Part 2] 45/86

Discrete Choice Modeling

Binary Choice Models

Fit Measures Based on Predictions Computation

Use the model to compute predicted probabilities

Use the model and a rule to compute predicted y = 0 or 1

Fit measure compares predictions to actuals

Page 46: Discrete Choice Modeling

[Part 2] 46/86

Discrete Choice Modeling

Binary Choice Models

Predicting the Outcome Predicted probabilities P = F(a + b1Age + b2Income + b3Female+…) Predicting outcomes

Predict y=1 if P is “large” Use 0.5 for “large” (more likely than not) Generally, use

Count successes and failures

ˆy 1 if P > P*

Page 47: Discrete Choice Modeling

[Part 2] 47/86

Discrete Choice Modeling

Binary Choice Models

Cramer Fit Measure

1 1

1 0

F = Predicted Probabilityˆ ˆF (1 )Fˆ

N Nˆ ˆ ˆMean F | when = 1 - Mean F | when = 0

=

N Ni i i iy y

y y

reward for correct predictions minus penalty for incorrect predictions

+----------------------------------------+| Fit Measures Based on Model Predictions|| Efron = .04825|| Ben Akiva and Lerman = .57139|| Veall and Zimmerman = .08365|| Cramer = .04771|+----------------------------------------+

Page 48: Discrete Choice Modeling

[Part 2] 48/86

Discrete Choice Modeling

Binary Choice Models

Hypothesis Testing in Binary Choice Models

Page 49: Discrete Choice Modeling

[Part 2] 49/86

Discrete Choice Modeling

Binary Choice Models

Covariance Matrix for the MLE

1

Log Likelihood

log {(1 )log[1 ( )] log ( )}

We focus on the standard choices of ( ), probit and logit.Both distributions are symmetric; F(t)=1-F(-t). Therefore, the terms in the su

Ni i i ii

i

L y F y F

F

x x

x

22

2

ms arelog log [ ( )]where q 2 1

log [ ( )]( ) = q

[ ( )]

log( )( ) =

These simplify considerably. Note q 1. For the l

i i i i i

i i i ii i i i i

i i

ii i i i i

i

L F q yL F q F

qF q F

L F F q qF F

xx x x gx

x x H

ogit model, F= , F = (1- ) and F = (1- )(1-2 ).For the probit model, F= , F = and F = -[ ( )]i iq

x

Page 50: Discrete Choice Modeling

[Part 2] 50/86

Discrete Choice Modeling

Binary Choice Models

Simplificationsi i

2 2

i i

i i

Logit: g = y - H = (1- ) E[H ] = = (1- )

( )Probit: g = H = , E[H ] = =

(1 )

Estimators: Based on H , E[H ]

i i i i i i i i

i i i i i i ii i

i i i i i

q q

x

2

1

1

1

1

and g all functions evaluated at ( )

ˆActual Hessian: Est.Asy.Var[ ] =

ˆExpected Hessian: Est.Asy.Var[ ] =

ˆBHHH: Est.Asy.Var[ ] =

i i i

Ni i ii

Ni i ii

q

H

x

x x

x x

12

1gNi i ii

x x

Page 51: Discrete Choice Modeling

[Part 2] 51/86

Discrete Choice Modeling

Binary Choice Models

Robust Covariance Matrix

11 22

1

"Robust" Covariance Matrix: = = negative inverse of second derivatives matrix

log Problog = estimated E - ˆ ˆ

= matrix sum of outer products of

N ii

L

V A B AA

B

1

1

1

1

21

first derivatives

log Prob log Problog log = estimated E ˆ ˆ

ˆ ˆFor a logit model, = (1 )

ˆ = ( )

N i ii

Ni i i ii

Ni i i ii

L L

P P

y P

A x x

B x x

21

(Resembles the White estimator in the linear model case.)

Ni i iie

x x

Page 52: Discrete Choice Modeling

[Part 2] 52/86

Discrete Choice Modeling

Binary Choice Models

Robust Covariance Matrix for Logit Model--------+-------------------------------------------------------------Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X--------+------------------------------------------------------------- |Robust Standard ErrorsConstant| 1.86428*** .68442 2.724 .0065 AGE| -.10209*** .03115 -3.278 .0010 42.6266 AGESQ| .00154*** .00035 4.446 .0000 1951.22 INCOME| .51206 .75103 .682 .4954 .44476 AGE_INC| -.01843 .01703 -1.082 .2792 19.0288 FEMALE| .65366*** .07585 8.618 .0000 .46343--------+------------------------------------------------------------- |Conventional Standard Errors Based on Second DerivativesConstant| 1.86428*** .67793 2.750 .0060 AGE| -.10209*** .03056 -3.341 .0008 42.6266 AGESQ| .00154*** .00034 4.556 .0000 1951.22 INCOME| .51206 .74600 .686 .4925 .44476 AGE_INC| -.01843 .01691 -1.090 .2756 19.0288 FEMALE| .65366*** .07588 8.615 .0000 .46343

Page 53: Discrete Choice Modeling

[Part 2] 53/86

Discrete Choice Modeling

Binary Choice Models

Base Model for Hypothesis Tests----------------------------------------------------------------------Binary Logit Model for Binary ChoiceDependent variable DOCTORLog likelihood function -2085.92452Restricted log likelihood -2169.26982Chi squared [ 5 d.f.] 166.69058Significance level .00000McFadden Pseudo R-squared .0384209Estimation based on N = 3377, K = 6Information Criteria: Normalization=1/N Normalized UnnormalizedAIC 1.23892 4183.84905--------+-------------------------------------------------------------Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X--------+------------------------------------------------------------- |Characteristics in numerator of Prob[Y = 1]Constant| 1.86428*** .67793 2.750 .0060 AGE| -.10209*** .03056 -3.341 .0008 42.6266 AGESQ| .00154*** .00034 4.556 .0000 1951.22 INCOME| .51206 .74600 .686 .4925 .44476 AGE_INC| -.01843 .01691 -1.090 .2756 19.0288 FEMALE| .65366*** .07588 8.615 .0000 .46343--------+-------------------------------------------------------------

H0: Age is not a significant determinant of Prob(Doctor = 1)H0: β2 = β3 = β5 = 0

Page 54: Discrete Choice Modeling

[Part 2] 54/86

Discrete Choice Modeling

Binary Choice Models

Likelihood Ratio Tests

Null hypothesis restricts the parameter vector Alternative relaxes the restriction Test statistic: Chi-squared =

2 (LogL|Unrestricted model – LogL|Restrictions) > 0 Degrees of freedom = number of restrictions

Page 55: Discrete Choice Modeling

[Part 2] 55/86

Discrete Choice Modeling

Binary Choice Models

LR Test of H0

RESTRICTED MODELBinary Logit Model for Binary ChoiceDependent variable DOCTORLog likelihood function -2124.06568Restricted log likelihood -2169.26982Chi squared [ 2 d.f.] 90.40827Significance level .00000McFadden Pseudo R-squared .0208384Estimation based on N = 3377, K = 3Information Criteria: Normalization=1/N Normalized UnnormalizedAIC 1.25974 4254.13136

UNRESTRICTED MODELBinary Logit Model for Binary ChoiceDependent variable DOCTORLog likelihood function -2085.92452Restricted log likelihood -2169.26982Chi squared [ 5 d.f.] 166.69058Significance level .00000McFadden Pseudo R-squared .0384209Estimation based on N = 3377, K = 6Information Criteria: Normalization=1/N Normalized UnnormalizedAIC 1.23892 4183.84905

Chi squared[3] = 2[-2085.92452 - (-2124.06568)] = 77.46456

Page 56: Discrete Choice Modeling

[Part 2] 56/86

Discrete Choice Modeling

Binary Choice Models

Wald Test Unrestricted parameter vector is estimated Discrepancy: q= Rb – m (or r(b,m) if nonlinear)

is computed Variance of discrepancy is estimated:

Var[q] = R V R’ Wald Statistic is q’[Var(q)]-1q = q’[RVR’]-1q

Page 57: Discrete Choice Modeling

[Part 2] 57/86

Discrete Choice Modeling

Binary Choice Models

Chi squared[3] = 69.0541

Wald Test

Page 58: Discrete Choice Modeling

[Part 2] 58/86

Discrete Choice Modeling

Binary Choice Models

Lagrange Multiplier Test Restricted model is estimated Derivatives of unrestricted model and

variances of derivatives are computed at restricted estimates

Wald test of whether derivatives are zero tests the restrictions

Usually hard to compute – difficult to program the derivatives and their variances.

Page 59: Discrete Choice Modeling

[Part 2] 59/86

Discrete Choice Modeling

Binary Choice Models

LM Test for a Logit Model Compute b0 (subject to restictions)

(e.g., with zeros in appropriate positions.

Compute Pi(b0) for each observation.

Compute ei(b0) = [yi – Pi(b0)]

Compute gi(b0) = xiei using full xi vector

LM = [Σigi(b0)]’[Σigi(b0)gi(b0)]-1[Σigi(b0)]

Page 60: Discrete Choice Modeling

[Part 2] 60/86

Discrete Choice Modeling

Binary Choice Models

Page 61: Discrete Choice Modeling

[Part 2] 61/86

Discrete Choice Modeling

Binary Choice Models

Page 62: Discrete Choice Modeling

[Part 2] 62/86

Discrete Choice Modeling

Binary Choice Models

Page 63: Discrete Choice Modeling

[Part 2] 63/86

Discrete Choice Modeling

Binary Choice Models

Page 64: Discrete Choice Modeling

[Part 2] 64/86

Discrete Choice Modeling

Binary Choice Models

Inference About Partial Effects

Page 65: Discrete Choice Modeling

[Part 2] 65/86

Discrete Choice Modeling

Binary Choice Models

Marginal Effects for Binary Choice

ˆ [ | ]

[ | ]ˆ ˆ ˆ

y

y

PROBIT x x

x xx

ˆ ˆ ˆ: [ | ] exp / 1 exp

[ | ]ˆ ˆ ˆ ˆ1

y

y

LOGIT x x x x

x x xx

1

1 1

ˆ [ | ] P exp exp

[ | ]ˆ ˆP logP

y

y

EXTREME VALUE x x

xx

Page 66: Discrete Choice Modeling

[Part 2] 66/86

Discrete Choice Modeling

Binary Choice Models

The Delta Method

1 1

ˆ ˆ ˆ

ˆ ,ˆ ˆ ˆ ˆˆ, , Est.Asy.Varˆ

ˆ ˆ,

ˆ ˆ ˆ ˆ1 1 2

P log P 1 l, og

ff

Probit G x I x x

Logit G x x I x x

ExtVlu

xx , G x , V =

x IxG

1ˆ ,

ˆ ˆ ˆˆEst.Asy.Var

P

, ,

ˆ

x

G x V G x

x

Page 67: Discrete Choice Modeling

[Part 2] 67/86

Discrete Choice Modeling

Binary Choice Models

Computing Effects Compute at the data means?

Simple Inference is well defined

Average the individual effects More appropriate? Asymptotic standard errors more complicated.

Is testing about marginal effects meaningful? f(b’x) must be > 0; b is highly significant How could f(b’x)*b equal zero?

Page 68: Discrete Choice Modeling

[Part 2] 68/86

Discrete Choice Modeling

Binary Choice Models

APE vs. Partial Effects at the Mean

1

Delta Method for Average Partial Effect1 ˆEstimator of Var PartialEffectN

iiN

G Var G

Page 69: Discrete Choice Modeling

[Part 2] 69/86

Discrete Choice Modeling

Binary Choice Models

Method of Krinsky and RobbEstimate β by Maximum Likelihood with bEstimate asymptotic covariance matrix with VDraw R observations b(r) from the normal population N[b,V]b(r) = b + C*v(r), v(r) drawn from N[0,I] C = Cholesky matrix, V = CC’Compute partial effects d(r) using b(r)Compute the sample variance of d(r),r=1,…,RUse the sample standard deviations of the R observations to estimate the sampling standard errors for the partial effects.

Page 70: Discrete Choice Modeling

[Part 2] 70/86

Discrete Choice Modeling

Binary Choice Models

Krinsky and RobbDelta Method

Page 71: Discrete Choice Modeling

[Part 2] 71/86

Discrete Choice Modeling

Binary Choice Models

Partial Effect for Nonlinear Terms2

1 2 3 4

21 2 3 4 1 2

Prob [ Age Age Income Female]Prob [ Age Age Income Female] ( 2 Age)Age

(1) Must be computed for a specific value of Age(2) Compute standard errors using delta method or Krins

1 2

ky and Robb.(3) Compute confidence intervals for different values of Age.(4) Test of hypothesis that this equals zero is identical to a test that (β + 2β Age) = 0. Is this an interesting hypothesis?

2(1.30811 .06487 .0091 .17362 .39666) )Prob[( .06487 2(.0091) ]

Age Age Income FemaleAGE Age

Page 72: Discrete Choice Modeling

[Part 2] 72/86

Discrete Choice Modeling

Binary Choice Models

Average Partial Effect: Averaged over Sample Incomes and Genders for Specific Values of Age

Page 73: Discrete Choice Modeling

[Part 2] 73/86

Discrete Choice Modeling

Binary Choice Models

Endogeneity

Page 74: Discrete Choice Modeling

[Part 2] 74/86

Discrete Choice Modeling

Binary Choice Models

Endogenous RHS Variable U* = β’x + θh + ε

y = 1[U* > 0] E[ε|h] ≠ 0 (h is endogenous)

Case 1: h is continuous Case 2: h is binary = a treatment effect

Approaches Parametric: Maximum Likelihood Semiparametric (not developed here):

GMM Various approaches for case 2

Page 75: Discrete Choice Modeling

[Part 2] 75/86

Discrete Choice Modeling

Binary Choice Models

Endogenous Continuous Variable U* = β’x + θh + ε

y = 1[U* > 0] h = α’z + u

E[ε|h] ≠ 0 Cov[u, ε] ≠ 0 Additional Assumptions: (u,ε) ~ N[(0,0),(σu

2, ρσu, 1)] z = a valid set of exogenous

variables, uncorrelated with (u,ε)

Correlation = ρ.This is the source of the endogeneity

This is not IV estimation. Z may be uncorrelated with X without problems.

Page 76: Discrete Choice Modeling

[Part 2] 76/86

Discrete Choice Modeling

Binary Choice Models

Endogenous Income

0 = Not Healthy 1 = Healthy

Healthy = 0 or 1Age, Married, Kids, Gender, Income

Determinants of Income (observed and unobserved) also determine health

satisfaction.

Income responds toAge, Age2, Educ, Married, Kids, Gender

Page 77: Discrete Choice Modeling

[Part 2] 77/86

Discrete Choice Modeling

Binary Choice Models

Estimation by ML (Control Function)Probit fit of y to and will not consistently estimate ( , )because of the correlation between h and induced by thecorrelation of u and . Using the bivariate normality,

(Prob( 1| , )

h

hy h

x

xx

2

2

/ )

1

Insert = ( - )/ and include f(h| ) to form logL

-

log (2 1)1

logL=

- 1log

u

i i u

i ii i

ui

i i

u u

u

u h

hhy

h

α z z

α zx

α z

N

i=1

Page 78: Discrete Choice Modeling

[Part 2] 78/86

Discrete Choice Modeling

Binary Choice Models

Two Approaches to ML

u

(1) Maximize the full log likelihood with respect to ( , , , , ) (The built in Stata routine IVPROBIT does this. It is not an instrumental variable estimat

or; it i

Full information ML.

s a FIML estimator.) Note also, this does not imply replacing h with a prediction

ˆ from the regression then using probit with h instead of h.(2) Two step limited information ML. (Control Fun

u

2

(a) Use OLS to estimate and with and s.ˆ ˆ (b) Compute = / = ( ) /

ˆˆ ˆ ˆ (c) log log1

The second step is to fit a probit m

i i i i

i i ii i i

v u s h s

h vh v

aa z

o

x

ct

x

i n)

ˆodel for y to ( , , ) thensolve back for ( , , ) from ( , , ) and from the previouslyestimated and s. Use the delta method to compute standard errors.

h v

x

a

Page 79: Discrete Choice Modeling

[Part 2] 79/86

Discrete Choice Modeling

Binary Choice Models

FIML Estimates----------------------------------------------------------------------Probit with Endogenous RHS VariableDependent variable HEALTHYLog likelihood function -6464.60772--------+-------------------------------------------------------------Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X--------+------------------------------------------------------------- |Coefficients in Probit Equation for HEALTHYConstant| 1.21760*** .06359 19.149 .0000 AGE| -.02426*** .00081 -29.864 .0000 43.5257 MARRIED| -.02599 .02329 -1.116 .2644 .75862 HHKIDS| .06932*** .01890 3.668 .0002 .40273 FEMALE| -.14180*** .01583 -8.959 .0000 .47877 INCOME| .53778*** .14473 3.716 .0002 .35208 |Coefficients in Linear Regression for INCOMEConstant| -.36099*** .01704 -21.180 .0000 AGE| .02159*** .00083 26.062 .0000 43.5257 AGESQ| -.00025*** .944134D-05 -26.569 .0000 2022.86 EDUC| .02064*** .00039 52.729 .0000 11.3206 MARRIED| .07783*** .00259 30.080 .0000 .75862 HHKIDS| -.03564*** .00232 -15.332 .0000 .40273 FEMALE| .00413** .00203 2.033 .0420 .47877 |Standard Deviation of Regression DisturbancesSigma(w)| .16445*** .00026 644.874 .0000 |Correlation Between Probit and Regression DisturbancesRho(e,w)| -.02630 .02499 -1.052 .2926--------+-------------------------------------------------------------

Page 80: Discrete Choice Modeling

[Part 2] 80/86

Discrete Choice Modeling

Binary Choice Models

Partial Effects: Scaled Coefficients

E[ | ] ( ) where ~ N[0,1] E[y|x, , ] = [ ( )]

= E[y| , , ] [ ( )]( )

Conditional Meanx xz zz x z

Partial Effects. Assume z x (just for convenience)x z x zx

u

u

u

y h hh u v v

v v

v v

,

R

1

E[y| , ] E[y| , , ] E ( ) [ ( )] ( )

E[y| , ] 1 . ( ) [ ( )]

x z x z x zx x

The integral does not have a closed form, but it can easily be simulated :x z x z

xFor v

v u

u rr

v v v dv

Est vR

k k , . , . ariables only in x omit For variables only in z omit

Page 81: Discrete Choice Modeling

[Part 2] 81/86

Discrete Choice Modeling

Binary Choice Models

Partial Effects

The scale factor is computed using the model coefficients, means of the variables and 35,000 draws from the standard normal population.

θ = 0.53778

Page 82: Discrete Choice Modeling

[Part 2] 82/86

Discrete Choice Modeling

Binary Choice Models

Endogenous Binary Variable U* = β’x + θh + ε

y = 1[U* > 0]h* = α’z + uh = 1[h* > 0]

E[ε|h*] ≠ 0 Cov[u, ε] ≠ 0 Additional Assumptions: (u,ε) ~ N[(0,0),(σu

2, ρσu, 1)] z = a valid set of exogenous

variables, uncorrelated with (u,ε)

Correlation = ρ.This is the source of the endogeneity

This is not IV estimation. Z may be uncorrelated with X without problems.

Page 83: Discrete Choice Modeling

[Part 2] 83/86

Discrete Choice Modeling

Binary Choice Models

Endogenous Binary VariableP(Y = y,H = h) = P(Y = y|H =h) x P(H=h)This is a simple bivariate probit model.Not a simultaneous equations model - the estimatoris FIML, not any kind of least squares.

Doctor = F(age,age2,income,female,Public) Public = F(age,educ,income,married,kids,female)

Page 84: Discrete Choice Modeling

[Part 2] 84/86

Discrete Choice Modeling

Binary Choice Models

FIML Estimates----------------------------------------------------------------------FIML Estimates of Bivariate Probit ModelDependent variable DOCPUBLog likelihood function -25671.43905Estimation based on N = 27326, K = 14--------+-------------------------------------------------------------Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X--------+------------------------------------------------------------- |Index equation for DOCTORConstant| .59049*** .14473 4.080 .0000 AGE| -.05740*** .00601 -9.559 .0000 43.5257 AGESQ| .00082*** .681660D-04 12.100 .0000 2022.86 INCOME| .08883* .05094 1.744 .0812 .35208 FEMALE| .34583*** .01629 21.225 .0000 .47877 PUBLIC| .43533*** .07357 5.917 .0000 .88571 |Index equation for PUBLICConstant| 3.55054*** .07446 47.681 .0000 AGE| .00067 .00115 .581 .5612 43.5257 EDUC| -.16839*** .00416 -40.499 .0000 11.3206 INCOME| -.98656*** .05171 -19.077 .0000 .35208 MARRIED| -.00985 .02922 -.337 .7361 .75862 HHKIDS| -.08095*** .02510 -3.225 .0013 .40273 FEMALE| .12139*** .02231 5.442 .0000 .47877 |Disturbance correlationRHO(1,2)| -.17280*** .04074 -4.241 .0000--------+-------------------------------------------------------------

Page 85: Discrete Choice Modeling

[Part 2] 85/86

Discrete Choice Modeling

Binary Choice Models

Partial Effects E[ | , ] ( ) E[ | , ] [ | , ] Prob( 0 | )E[ | , 0] Prob( 1| )E[ | , 1] ( ) ( ) ( ) ( )

h

y h hy E E y h

h y h h y h

Conditional Meanx xx z x

z x z xz x z x

Partial Effects Direct Ef

E[ | , ] ( ) ( ) ( ) ( )

E[ | , ] ( ) ( ) ( ) ( )

( ) ( ) ( )

y

y

fectsx z z x z x

x

Indirect Effectsx z z x z x

zz x x

Page 86: Discrete Choice Modeling

[Part 2] 86/86

Discrete Choice Modeling

Binary Choice Models

Identification Issues Exclusions are not needed for estimation Identification is, in principle, by “functional form” Researchers usually have a variable in the

treatment equation that is not in the main probit equation “to improve identification”

A fully simultaneous model y1 = f(x1,y2), y2 = f(x2,y1) Not identified even with exclusion restrictions (Model is “incoherent”)