Discrete Choice Modeling

[Part 9] 1/79 Discrete Choice Modeling Modeling Heterogeneity Discrete Choice Modeling William Greene Stern School of Business New York University 0 Introduction 1 Summary 2 Binary Choice 3 Panel Data 4 Bivariate Probit 5 Ordered Choice 6 Count Data 7 Multinomial Choice 8 Nested Logit 9 Heterogeneity 10 Latent Class 11 Mixed Logit 12 Stated Preference 13 Hybrid Choice


Transcript of Discrete Choice Modeling

Page 1: Discrete Choice Modeling

[Part 9] 1/79

Discrete Choice Modeling

Modeling Heterogeneity

Discrete Choice Modeling

William Greene
Stern School of Business
New York University

0 Introduction
1 Summary
2 Binary Choice
3 Panel Data
4 Bivariate Probit
5 Ordered Choice
6 Count Data
7 Multinomial Choice
8 Nested Logit
9 Heterogeneity
10 Latent Class
11 Mixed Logit
12 Stated Preference
13 Hybrid Choice

Page 2: Discrete Choice Modeling

What’s Wrong with the MNL Model?

Insufficiently heterogeneous: “… economists are often more interested in aggregate effects and regard heterogeneity as a statistical nuisance parameter problem which must be addressed but not emphasized. Econometricians frequently employ methods which do not allow for the estimation of individual level parameters.” (Allenby and Rossi, Journal of Econometrics, 1999)

Page 3: Discrete Choice Modeling

Several Types of Heterogeneity

Differences across choice makers
  Observable: Usually demographics such as age, sex
  Unobservable: Usually modeled as ‘random effects’

Choice strategy: How consumers make decisions. (E.g., omitted attributes)

Preference Structure: Model frameworks such as latent class structures

Preferences: Model ‘parameters’
  Discrete variation – latent class
  Continuous variation – mixed models
  Discrete-Continuous variation

Page 4: Discrete Choice Modeling

Heterogeneity in Choice Strategy

Consumers avoid ‘complexity’
  Lexicographic preferences eliminate certain choices; the choice set may be endogenously determined
  Simplification strategies may eliminate certain attributes

Information processing strategy is a source of heterogeneity in the model.

Page 5: Discrete Choice Modeling

Accommodating Heterogeneity

Observed? Enter in the model in familiar (and unfamiliar) ways.

Unobserved? Takes the form of randomness in the model.

Page 6: Discrete Choice Modeling

Heterogeneity and the MNL Model

Limitations of the MNL Model:
  IID
  IIA
  Fundamental tastes are the same across all individuals

How to adjust the model to allow variation across individuals?
  Full random variation
  Latent grouping – allow some variation

$$P[\text{choice } j \mid i, t] = \frac{\exp(\alpha_j + \boldsymbol{\beta}'\mathbf{x}_{itj})}{\sum_{m=1}^{J(i)} \exp(\alpha_m + \boldsymbol{\beta}'\mathbf{x}_{itm})}$$
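The MNL probability on this slide is a softmax over the alternative utilities. A minimal Python sketch of that calculation follows; the function name `mnl_probabilities` and all numbers are hypothetical illustrations, not values from the slides.

```python
import math

def mnl_probabilities(alpha, beta, x):
    """Return P[choice j] for one choice situation.

    alpha: list of J alternative-specific constants
    beta:  list of K taste weights shared by all alternatives
    x:     J lists of K attribute values (x[j][k])
    """
    utilities = [a + sum(b * xjk for b, xjk in zip(beta, xj))
                 for a, xj in zip(alpha, x)]
    # Subtract the maximum before exponentiating to avoid overflow;
    # the shift cancels in the ratio, so the probabilities are unchanged.
    top = max(utilities)
    expu = [math.exp(u - top) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

# Hypothetical numbers: three alternatives, two attributes (price, time).
alpha = [0.5, 0.2, 0.0]
beta = [-0.10, -0.02]
x = [[10.0, 30.0], [8.0, 45.0], [5.0, 60.0]]
probs = mnl_probabilities(alpha, beta, x)
print([round(p, 4) for p in probs])

# IIA: the odds between alternatives 0 and 1 do not change when
# alternative 2 is removed from the choice set.
probs_two = mnl_probabilities(alpha[:2], beta, x[:2])
print(probs[0] / probs[1], probs_two[0] / probs_two[1])
```

Because every individual shares the same `beta`, the only source of variation in the probabilities is the attribute data, which is the limitation the slide points to.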

Page 7: Discrete Choice Modeling

Observable Heterogeneity in Utility Levels

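$$U_{ijt} = \alpha_j + \boldsymbol{\beta}'\mathbf{x}_{itj} + \boldsymbol{\gamma}_j'\mathbf{z}_{it} + \varepsilon_{ijt}$$

$$\text{Prob}[\text{choice } j \mid i, t] = \frac{\exp(\alpha_j + \boldsymbol{\beta}'\mathbf{x}_{itj} + \boldsymbol{\gamma}_j'\mathbf{z}_{it})}{\sum_{m=1}^{J_t(i)} \exp(\alpha_m + \boldsymbol{\beta}'\mathbf{x}_{itm} + \boldsymbol{\gamma}_m'\mathbf{z}_{it})}$$

Choice, e.g., among brands of cars
  $\mathbf{x}_{itj}$ = attributes: price, features
  $\mathbf{z}_{it}$ = observable characteristics: age, sex, income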

Page 8: Discrete Choice Modeling

Observable Heterogeneity in Preference Weights

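Hierarchical model - Interaction terms

$$U_{ijt} = \alpha_j + \boldsymbol{\beta}_i'\mathbf{x}_{itj} + \boldsymbol{\gamma}_j'\mathbf{z}_{it} + \varepsilon_{ijt}$$

$$\boldsymbol{\beta}_i = \boldsymbol{\beta} + \boldsymbol{\Phi}\mathbf{h}_i$$

Parameter heterogeneity is observable. Each parameter $\beta_{i,k} = \beta_k + \boldsymbol{\varphi}_k'\mathbf{h}_i$.

$$\text{Prob}[\text{choice } j \mid i, t] = \frac{\exp(\alpha_j + \boldsymbol{\beta}_i'\mathbf{x}_{itj} + \boldsymbol{\gamma}_j'\mathbf{z}_{it})}{\sum_{m=1}^{J_t(i)} \exp(\alpha_m + \boldsymbol{\beta}_i'\mathbf{x}_{itm} + \boldsymbol{\gamma}_m'\mathbf{z}_{it})}$$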

Page 9: Discrete Choice Modeling

Heteroscedasticity in the MNL Model
• Motivation: Scaling in utility functions
• If ignored, distorts coefficients
• Random utility basis
  $U_{ij} = \alpha_j + \boldsymbol{\beta}'\mathbf{x}_{ij} + \boldsymbol{\gamma}'\mathbf{z}_i + \sigma_j\varepsilon_{ij}$, i = 1,…,N; j = 1,…,J(i)
  $F(\varepsilon_{ij}) = \exp(-\exp(-\varepsilon_{ij}))$, now scaled
• Extensions: Relaxes IIA
  Allows heteroscedasticity across choices and across individuals

Page 10: Discrete Choice Modeling

‘Quantifiable’ Heterogeneity in Scaling

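$$U_{ijt} = \alpha_j + \boldsymbol{\beta}'\mathbf{x}_{itj} + \boldsymbol{\gamma}_j'\mathbf{z}_{it} + \varepsilon_{ijt}$$

$$\text{Var}[\varepsilon_{ijt}] = \sigma_j^2 \exp(\boldsymbol{\delta}_j'\mathbf{w}_i), \quad \sigma_1^2 = \pi^2/6$$

$\mathbf{w}_i$ = observable characteristics: age, sex, income, etc.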

Page 11: Discrete Choice Modeling

Modeling Unobserved Heterogeneity

Latent class – Discrete approximation
Mixed logit – Continuous
Many extensions and blends of LC and RP

Page 12: Discrete Choice Modeling

LATENT CLASS MODELS

Page 13: Discrete Choice Modeling

The “Finite Mixture Model”

An unknown parametric model governs an outcome y: $F(y \mid \mathbf{x}, \boldsymbol{\theta})$. This is the model.

We approximate $F(y \mid \mathbf{x}, \boldsymbol{\theta})$ with a weighted sum of specified (e.g., normal) densities:

$$F(y \mid \mathbf{x}, \boldsymbol{\theta}) \approx \sum_j \pi_j\, G(y \mid \mathbf{x}, \boldsymbol{\theta}_j)$$

This is a search for functional form. With a sufficient number of (normal) components, we can approximate any density to any desired degree of accuracy. (McLachlan and Peel (2000))

There is no “mixing” process at work.

Page 14: Discrete Choice Modeling

Density? Note significant mass below zero. Not a gamma or lognormal or any other familiar density.

Page 15: Discrete Choice Modeling

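ML Mixture of Two Normal Densities

$$\hat{F}(y) = .28547\,\frac{1}{3.79628}\,\phi\!\left(\frac{y - 7.05737}{3.79628}\right) + .71453\,\frac{1}{1.81941}\,\phi\!\left(\frac{y - 3.25966}{1.81941}\right)$$

$$\text{LogL} = \sum_{i=1}^{1000} \log \sum_{j=1}^{2} \pi_j\,\frac{1}{\sigma_j}\,\phi\!\left(\frac{y_i - \mu_j}{\sigma_j}\right)$$

Maximum Likelihood Estimates
           Class 1                  Class 2
      Estimate   Std. Error    Estimate   Std. Error
μ     7.05737    .77151        3.25966    .09824
σ     3.79628    .25395        1.81941    .10858
π      .28547    .05953         .71453    .05953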
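The fitted two-component density can be evaluated directly from the reported estimates. The sketch below is an illustration only: the helper names and the midpoint-rule check are assumptions introduced here, and the original sample of 1,000 observations is not available, so `log_likelihood` is shown without being applied to data.

```python
import math

# Estimates reported on the slide: (mixing probability, mean, standard deviation)
CLASSES = [(0.28547, 7.05737, 3.79628),
           (0.71453, 3.25966, 1.81941)]

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def mixture_density(y, classes=CLASSES):
    """F-hat(y) = sum_j pi_j * (1/sigma_j) * phi((y - mu_j) / sigma_j)."""
    return sum(p / s * normal_pdf((y - m) / s) for p, m, s in classes)

def log_likelihood(sample, classes=CLASSES):
    """LogL = sum_i log sum_j pi_j (1/sigma_j) phi((y_i - mu_j) / sigma_j)."""
    return sum(math.log(mixture_density(y, classes)) for y in sample)

def integrate(f, lo, hi, n=20000):
    """Midpoint rule, used only to check the mass of the fitted density."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

total_mass = integrate(mixture_density, -40.0, 60.0)
mass_below_zero = integrate(mixture_density, -40.0, 0.0)
print(round(total_mass, 4), round(mass_below_zero, 4))
```

The second number is the share of the fitted density that lies below zero, which is the feature the preceding slide remarks on.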

Page 16: Discrete Choice Modeling

Mixing probabilities .715 and .285

Page 17: Discrete Choice Modeling

The actual process is a mix of chi squared(5) and normal(3,2) with mixing probabilities .7 and .3.

$$f(y) = .7\,\frac{.5^{2.5}\, y^{1.5} \exp(-.5y)}{\Gamma(2.5)} + .3\,\frac{1}{2}\,\phi\!\left(\frac{y - 3}{2}\right)$$

Page 18: Discrete Choice Modeling

[Figure: Approximation and Actual Distribution]

Page 19: Discrete Choice Modeling

Latent Classes

Population contains a mixture of individuals of different types

Common form of the generating mechanism within the classes

Observed outcome y is governed by the common process $F(y \mid \mathbf{x}, \boldsymbol{\theta}_j)$

Classes are distinguished by the parameters, $\boldsymbol{\theta}_j$.

Page 20: Discrete Choice Modeling

Page 21: Discrete Choice Modeling

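The Latent Class “Model”

Parametric Model: $F(y \mid \mathbf{x}, \boldsymbol{\theta})$. E.g., $y \sim N[\mathbf{x}'\boldsymbol{\beta}, \sigma^2]$, $y \sim \text{Poisson}[\lambda = \exp(\mathbf{x}'\boldsymbol{\beta})]$, etc.

Density: $F(y \mid \mathbf{x}, \boldsymbol{\theta}) = \sum_j \pi_j F(y \mid \mathbf{x}, \boldsymbol{\theta}_j)$, $\boldsymbol{\theta} = [\boldsymbol{\theta}_1, \boldsymbol{\theta}_2, \ldots, \boldsymbol{\theta}_J, \pi_1, \pi_2, \ldots, \pi_J]$, $\sum_j \pi_j = 1$

Generating mechanism for an individual drawn at random from the mixed population is $F(y \mid \mathbf{x}, \boldsymbol{\theta})$.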

Class probabilities relate to a stable process governing the mixture of types in the population

Page 22: Discrete Choice Modeling

Page 23: Discrete Choice Modeling

Page 24: Discrete Choice Modeling

RANDOM PARAMETER MODELS

Page 25: Discrete Choice Modeling

A Recast Random Effects Model

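$$U_{it} = \boldsymbol{\beta}'\mathbf{x}_{it} + u_i + \varepsilon_{it}, \quad u_i \sim N[0, \sigma_u^2]$$

$T_i$ = observations on individual i

For each period, $y_{it} = 1[U_{it} > 0]$ (given $u_i$)

Joint probability for $T_i$ observations is

$$\text{Prob}(y_{i1}, y_{i2}, \ldots \mid u_i) = \prod_{t=1}^{T_i} F(y_{it}, \boldsymbol{\beta}'\mathbf{x}_{it} + u_i)$$

Write $u_i = \sigma_u v_i$, $v_i \sim N[0,1]$,

$$\log L \mid v_1, \ldots, v_N = \sum_{i=1}^{N} \log \prod_{t=1}^{T_i} F(y_{it}, \boldsymbol{\beta}'\mathbf{x}_{it} + \sigma_u v_i)$$

It is not possible to maximize $\log L \mid v_1, \ldots, v_N$ because of the unobserved random effects embedded in it.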

Page 26: Discrete Choice Modeling

A Computable Log Likelihood

$$E_v[\log L \mid \mathbf{v}] = \sum_{i=1}^{N} \log \int_{v_i} \prod_{t=1}^{T_i} F(y_{it}, \boldsymbol{\beta}'\mathbf{x}_{it} + \sigma_u v_i)\, f(v_i)\, dv_i$$

Maximize this function with respect to $\boldsymbol{\beta}$ and $\sigma_u$.

The unobserved heterogeneity is averaged out of $\log L \mid \mathbf{v}$.

How to compute the integral?
(1) Analytically? No, no formula exists.
(2) Approximately, using Gauss-Hermite quadrature
(3) Approximately, using Monte Carlo simulation

Page 27: Discrete Choice Modeling

Simulation

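$$\log L = \sum_{i=1}^{N} \log \int_{-\infty}^{\infty} \left[\prod_{t=1}^{T_i} F(y_{it}, \boldsymbol{\beta}'\mathbf{x}_{it} + \sigma_u v_i)\right] \frac{\exp(-v_i^2/2)}{\sqrt{2\pi}}\, dv_i = \sum_{i=1}^{N} \log \int_{-\infty}^{\infty} g(v_i)\, \frac{\exp(-v_i^2/2)}{\sqrt{2\pi}}\, dv_i$$

This equals $\sum_{i=1}^{N} \log E[g(v_i)]$.

The expected value of the function of $v_i$ can be approximated by drawing R random draws $v_{ir}$ from the population N[0,1] and averaging the R functions of $v_{ir}$. We maximize

$$\log L_S = \sum_{i=1}^{N} \log \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T_i} F(y_{it}, \boldsymbol{\beta}'\mathbf{x}_{it} + \sigma_u v_{ir})$$

(We did this in Part 4 for the random effects probit model.)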
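The simulated log likelihood for the random effects probit model can be written in a few lines. The following is a minimal sketch, not the estimator used for the results in these slides: the toy panel, the parameter values, and the function name `simulated_loglik` are all hypothetical, and the probit form F(y, a) = Φ[(2y − 1)a] is assumed.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def simulated_loglik(beta, sigma_u, data, draws):
    """log L_S = sum_i log (1/R) sum_r prod_t Phi[(2y_it - 1)(beta'x_it + sigma_u v_ir)].

    data:  list over individuals of lists of (y_it, x_it) pairs
    draws: draws[i][r] holds v_ir ~ N[0,1], held fixed across evaluations
    """
    total = 0.0
    for person, v_i in zip(data, draws):
        average = 0.0
        for v in v_i:
            prod = 1.0
            for y, x in person:
                index = sum(b * xk for b, xk in zip(beta, x)) + sigma_u * v
                prod *= PHI((2 * y - 1) * index)
            average += prod
        total += math.log(average / len(v_i))
    return total

# Hypothetical toy panel: N = 200 individuals, T = 5, constant plus one regressor.
rng = random.Random(12345)
true_beta, true_sigma = [-0.2, 0.5], 0.9
data = []
for _ in range(200):
    u = true_sigma * rng.gauss(0.0, 1.0)
    person = []
    for _ in range(5):
        x = [1.0, rng.gauss(0.0, 1.0)]
        latent = true_beta[0] * x[0] + true_beta[1] * x[1] + u + rng.gauss(0.0, 1.0)
        person.append((1 if latent > 0 else 0, x))
    data.append(person)

R = 100
draws = [[rng.gauss(0.0, 1.0) for _ in range(R)] for _ in data]
ll_true = simulated_loglik(true_beta, true_sigma, data, draws)
ll_pooled = simulated_loglik(true_beta, 0.0, data, draws)  # ignores the random effect
print(round(ll_true, 2), round(ll_pooled, 2))
```

The draws are generated once and reused at every evaluation, so the simulated function is a smooth function of the parameters and can be handed to a standard optimizer.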

Page 28: Discrete Choice Modeling

Random Effects Model: Simulation
----------------------------------------------------------------------
Random Coefficients Probit Model
Dependent variable                 DOCTOR    (Quadrature Based)
Log likelihood function      -16296.68110    (-16290.72192)
Restricted log likelihood    -17701.08500
Chi squared [ 1 d.f.]          2808.80780
Simulation based on 50 Halton draws
--------+-------------------------------------------------
Variable| Coefficient  Standard Error  b/St.Er.  P[|Z|>z]
--------+-------------------------------------------------
        |Nonrandom parameters
     AGE|    .02226***     .00081       27.365    .0000    ( .02232)
    EDUC|   -.03285***     .00391       -8.407    .0000    (-.03307)
  HHNINC|    .00673        .05105         .132    .8952    ( .00660)
        |Means for random parameters
Constant|   -.11873**      .05950       -1.995    .0460    (-.11819)
        |Scale parameters for dists. of random parameters
Constant|    .90453***     .01128       80.180    .0000
--------+-------------------------------------------------------------
Implied ρ from these estimates is .90453²/(1+.90453²) = .449998.
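The implied correlation can be checked with a line of arithmetic. This sketch assumes the usual random effects probit normalization Var[ε] = 1, so that the correlation across periods is σ_u²/(1 + σ_u²); the variable names are illustrative.

```python
sigma_u = 0.90453                        # reported scale parameter of the random constant
rho = sigma_u**2 / (1.0 + sigma_u**2)    # share of total variance due to u_i
print(round(rho, 6))
```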

Page 29: Discrete Choice Modeling

The Entire Parameter Vector is Random

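$$U_{it} = \boldsymbol{\beta}_i'\mathbf{x}_{it} + \varepsilon_{it}, \quad \boldsymbol{\beta}_i = \boldsymbol{\beta} + \mathbf{u}_i, \quad \mathbf{u}_i \sim N[\mathbf{0}, \text{diag}(\sigma_1, \ldots, \sigma_K)]$$

Joint probability for $T_i$ observations is

$$\text{Prob}(y_{i1}, y_{i2}, \ldots \mid \mathbf{u}_i) = \prod_{t=1}^{T_i} F(y_{it}, \boldsymbol{\beta}_i'\mathbf{x}_{it})$$

For convenience, write $u_{ik} = \sigma_k v_{ik}$, $v_{ik} \sim N[0,1]$,

$$\log L \mid \mathbf{v}_1, \ldots, \mathbf{v}_N = \sum_{i=1}^{N} \log \prod_{t=1}^{T_i} F(y_{it}, \boldsymbol{\beta}_i'\mathbf{x}_{it})$$

It is not possible to maximize $\log L \mid \mathbf{v}_1, \ldots, \mathbf{v}_N$ because of the unobserved random effects embedded in $\boldsymbol{\beta}_i$.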

Page 30: Discrete Choice Modeling

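Estimating the RPL Model

Estimation: $\boldsymbol{\beta}_1$, $\boldsymbol{\beta}_{2it} = \boldsymbol{\beta}_2 + \boldsymbol{\Delta}\mathbf{z}_i + \boldsymbol{\Gamma}\mathbf{v}_{i,t}$
  Uncorrelated: Γ is diagonal
  Autocorrelated: $\mathbf{v}_{i,t} = \mathbf{R}\mathbf{v}_{i,t-1} + \mathbf{u}_{i,t}$

(1) Estimate “structural parameters”
(2) Estimate individual specific utility parameters
(3) Estimate elasticities, etc.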

Page 31: Discrete Choice Modeling

Classical Estimation Platform: The Likelihood

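Marginal: $f(\boldsymbol{\beta}_i \mid \text{data}, \boldsymbol{\Omega})$

Population Mean $= E[\boldsymbol{\beta}_i \mid \text{data}, \boldsymbol{\Omega}] = \int_{\boldsymbol{\beta}_i} \boldsymbol{\beta}_i\, f(\boldsymbol{\beta}_i \mid \boldsymbol{\Omega})\, d\boldsymbol{\beta}_i = \boldsymbol{\beta}$ = a subvector of $\boldsymbol{\Omega}$

$\hat{\boldsymbol{\Omega}} = \text{Argmax } L(\boldsymbol{\beta}_i, i = 1, \ldots, N \mid \text{data}, \boldsymbol{\Omega})$

Estimator $= \hat{\boldsymbol{\beta}}$

Expected value over all possible realizations of $\boldsymbol{\beta}_i$. I.e., over all possible samples.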

Page 32: Discrete Choice Modeling

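Simulation Based Estimation

Choice probability = $P[\text{data} \mid (\boldsymbol{\beta}_1, \boldsymbol{\beta}_2, \boldsymbol{\Delta}, \boldsymbol{\Gamma}, \mathbf{R}, \mathbf{v}_{i,t})]$

Need to integrate out the unobserved random term:

$$E\{P[\text{data} \mid (\boldsymbol{\beta}_1, \boldsymbol{\beta}_2, \boldsymbol{\Delta}, \boldsymbol{\Gamma}, \mathbf{R}, \mathbf{v}_{i,t})]\} = \int_{\mathbf{v}} P[\ldots \mid \mathbf{v}_{i,t}]\, f(\mathbf{v}_{i,t})\, d\mathbf{v}_{i,t}$$

Integration is done by simulation
  Draw values of v and compute β, then probabilities
  Average many draws
  Maximize the sum of the logs of the averages
  (See Train [Cambridge, 2003] on simulation methods.)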
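The draw, compute, average sequence can be sketched for a single choice situation. This is a simplified illustration under stated assumptions: independent normal coefficients with no Δz term and no autocorrelation, hypothetical attribute data, and helper names (`logit_probs`, `simulated_probs`) invented for the example.

```python
import math
import random

def logit_probs(beta, x):
    """Conditional logit probabilities given one value of beta."""
    u = [sum(b * xk for b, xk in zip(beta, xj)) for xj in x]
    top = max(u)
    e = [math.exp(v - top) for v in u]
    total = sum(e)
    return [v / total for v in e]

def simulated_probs(mean, sd, x, R, rng):
    """Average logit probabilities over R draws of beta = mean + sd * v, v ~ N[0,1]."""
    avg = [0.0] * len(x)
    for _ in range(R):
        beta_r = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, sd)]
        for j, p in enumerate(logit_probs(beta_r, x)):
            avg[j] += p / R
    return avg

# Hypothetical choice situation: three alternatives, two attributes.
x = [[1.0, 0.5], [0.6, 1.0], [0.0, 0.0]]
mean, sd = [-1.0, 0.8], [0.5, 1.2]
rng = random.Random(7)
p_sim = simulated_probs(mean, sd, x, R=500, rng=rng)
# Contribution to the simulated log likelihood if alternative 0 was chosen.
log_contribution = math.log(p_sim[0])
print([round(p, 4) for p in p_sim], round(log_contribution, 4))
```

Summing `log_contribution` over individuals gives the function that is maximized; with panel data the product over an individual's choice situations is taken inside the average, before the log.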

Page 33: Discrete Choice Modeling

Maximum Simulated Likelihood

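$$L_i(\boldsymbol{\beta}_i \mid \text{data}_i) = \prod_{t=1}^{T_i} f(\text{data}_{it} \mid \boldsymbol{\beta}_i)$$

$$L_i(\boldsymbol{\Omega} \mid \text{data}_i) = \int_{\boldsymbol{\beta}_i} \prod_{t=1}^{T_i} f(\text{data}_{it} \mid \boldsymbol{\beta}_i)\, f(\boldsymbol{\beta}_i \mid \boldsymbol{\Omega})\, d\boldsymbol{\beta}_i$$

True log likelihood:

$$\log L = \sum_{i=1}^{N} \log \int_{\boldsymbol{\beta}_i} L_i(\boldsymbol{\beta}_i \mid \text{data}_i)\, f(\boldsymbol{\beta}_i \mid \boldsymbol{\Omega})\, d\boldsymbol{\beta}_i$$

Simulated log likelihood:

$$\log L_S = \sum_{i=1}^{N} \log \frac{1}{R} \sum_{r=1}^{R} L_i(\boldsymbol{\beta}_{ir} \mid \text{data}_i, \boldsymbol{\Omega})$$

$$\hat{\boldsymbol{\Omega}} = \text{argmax}(\log L_S)$$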

Page 34: Discrete Choice Modeling

Page 35: Discrete Choice Modeling

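When Var[$\mathbf{u}_i$] is not a diagonal matrix: how to estimate a positive definite matrix, $\boldsymbol{\Sigma}$.

Cholesky decomposition: $\mathbf{u}_i \sim N[\mathbf{0}, \boldsymbol{\Sigma}]$, $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}'$ where $\mathbf{L}$ is lower triangular, so $\mathbf{u}_i = \mathbf{L}\mathbf{v}_i$ with $\mathbf{v}_i \sim N[\mathbf{0}, \mathbf{I}]$.

Convenient refinement: $\mathbf{L}\mathbf{L}' = (\mathbf{M}\mathbf{S})(\mathbf{M}\mathbf{S})'$ where the diagonal elements of $\mathbf{M}$ equal 1, and $\mathbf{S}$ is the diagonal matrix with free positive elements (Cholesky values). Then $\mathbf{u}_i = \mathbf{M}\mathbf{S}\mathbf{v}_i$, and $\mathbf{M} = \mathbf{I}$ returns the original uncorrelated case.

We used the Cholesky decomposition in developing the Krinsky and Robb method for standard errors for partial effects, in Part 3.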
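The factorization into a unit-diagonal matrix M and a diagonal matrix S can be illustrated numerically. The sketch below uses a hypothetical 3 × 3 covariance matrix and a hand-written Cholesky routine (an assumption made to keep the example free of external libraries); it takes L to be lower triangular.

```python
import math

def cholesky(a):
    """Lower triangular L with L L' = a, for a symmetric positive definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

# Hypothetical covariance matrix of three random parameters.
sigma = [[4.0, 1.2, 0.4],
         [1.2, 1.0, 0.3],
         [0.4, 0.3, 2.25]]
L = cholesky(sigma)
n = len(L)

# Split L = M S: S holds the positive diagonal of L, M has ones on its diagonal.
S = [L[k][k] for k in range(n)]
M = [[L[i][j] / S[j] for j in range(n)] for i in range(n)]

def draw_u(v):
    """u = M S v for a vector v of independent N[0,1] draws."""
    return [sum(M[i][j] * S[j] * v[j] for j in range(n)) for i in range(n)]

# Rebuild the covariance matrix as (MS)(MS)' to confirm the factorization.
rebuilt = [[sum(M[i][k] * S[k] * S[k] * M[j][k] for k in range(n))
            for j in range(n)] for i in range(n)]
print([round(s, 4) for s in S])
print([[round(m, 4) for m in row] for row in M])
```

Estimating the free elements of M and S, rather than the covariance matrix itself, keeps the implied matrix positive definite at every iteration as long as the elements of S stay positive.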

Page 36: Discrete Choice Modeling

[Figure: the matrices S and M]

Page 37: Discrete Choice Modeling

[Figure: the matrix MSSM′]

Page 38: Discrete Choice Modeling

Modeling Parameter Heterogeneity

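Individual heterogeneity in the means of the parameters:

$$\boldsymbol{\beta}_i = \boldsymbol{\beta} + \boldsymbol{\Delta}\mathbf{z}_i + \mathbf{u}_i, \quad E[\mathbf{u}_i \mid \mathbf{X}_i, \mathbf{z}_i] = \mathbf{0}$$

Heterogeneity in the variances of the parameters:

$$\text{Var}[u_{i,k} \mid \text{data}_i] = \sigma_k \exp(\,\cdots)$$

Estimation by maximum simulated likelihood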

Page 39: Discrete Choice Modeling

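A Hierarchical Probit Model

$$U_{it} = \beta_{1i} + \beta_{2i}\,\text{Age}_{it} + \beta_{3i}\,\text{Educ}_{it} + \beta_{4i}\,\text{Income}_{it} + \varepsilon_{it}$$

$$\begin{aligned}
\beta_{1i} &= \beta_1 + \delta_{11}\,\text{Female}_i + \delta_{12}\,\text{Married}_i + u_{1i} \\
\beta_{2i} &= \beta_2 + \delta_{21}\,\text{Female}_i + \delta_{22}\,\text{Married}_i + u_{2i} \\
\beta_{3i} &= \beta_3 + \delta_{31}\,\text{Female}_i + \delta_{32}\,\text{Married}_i + u_{3i} \\
\beta_{4i} &= \beta_4 + \delta_{41}\,\text{Female}_i + \delta_{42}\,\text{Married}_i + u_{4i}
\end{aligned}$$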

Yit = 1[Uit > 0]

All random variables normally distributed.

Page 40: Discrete Choice Modeling

Page 41: Discrete Choice Modeling

Simulating Conditional Means for Individual Parameters

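$$\hat{E}(\boldsymbol{\beta}_i \mid \mathbf{y}_i, \mathbf{X}_i) = \frac{\frac{1}{R}\sum_{r=1}^{R} (\hat{\boldsymbol{\beta}} + \hat{\mathbf{L}}\mathbf{w}_{i,r}) \prod_{t=1}^{T_i} \Phi[(2y_{it} - 1)(\hat{\boldsymbol{\beta}} + \hat{\mathbf{L}}\mathbf{w}_{i,r})'\mathbf{x}_{it}]}{\frac{1}{R}\sum_{r=1}^{R} \prod_{t=1}^{T_i} \Phi[(2y_{it} - 1)(\hat{\boldsymbol{\beta}} + \hat{\mathbf{L}}\mathbf{w}_{i,r})'\mathbf{x}_{it}]} = \frac{1}{R}\sum_{r=1}^{R} \text{Weight}_{ir}\, \hat{\boldsymbol{\beta}}_{ir}$$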

Posterior estimates of E[parameters(i) | Data(i)]
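The weighted average above can be sketched for one individual in a random parameters probit model. Everything numeric here is hypothetical (the estimates `beta_hat` and `L_hat`, the data, the number of draws), and the function name `conditional_mean` is introduced only for the illustration.

```python
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def conditional_mean(beta_hat, L_hat, y, X, R, rng):
    """Simulate E[beta_i | y_i, X_i] as a likelihood-weighted average of draws.

    beta_hat: estimated means of the K random parameters
    L_hat:    estimated lower triangular Cholesky factor (K x K)
    y, X:     the individual's T outcomes (0/1) and T regressor vectors
    """
    K = len(beta_hat)
    numer = [0.0] * K
    denom = 0.0
    for _ in range(R):
        w = [rng.gauss(0.0, 1.0) for _ in range(K)]
        beta_r = [beta_hat[k] + sum(L_hat[k][m] * w[m] for m in range(k + 1))
                  for k in range(K)]
        like = 1.0
        for y_t, x_t in zip(y, X):
            index = sum(b * xk for b, xk in zip(beta_r, x_t))
            like *= PHI((2 * y_t - 1) * index)
        denom += like
        for k in range(K):
            numer[k] += like * beta_r[k]
    return [v / denom for v in numer]

# Hypothetical estimates and two hypothetical individuals observed five times.
rng = random.Random(11)
beta_hat = [0.0, 0.5]
L_hat = [[0.9, 0.0], [0.0, 0.3]]
X = [[1.0, 0.0]] * 5
always = conditional_mean(beta_hat, L_hat, [1] * 5, X, 2000, rng)
never = conditional_mean(beta_hat, L_hat, [0] * 5, X, 2000, rng)
print([round(b, 3) for b in always], [round(b, 3) for b in never])
```

An individual who always chooses 1 is assigned a conditional mean for the constant above the population mean, and one who never does is assigned a value below it, which is the sense in which these are individual specific estimates.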

Page 42: Discrete Choice Modeling

Page 43: Discrete Choice Modeling

Page 44: Discrete Choice Modeling

Page 45: Discrete Choice Modeling

“Individual Coefficients”

Page 46: Discrete Choice Modeling

WinBUGS: MCMC User specifies the model – constructs the Gibbs Sampler/Metropolis Hastings

MLWin: Linear and some nonlinear – logit, Poisson, etc. Uses MCMC for MLE (noninformative priors)

SAS: Proc Mixed. Classical Uses primarily a kind of GLS/GMM (method of moments algorithm for loglinear models)

Stata: Classical Several loglinear models – GLLAMM. Mixing done by quadrature. Maximum simulated likelihood for multinomial choice (Arne Hole, user provided)

LIMDEP/NLOGIT Classical Mixing done by Monte Carlo integration – maximum simulated likelihood Numerous linear, nonlinear, loglinear models

Ken Train’s Gauss Code, miscellaneous freelance R and Matlab code Monte Carlo integration Mixed Logit (mixed multinomial logit) model only (but free!)

Biogeme Multinomial choice models Many experimental models (developer’s hobby)

Programs differ on the models fitted, the algorithms, the paradigm, and the extensions provided to the simplest RPM, $\boldsymbol{\beta}_i = \boldsymbol{\beta} + \mathbf{w}_i$.

Page 47: Discrete Choice Modeling

SCALING IN CHOICE MODELS

Page 48: Discrete Choice Modeling

Using Degenerate Branches to Reveal Scaling

[Figure: nested logit tree. LIMB: Travel. BRANCH: Fly, Rail, Drive, GrndPblc. TWIG: Air, Train, Car, Bus.]

Page 49: Discrete Choice Modeling

Scaling in Transport Modes
-----------------------------------------------------------
FIML Nested Multinomial Logit Model
Dependent variable               MODE
Log likelihood function    -182.42834
The model has 2 levels.
Nested Logit form:IVparms=Taub|l,r,Sl|r
& Fr.No normalizations imposed a priori
Number of obs.=   210, skipped   0 obs
--------+--------------------------------------------------
Variable| Coefficient  Standard Error  b/St.Er.  P[|Z|>z]
--------+--------------------------------------------------
        |Attributes in the Utility Functions (beta)
      GC|    .09622**       .03875       2.483     .0130
    TTME|   -.08331***      .02697      -3.089     .0020
    INVT|   -.01888***      .00684      -2.760     .0058
    INVC|   -.10904***      .03677      -2.966     .0030
   A_AIR|   4.50827***     1.33062       3.388     .0007
 A_TRAIN|   3.35580***      .90490       3.708     .0002
   A_BUS|   3.11885**      1.33138       2.343     .0192
        |IV parameters, tau(b|l,r),sigma(l|r),phi(r)
     FLY|   1.65512**       .79212       2.089     .0367
    RAIL|    .92758***      .11822       7.846     .0000
LOCLMASS|   1.00787***      .15131       6.661     .0000
   DRIVE|   1.00000    ......(Fixed Parameter)......
--------+--------------------------------------------------

NLOGIT ; Lhs=mode; Rhs=gc,ttme,invt,invc,one ; Choices=air,train,bus,car; Tree=Fly(Air), Rail(train), LoclMass(bus), Drive(Car); ivset:(drive)=[1]$

Page 50: Discrete Choice Modeling

A Model with Choice Heteroscedasticity

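$$U(i,t,j) = \alpha_j + \boldsymbol{\beta}'\mathbf{x}_{itj} + \boldsymbol{\gamma}_j'\mathbf{z}_{it} + \sigma_j \varepsilon_{i,t,j}$$

$$F(\varepsilon_{i,t,j}) = \exp(-\exp(-\varepsilon_{i,t,j}))$$

IID after scaling by a choice specific scale parameter

$$P[\text{choice} = j \mid \mathbf{x}_{itj}, \mathbf{z}_{it}, i, t] = \text{Prob}[U_{i,t,j} \ge U_{i,t,k}], \quad k = 1, \ldots, J(i,t)$$

$$= \frac{\exp[(\alpha_j + \boldsymbol{\beta}'\mathbf{x}_{itj} + \boldsymbol{\gamma}_j'\mathbf{z}_{it}) / \sigma_j]}{\sum_{m=1}^{J(i,t)} \exp[(\alpha_m + \boldsymbol{\beta}'\mathbf{x}_{itm} + \boldsymbol{\gamma}_m'\mathbf{z}_{it}) / \sigma_m]}$$

Normalization required as only ratios can be estimated; $\sigma_j = 1$ for one of the alternatives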

(Remember the integrability problem - scale is not identified.)

Page 51: Discrete Choice Modeling

Heteroscedastic Extreme Value Model (1)
+---------------------------------------------+
| Start values obtained using MNL model       |
| Maximum Likelihood Estimates                |
| Log likelihood function       -184.5067     |
| Dependent variable               Choice     |
| Response data are given as ind. choice.     |
| Number of obs.=   210, skipped   0 bad obs. |
+---------------------------------------------+
+--------+--------------+----------------+--------+--------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]|
+--------+--------------+----------------+--------+--------+
 GC      |    .06929537      .01743306     3.975    .0001
 TTME    |   -.10364955      .01093815    -9.476    .0000
 INVC    |   -.08493182      .01938251    -4.382    .0000
 INVT    |   -.01333220      .00251698    -5.297    .0000
 AASC    |   5.20474275      .90521312     5.750    .0000
 TASC    |   4.36060457      .51066543     8.539    .0000
 BASC    |   3.76323447      .50625946     7.433    .0000

Page 52: Discrete Choice Modeling

Heteroscedastic Extreme Value Model (2)
+---------------------------------------------+
| Heteroskedastic Extreme Value Model         |
| Log likelihood function       -182.4440     |  (MNL logL was -184.5067)
| Number of parameters                 10     |
| Restricted log likelihood     -291.1218     |
+---------------------------------------------+
+--------+--------------+----------------+--------+--------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]|
+--------+--------------+----------------+--------+--------+
---------+Attributes in the Utility Functions (beta)
 GC      |    .11903513      .06402510     1.859    .0630
 TTME    |   -.11525581      .05721397    -2.014    .0440
 INVC    |   -.15515877      .07928045    -1.957    .0503
 INVT    |   -.02276939      .01122762    -2.028    .0426
 AASC    |   4.69411460     2.48091789     1.892    .0585
 TASC    |   5.15629868     2.05743764     2.506    .0122
 BASC    |   5.03046595     1.98259353     2.537    .0112
---------+Scale Parameters of Extreme Value Distns Minus 1.0
 s_AIR   |   -.57864278      .21991837    -2.631    .0085
 s_TRAIN |   -.45878559      .34971034    -1.312    .1896
 s_BUS   |    .26094835      .94582863      .276    .7826
 s_CAR   |    .000000    ......(Fixed Parameter).......
---------+Std.Dev=pi/(theta*sqr(6)) for H.E.V. distribution.
 s_AIR   |   3.04385384     1.58867426     1.916    .0554
 s_TRAIN |   2.36976283     1.53124258     1.548    .1217
 s_BUS   |   1.01713111      .76294300     1.333    .1825
 s_CAR   |   1.28254980  ......(Fixed Parameter).......

Normalized for estimation

Structural parameters
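The two blocks of scale estimates are linked by the formula printed in the output, Std.Dev = pi/(theta*sqr(6)), where theta is one plus the reported "scale parameter minus 1.0". A short check in Python, using only the numbers in the table (the dictionary and function names are illustrative):

```python
import math

# Reported "scale parameter minus 1.0" estimates; CAR is the normalized alternative.
scale_minus_one = {"AIR": -0.57864278, "TRAIN": -0.45878559,
                   "BUS": 0.26094835, "CAR": 0.0}

def hev_std_dev(theta):
    """Standard deviation implied by an extreme value scale parameter theta."""
    return math.pi / (theta * math.sqrt(6.0))

std_dev = {alt: hev_std_dev(1.0 + s) for alt, s in scale_minus_one.items()}
for alt, sd in std_dev.items():
    print(alt, round(sd, 5))
```

The normalized alternative has theta = 1, which gives the familiar standard deviation π/√6 of the standard type I extreme value distribution.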

Page 53: Discrete Choice Modeling

HEV Model - Elasticities
+---------------------------------------------------+
| Elasticity averaged over observations.            |
| Attribute is INVC in choice AIR                   |
| Effects on probabilities of all choices in model: |
| * = Direct Elasticity effect of the attribute.    |
|                            Mean      St.Dev       |
| *    Choice=AIR          -4.2604     1.6745       |
|      Choice=TRAIN         1.5828     1.9918       |
|      Choice=BUS           3.2158     4.4589       |
|      Choice=CAR           2.6644     4.0479       |
| Attribute is INVC in choice TRAIN                 |
|      Choice=AIR            .7306      .5171       |
| *    Choice=TRAIN        -3.6725     4.2167       |
|      Choice=BUS           2.4322     2.9464       |
|      Choice=CAR           1.6659     1.3707       |
| Attribute is INVC in choice BUS                   |
|      Choice=AIR            .3698      .5522       |
|      Choice=TRAIN          .5949     1.5410       |
| *    Choice=BUS          -6.5309     5.0374       |
|      Choice=CAR           2.1039     8.8085       |
| Attribute is INVC in choice CAR                   |
|      Choice=AIR            .3401      .3078       |
|      Choice=TRAIN          .4681      .4794       |
|      Choice=BUS           1.4723     1.6322       |
| *    Choice=CAR          -3.5584     9.3057       |
+---------------------------------------------------+

Multinomial Logit
+---------------------------+
| INVC in AIR               |
|        Mean      St.Dev   |
| *    -5.0216     2.3881   |
|       2.2191     2.6025   |
|       2.2191     2.6025   |
|       2.2191     2.6025   |
| INVC in TRAIN             |
|       1.0066      .8801   |
| *    -3.3536     2.4168   |
|       1.0066      .8801   |
|       1.0066      .8801   |
| INVC in BUS               |
|        .4057      .6339   |
|        .4057      .6339   |
| *    -2.4359     1.1237   |
|        .4057      .6339   |
| INVC in CAR               |
|        .3944      .3589   |
|        .3944      .3589   |
|        .3944      .3589   |
| *    -1.3888     1.2161   |
+---------------------------+

Page 54: Discrete Choice Modeling

Variance Heterogeneity in MNL

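We extend the HEV model by allowing variances to differ across individuals

$$U(i,t,j) = \alpha_j + \boldsymbol{\beta}'\mathbf{x}_{itj} + \boldsymbol{\gamma}_j'\mathbf{z}_{it} + \sigma_{ij}\,\varepsilon_{i,t,j}$$

$$\sigma_{ij} = \exp(\theta_j + \boldsymbol{\delta}'\mathbf{w}_i)$$

$\boldsymbol{\delta} = \mathbf{0}$ returns the HEV model

$$F(\varepsilon_{i,t,j}) = \exp(-\exp(-\varepsilon_{i,t,j}))$$

$\theta_j = 0$ for one of the alternatives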

Scaling now differs both across alternatives and across individuals

Page 55: Discrete Choice Modeling

Application: Shoe Brand Choice

Simulated Data: Stated Choice, 400 respondents, 8 choice situations, 3,200 observations

3 choice/attributes + NONE
  Fashion = High / Low
  Quality = High / Low
  Price = 25/50/75/100 coded 1,2,3,4

Heterogeneity: Sex, Age (<25, 25-39, 40+)

Underlying data generated by a 3 class latent class process (100, 200, 100 in classes)

Page 56: Discrete Choice Modeling

Multinomial Logit Baseline Values
+---------------------------------------------+
| Discrete choice (multinomial logit) model   |
| Number of observations             3200     |
| Log likelihood function       -4158.503     |
| Number of obs.=  3200, skipped   0 bad obs. |
+---------------------------------------------+
+--------+--------------+----------------+--------+--------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]|
+--------+--------------+----------------+--------+--------+
 FASH    |   1.47890473      .06776814    21.823    .0000
 QUAL    |   1.01372755      .06444532    15.730    .0000
 PRICE   | -11.8023376       .80406103   -14.678    .0000
 ASC4    |    .03679254      .07176387      .513    .6082

Page 57: Discrete Choice Modeling

Multinomial Logit Elasticities
+---------------------------------------------------+
| Elasticity averaged over observations.            |
| Attribute is PRICE in choice BRAND1               |
| Effects on probabilities of all choices in model: |
| * = Direct Elasticity effect of the attribute.    |
|                            Mean      St.Dev       |
| *    Choice=BRAND1        -.8895      .3647       |
|      Choice=BRAND2         .2907      .2631       |
|      Choice=BRAND3         .2907      .2631       |
|      Choice=NONE           .2907      .2631       |
| Attribute is PRICE in choice BRAND2               |
|      Choice=BRAND1         .3127      .1371       |
| *    Choice=BRAND2       -1.2216      .3135       |
|      Choice=BRAND3         .3127      .1371       |
|      Choice=NONE           .3127      .1371       |
| Attribute is PRICE in choice BRAND3               |
|      Choice=BRAND1         .3664      .2233       |
|      Choice=BRAND2         .3664      .2233       |
| *    Choice=BRAND3        -.7548      .3363       |
|      Choice=NONE           .3664      .2233       |
+---------------------------------------------------+

Page 58: Discrete Choice Modeling

HEV Model without Heterogeneity
+---------------------------------------------+
| Heteroskedastic Extreme Value Model         |
| Dependent variable               CHOICE     |
| Number of observations             3200     |
| Log likelihood function       -4151.611     |
| Response data are given as ind. choice.     |
+---------------------------------------------+
+--------+--------------+----------------+--------+--------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]|
+--------+--------------+----------------+--------+--------+
---------+Attributes in the Utility Functions (beta)
 FASH    |   1.57473345      .31427031     5.011    .0000
 QUAL    |   1.09208463      .22895113     4.770    .0000
 PRICE   | -13.3740754      2.61275111    -5.119    .0000
 ASC4    |   -.01128916      .22484607     -.050    .9600
---------+Scale Parameters of Extreme Value Distns Minus 1.0
 s_BRAND1|    .03779175      .22077461      .171    .8641
 s_BRAND2|   -.12843300      .17939207     -.716    .4740
 s_BRAND3|    .01149458      .22724947      .051    .9597
 s_NONE  |    .000000    ......(Fixed Parameter).......
---------+Std.Dev=pi/(theta*sqr(6)) for H.E.V. distribution.
 s_BRAND1|   1.23584505      .26290748     4.701    .0000
 s_BRAND2|   1.47154471      .30288372     4.858    .0000
 s_BRAND3|   1.26797496      .28487215     4.451    .0000
 s_NONE  |   1.28254980  ......(Fixed Parameter).......

Essentially no differences in variances across choices

Page 59: Discrete Choice Modeling

Homogeneous HEV Elasticities

+---------------------------------------------------+
| Attribute is PRICE in choice BRAND1               |
|                            Mean      St.Dev       |
| *    Choice=BRAND1       -1.0585      .4526       |
|      Choice=BRAND2         .2801      .2573       |
|      Choice=BRAND3         .3270      .3004       |
|      Choice=NONE           .3232      .2969       |
| Attribute is PRICE in choice BRAND2               |
|      Choice=BRAND1         .3576      .1481       |
| *    Choice=BRAND2       -1.2122      .3142       |
|      Choice=BRAND3         .3466      .1426       |
|      Choice=NONE           .3429      .1411       |
| Attribute is PRICE in choice BRAND3               |
|      Choice=BRAND1         .4332      .2532       |
|      Choice=BRAND2         .3610      .2116       |
| *    Choice=BRAND3        -.8648      .4015       |
|      Choice=NONE           .4156      .2436       |
+---------------------------------------------------+
| Elasticity averaged over observations.            |
| Effects on probabilities of all choices in model: |
| * = Direct Elasticity effect of the attribute.    |
+---------------------------------------------------+

Multinomial Logit
+--------------------------+
| PRICE in choice BRAND1   |
|        Mean      St.Dev  |
| *     -.8895      .3647  |
|        .2907      .2631  |
|        .2907      .2631  |
|        .2907      .2631  |
| PRICE in choice BRAND2   |
|        .3127      .1371  |
| *    -1.2216      .3135  |
|        .3127      .1371  |
|        .3127      .1371  |
| PRICE in choice BRAND3   |
|        .3664      .2233  |
|        .3664      .2233  |
| *     -.7548      .3363  |
|        .3664      .2233  |
+--------------------------+

Page 60: Discrete Choice Modeling

Heteroscedasticity Across Individuals
+---------------------------------------------+
| Heteroskedastic Extreme Value Model         |   Homog-HEV      MNL
| Log likelihood function  -4129.518[10]      |   -4151.611[7]   -4158.503[4]
+---------------------------------------------+
+--------+--------------+----------------+--------+--------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]|
+--------+--------------+----------------+--------+--------+
---------+Attributes in the Utility Functions (beta)
 FASH    |   1.01640726      .20261573     5.016    .0000
 QUAL    |    .55668491      .11604080     4.797    .0000
 PRICE   |  -7.44758292     1.52664112    -4.878    .0000
 ASC4    |    .18300524      .09678571     1.891    .0586
---------+Scale Parameters of Extreme Value Distributions
 s_BRAND1|    .81114924      .10099174     8.032    .0000
 s_BRAND2|    .72713522      .08931110     8.142    .0000
 s_BRAND3|    .80084114      .10316939     7.762    .0000
 s_NONE  |   1.00000000  ......(Fixed Parameter).......
---------+Heterogeneity in Scales of Ext.Value Distns.
 MALE    |    .21512161      .09359521     2.298    .0215
 AGE25   |    .79346679      .13687581     5.797    .0000
 AGE39   |    .38284617      .16129109     2.374    .0176

Page 61: Discrete Choice Modeling

Variance Heterogeneity Elasticities

+---------------------------------------------------+
| Attribute is PRICE in choice BRAND1               |
|                            Mean      St.Dev       |
| *    Choice=BRAND1        -.8978      .5162       |
|      Choice=BRAND2         .2269      .2595       |
|      Choice=BRAND3         .2507      .2884       |
|      Choice=NONE           .3116      .3587       |
| Attribute is PRICE in choice BRAND2               |
|      Choice=BRAND1         .2853      .1776       |
| *    Choice=BRAND2       -1.0757      .5030       |
|      Choice=BRAND3         .2779      .1669       |
|      Choice=NONE           .3404      .2045       |
| Attribute is PRICE in choice BRAND3               |
|      Choice=BRAND1         .3328      .2477       |
|      Choice=BRAND2         .2974      .2227       |
| *    Choice=BRAND3        -.7458      .4468       |
|      Choice=NONE           .4056      .3025       |
+---------------------------------------------------+

Multinomial Logit
+--------------------------+
| PRICE in choice BRAND1   |
|        Mean      St.Dev  |
| *     -.8895      .3647  |
|        .2907      .2631  |
|        .2907      .2631  |
|        .2907      .2631  |
| PRICE in choice BRAND2   |
|        .3127      .1371  |
| *    -1.2216      .3135  |
|        .3127      .1371  |
|        .3127      .1371  |
| PRICE in choice BRAND3   |
|        .3664      .2233  |
|        .3664      .2233  |
| *     -.7548      .3363  |
|        .3664      .2233  |
+--------------------------+

Page 62: Discrete Choice Modeling

Page 63: Discrete Choice Modeling

Page 64: Discrete Choice Modeling

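Generalized Mixed Logit Model

$$U(i,t,j) = \boldsymbol{\beta}_i'\mathbf{x}_{i,t,j} + \text{Common effects} + \varepsilon_{i,t,j}$$

Random Parameters:

$$\boldsymbol{\beta}_i = \sigma_i[\boldsymbol{\beta} + \boldsymbol{\Delta}\mathbf{h}_i] + [\gamma + \sigma_i(1 - \gamma)]\,\boldsymbol{\Gamma}_i\mathbf{v}_i, \quad \boldsymbol{\Gamma}_i = \boldsymbol{\Lambda}\boldsymbol{\Sigma}_i$$

$\boldsymbol{\Lambda}$ is a lower triangular matrix with 1s on the diagonal (Cholesky matrix)

$\boldsymbol{\Sigma}_i$ is a diagonal matrix with $\varphi_k \exp(\boldsymbol{\psi}_k'\mathbf{h}_i)$

Overall preference scaling:

$$\sigma_i = \exp(-\tau_i^2/2 + \tau_i w_i + \boldsymbol{\theta}'\mathbf{h}_i)$$

$$\tau_i = \exp(\boldsymbol{\lambda}'\mathbf{r}_i)$$

$$0 < \gamma < 1$$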

Page 65: Discrete Choice Modeling

Unobserved Heterogeneity in Scaling

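HEV formulation: $U_{it,j} = \boldsymbol{\beta}'\mathbf{x}_{itj} + (1/\sigma_i)\,\varepsilon_{it,j}$

Generalized model with $\gamma = 1$ and $\boldsymbol{\Gamma} = [\mathbf{0}]$. Produces a scaled multinomial logit model with

$$\text{Prob}(\text{choice}_{it} = j) = \frac{\exp(\sigma_i \boldsymbol{\beta}'\mathbf{x}_{itj})}{\sum_{m=1}^{J_{it}} \exp(\sigma_i \boldsymbol{\beta}'\mathbf{x}_{itm})}, \quad i = 1, \ldots, N, \; j = 1, \ldots, J_{it}, \; t = 1, \ldots, T_i$$

The random variation in the scaling is

$$\sigma_i = \exp(-\tau^2/2 + \tau w_i)$$

The variation across individuals may also be observed, so that

$$\sigma_i = \exp(-\tau^2/2 + \tau w_i + \boldsymbol{\theta}'\mathbf{z}_i)$$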
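The −τ²/2 term in the scale equation centers the random scale factor. Assuming w_i is standard normal (an assumption; the distribution is not stated in the extracted slide text), exp(τw_i) has mean exp(τ²/2), so σ_i has mean one. A small simulation sketch with a hypothetical value of τ:

```python
import math
import random

def scale_draws(tau, R, rng):
    """sigma_i = exp(-tau^2/2 + tau * w_i) with w_i ~ N[0,1]."""
    return [math.exp(-0.5 * tau * tau + tau * rng.gauss(0.0, 1.0))
            for _ in range(R)]

rng = random.Random(99)
sigmas = scale_draws(tau=0.5, R=20000, rng=rng)
mean_sigma = sum(sigmas) / len(sigmas)
print(round(mean_sigma, 3), round(min(sigmas), 3), round(max(sigmas), 3))
```

The scale factor is always positive and averages to one, so τ governs only the spread of scaling across individuals, not its level.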

Page 66: Discrete Choice Modeling

Scaled MNL

Page 67: Discrete Choice Modeling

Observed and Unobserved Heterogeneity

Page 68: Discrete Choice Modeling

Price Elasticities

Page 69: Discrete Choice Modeling

Scaling as Unobserved Heterogeneity

Page 70: Discrete Choice Modeling

Two Way Latent Class?

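$\text{Prob}(\text{Choice}=j \mid \text{class}=c, \text{risk}=s) = \dfrac{\exp[\sigma_s\boldsymbol\beta_c'\mathbf{x}_j]}{\sum_{alt=1}^{J}\exp[\sigma_s\boldsymbol\beta_c'\mathbf{x}_{alt}]}$, $\sigma_1 = 1$

$\text{Prob}(\text{Choice}=j) = \sum_{c=1}^{C}\sum_{s=1}^{S}\pi(c,s)\,\dfrac{\exp[\sigma_s\boldsymbol\beta_c'\mathbf{x}_j]}{\sum_{alt=1}^{J}\exp[\sigma_s\boldsymbol\beta_c'\mathbf{x}_{alt}]}$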
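A small sketch of the two-way latent class probability: conditional MNL probabilities for each (preference class c, scale class s) pair, averaged with joint class weights. The weight matrix `pi`, the class coefficients, and the function names are hypothetical; the normalization sigma_1 = 1 follows the slide.

```python
import math

def mnl_probs(beta, X, scale=1.0):
    """MNL probabilities with utilities scale * beta'x_j."""
    utils = [scale * sum(b * x for b, x in zip(beta, row)) for row in X]
    m = max(utils)
    expu = [math.exp(u - m) for u in utils]
    total = sum(expu)
    return [e / total for e in expu]

def two_way_lc_probs(betas, sigmas, pi, X):
    """Unconditional probabilities: sum over c and s of pi[c][s] * P(j | c, s)."""
    out = [0.0] * len(X)
    for c, beta_c in enumerate(betas):
        for s, sigma_s in enumerate(sigmas):
            cond = mnl_probs(beta_c, X, sigma_s)
            for j, p in enumerate(cond):
                out[j] += pi[c][s] * p
    return out

betas = [[-1.0, 0.5], [-0.2, 1.0]]   # C = 2 preference classes (hypothetical)
sigmas = [1.0, 2.0]                  # S = 2 scale classes, sigma_1 = 1
pi = [[0.4, 0.1], [0.3, 0.2]]        # joint class probabilities, summing to 1
X = [[1.0, 0.0], [1.5, 1.0], [2.0, 1.0]]
probs = two_way_lc_probs(betas, sigmas, pi, X)
```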

Page 71: Discrete Choice Modeling

[Part 9] 71/79

Discrete Choice Modeling

Modeling Heterogeneity

Appendix: Maximum Simulated Likelihood

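$\log L(\boldsymbol\theta,\boldsymbol\Omega) = \sum_{i=1}^{N} \log \int_{\boldsymbol\beta_i} \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it},\boldsymbol\beta_i,\boldsymbol\theta)\, h(\boldsymbol\beta_i \mid \mathbf{z}_i,\boldsymbol\Omega)\, d\boldsymbol\beta_i$

$\boldsymbol\Omega = \boldsymbol\beta, \boldsymbol\Delta, \sigma_1,...,\sigma_K, \boldsymbol\delta_1,...,\boldsymbol\delta_K, \boldsymbol\Gamma$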

Page 72: Discrete Choice Modeling

[Part 9] 72/79

Discrete Choice Modeling

Modeling Heterogeneity

Monte Carlo Integration

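(1) Integral is of the form $K = \int_{\text{range of } v} g(v \mid \text{data},\boldsymbol\beta)\, f(v \mid \boldsymbol\Omega)\, dv$, where $f(v)$ is the density of the random variable $v$, possibly conditioned on a set of parameters $\boldsymbol\Omega$, and $g(v \mid \text{data},\boldsymbol\beta)$ is a function of data and parameters.

(2) By construction, $K(\boldsymbol\Omega) = E[g(v \mid \text{data},\boldsymbol\beta)]$

(3) Strategy:

a. Sample R values from the population of v using a random number generator.

b. Compute the average $\hat{K} = (1/R)\sum_{r=1}^{R} g(v_r \mid \text{data},\boldsymbol\beta)$

By the law of large numbers, $\text{plim}\, \hat{K} = K$.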

Page 73: Discrete Choice Modeling

[Part 9] 73/79

Discrete Choice Modeling

Modeling Heterogeneity

Monte Carlo Integration

$\dfrac{1}{R}\sum_{r=1}^{R} f(u_{ir}) \xrightarrow{P} \int_{u_i} f(u_i)\, g(u_i)\, du_i = E_{u_i}[f(u_i)]$

(Certain smoothness conditions must be met.)

Drawing $u_{ir}$ by 'random sampling': $u_{ir} = t(v_{ir})$, $v_{ir} \sim U[0,1]$

E.g., $u_{ir} = \sigma\Phi^{-1}(v_{ir}) + \mu$ for $N[\mu,\sigma^2]$

Requires many draws, typically hundreds or thousands

Page 74: Discrete Choice Modeling

[Part 9] 74/79

Discrete Choice Modeling

Modeling Heterogeneity

Example: Monte Carlo Integral

$\int_{-\infty}^{\infty} \Phi(x_1 + .9v)\,\Phi(x_2 + .9v)\,\Phi(x_3 + .9v)\,\dfrac{\exp(-v^2/2)}{\sqrt{2\pi}}\, dv$

where $\Phi$ is the standard normal CDF and $x_1 = .5$, $x_2 = -.2$, $x_3 = .3$.

The weighting function for v is the standard normal.

Strategy: Draw R (say 1000) standard normal random draws, $v_r$. Compute the 1000 functions $\Phi(x_1 + .9v_r)\,\Phi(x_2 + .9v_r)\,\Phi(x_3 + .9v_r)$ and average them.

(Based on 100, 1000, 10000, I get .28746, .28437, .27242)
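The strategy for this example can be sketched with the Python standard library. The seed and the names `integrand` and `mc_integral` are arbitrary choices for illustration, so the estimates will not reproduce the slide's figures digit for digit; they should land in the same neighborhood.

```python
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def integrand(v, x=(0.5, -0.2, 0.3), a=0.9):
    """Product Phi(x1 + a*v) * Phi(x2 + a*v) * Phi(x3 + a*v)."""
    prod = 1.0
    for xk in x:
        prod *= PHI(xk + a * v)
    return prod

def mc_integral(R, seed=0):
    """Average the integrand over R standard normal draws."""
    rng = random.Random(seed)
    return sum(integrand(rng.gauss(0.0, 1.0)) for _ in range(R)) / R

estimates = {R: mc_integral(R) for R in (100, 1000, 10000)}
for R, est in estimates.items():
    print(R, round(est, 5))
```

Because the standard normal density is the weighting function, no explicit density term appears in the average: drawing v from N(0,1) does that weighting implicitly.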

Page 75: Discrete Choice Modeling

[Part 9] 75/79

Discrete Choice Modeling

Modeling Heterogeneity

Simulated Log Likelihood for a Mixed Probit Model

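Random parameters probit model

$f(y_{it} \mid \mathbf{x}_{it},\boldsymbol\beta_i) = \Phi[(2y_{it} - 1)\,\mathbf{x}_{it}'\boldsymbol\beta_i]$

$\boldsymbol\beta_i = \boldsymbol\beta + \mathbf{u}_i$, $\mathbf{u}_i \sim N[\mathbf{0}, \boldsymbol\Gamma\boldsymbol\Lambda^2\boldsymbol\Gamma']$

$\text{LogL}(\boldsymbol\beta,\boldsymbol\Gamma,\boldsymbol\Lambda) = \sum_{i=1}^{N} \log \int_{\boldsymbol\beta_i} \prod_{t=1}^{T_i} \Phi[(2y_{it} - 1)\,\mathbf{x}_{it}'\boldsymbol\beta_i]\; N[\boldsymbol\beta, \boldsymbol\Gamma\boldsymbol\Lambda^2\boldsymbol\Gamma']\, d\boldsymbol\beta_i$

$\text{LogL}^S = \sum_{i=1}^{N} \log \dfrac{1}{R}\sum_{r=1}^{R} \prod_{t=1}^{T_i} \Phi[(2y_{it} - 1)\,\mathbf{x}_{it}'(\boldsymbol\beta + \boldsymbol\Gamma\boldsymbol\Lambda\mathbf{v}_{ir})]$

We now maximize this function with respect to $(\boldsymbol\beta,\boldsymbol\Gamma,\boldsymbol\Lambda)$.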
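A sketch of evaluating the simulated log likelihood for a random parameters probit. Assumptions made here for illustration: the product ΓΛ is passed as a single lower triangular matrix `L`, the draws v_ir are standard normal and held fixed across function evaluations, and the toy panel is invented. Maximization itself is left to whatever optimizer is at hand.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def simulated_loglik(beta, L, data, draws):
    """Sum over i of log[(1/R) sum_r prod_t Phi((2y_it - 1) x_it'(beta + L v_ir))].

    data  : list over individuals of [(y_it, x_it), ...] with y_it in {0, 1}
    draws : draws[i][r] is the K-vector v_ir, fixed before optimization
    L     : K x K lower triangular matrix standing in for Gamma*Lambda
    """
    K = len(beta)
    total = 0.0
    for panel, v_i in zip(data, draws):
        sim = 0.0
        for v in v_i:
            b = [beta[k] + sum(L[k][m] * v[m] for m in range(k + 1)) for k in range(K)]
            prod = 1.0
            for y, x in panel:
                xb = sum(bk * xk for bk, xk in zip(b, x))
                prod *= PHI((2 * y - 1) * xb)
            sim += prod
        total += math.log(sim / len(v_i))
    return total

# Hypothetical toy panel: N = 3, T_i = 2, K = 2 (constant plus one regressor)
data = [
    [(1, [1.0, 0.5]), (0, [1.0, -1.0])],
    [(1, [1.0, 1.2]), (1, [1.0, 0.3])],
    [(0, [1.0, -0.4]), (0, [1.0, 0.1])],
]
R = 200
rng = random.Random(42)
draws = [[[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(R)] for _ in data]
beta = [0.1, 0.8]
L = [[0.5, 0.0], [0.2, 0.3]]
ll = simulated_loglik(beta, L, data, draws)
```

Reusing the same `draws` at every parameter value keeps the simulated function smooth in the parameters, which matters for gradient-based maximization.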

Page 76: Discrete Choice Modeling

[Part 9] 76/79

Discrete Choice Modeling

Modeling Heterogeneity

Generating Random Draws

Most common approach is the "inverse probability transform"

Let u = a random draw from the standard uniform (0,1).

Let x = the desired population to draw from

Assume the CDF of x is F(x).

The random draw is then $x = F^{-1}(u)$.

Example: exponential, $\theta$. $f(x) = \theta\exp(-\theta x)$, $F(x) = 1 - \exp(-\theta x)$. Equate u to F(x), $x = -(1/\theta)\log(1-u)$.

Example: Normal($\mu$, $\sigma$). Inverse function does not exist in closed form. There are good polynomial approximations to produce a draw from N[0,1] from a U(0,1). Then $x = \mu + \sigma v$.

This leaves the question of how to draw the U(0,1).
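A minimal sketch of the inverse probability transform for both examples. The exponential case uses the closed-form inverse; for the normal case the sketch relies on Python's `statistics.NormalDist.inv_cdf`, which is itself a numerical approximation to the inverse CDF. Function names and parameter values are illustrative.

```python
import math
import random
from statistics import NormalDist

def draw_exponential(theta, u):
    """Invert F(x) = 1 - exp(-theta*x): x = -(1/theta) * log(1 - u)."""
    return -math.log(1.0 - u) / theta

def draw_normal(mu, sigma, u):
    """x = mu + sigma*v, where v = Phi^{-1}(u) is a standard normal draw."""
    return mu + sigma * NormalDist().inv_cdf(u)

rng = random.Random(7)
# rng.random() lies in [0, 1); inv_cdf needs u strictly inside (0, 1),
# so an exact 0.0 (extremely unlikely) would have to be redrawn.
u_values = [rng.random() for _ in range(5)]
exp_draws = [draw_exponential(2.0, u) for u in u_values]
norm_draws = [draw_normal(1.0, 2.0, u) for u in u_values if u > 0.0]
```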

Page 77: Discrete Choice Modeling

[Part 9] 77/79

Discrete Choice Modeling

Modeling Heterogeneity

Drawing Uniform Random Numbers

Computer generated random numbers are not random; they are Markov chains that look random.

The Original IBM SSP Random Number Generator for 32 bit computers.

SEED originates at some large odd number

d3 = 2147483647.0
d2 = 2147483655.0
d1 = 16807.0
SEED = Mod(d1*SEED,d3) ! MOD(a,p) = a - INT(a/p) * p
X = SEED/d2 is a pseudo-random value between 0 and 1.

Problems:

(1) Short period. Based on 32 bits, so recycles after $2^{31} - 1$ values

(2) Evidently not very close to random. (Recent tests have discredited this RNG)
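The recursion above translates directly to Python. This sketch uses exact integer arithmetic for the modulus rather than the floating-point MOD in the original; the function name `ssp_uniform` is made up for illustration. It is shown only to make the recursion concrete, not as a generator to use in practice.

```python
def ssp_uniform(seed, n):
    """Generate n pseudo-random values with SEED = Mod(d1*SEED, d3), X = SEED/d2."""
    d1 = 16807
    d2 = 2147483655.0
    d3 = 2147483647          # 2**31 - 1
    values = []
    for _ in range(n):
        seed = (d1 * seed) % d3   # multiplicative congruential step
        values.append(seed / d2)  # scale into the unit interval
    return values, seed

values, last_seed = ssp_uniform(seed=123457, n=5)
```

Each value is a deterministic function of the previous seed, which is the sense in which the sequence is a chain that only looks random.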

Page 78: Discrete Choice Modeling

[Part 9] 78/79

Discrete Choice Modeling

Modeling Heterogeneity

Quasi-Monte Carlo Integration Based on Halton Sequences

Coverage of the unit interval is the objective, not randomness of the set of draws.

Halton sequences --- Markov chain

p = a prime number, r = the sequence of integers, decomposed as $r = \sum_{i=0}^{I} b_i p^i$

$H(r \mid p) = \sum_{i=0}^{I} b_i p^{-i-1}$, $r = r_1, ...$ (e.g., 10,11,12,...)

For example, using base p=5, the integer r=37 has $b_0 = 2$, $b_1 = 2$, and $b_2 = 1$ ($37 = 1 \times 5^2 + 2 \times 5^1 + 2 \times 5^0$). Then $H(37 \mid 5) = 2 \times 5^{-1} + 2 \times 5^{-2} + 1 \times 5^{-3} = 0.488$.
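The digit expansion and its reflection about the decimal point can be sketched as a short function. The name `halton` is an illustrative choice; the computation peels off base-p digits b_0, b_1, ... and accumulates b_i * p^(-i-1).

```python
def halton(r, p):
    """Return H(r|p): the base-p digits of r reflected about the radix point."""
    h = 0.0
    f = 1.0 / p
    while r > 0:
        r, b = divmod(r, p)   # b is the next base-p digit b_i
        h += b * f            # contributes b_i * p**(-i-1)
        f /= p
    return h

example = halton(37, 5)                       # digits b0 = 2, b1 = 2, b2 = 1
sequence = [halton(r, 5) for r in range(10, 15)]
```

Successive integers fill the unit interval progressively, which is the coverage property the sequence is built for.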

Page 79: Discrete Choice Modeling

[Part 9] 79/79

Discrete Choice Modeling

Modeling Heterogeneity

Halton Sequences vs. Random Draws

Requires far fewer draws – for one dimension, about 1/10. Accelerates estimation by a factor of 5 to 10.
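One way to explore this comparison is to evaluate a one-dimensional expectation, E[Φ(.5 + .9v) Φ(-.2 + .9v) Φ(.3 + .9v)] with v standard normal, using Halton draws and pseudo-random draws of the same size. The sketch assumes base 2, a start at r = 10, and an arbitrary seed; results for other integrands or dimensions may differ.

```python
import random
from statistics import NormalDist

N01 = NormalDist()

def halton(r, p):
    """H(r|p): base-p digits of r reflected about the radix point."""
    h, f = 0.0, 1.0 / p
    while r > 0:
        r, b = divmod(r, p)
        h += b * f
        f /= p
    return h

def g(v):
    """Integrand: Phi(.5 + .9v) * Phi(-.2 + .9v) * Phi(.3 + .9v)."""
    return N01.cdf(0.5 + 0.9 * v) * N01.cdf(-0.2 + 0.9 * v) * N01.cdf(0.3 + 0.9 * v)

def estimate(draws):
    return sum(g(v) for v in draws) / len(draws)

def halton_normal_draws(R, p=2, start=10):
    """Map Halton points in (0,1) to N(0,1) by the inverse CDF."""
    return [N01.inv_cdf(halton(r, p)) for r in range(start, start + R)]

R = 100
rng = random.Random(7)
est_halton = estimate(halton_normal_draws(R))
est_random = estimate([rng.gauss(0.0, 1.0) for _ in range(R)])
print("Halton:", round(est_halton, 5), "random:", round(est_random, 5))
```

Rerunning the pseudo-random estimate with different seeds gives a feel for its sampling variability at a given R, against which the Halton estimate can be compared.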