Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9...

56
Phd in Transportation / Transport Demand Modelling 1 Phd Program in Transportation Transport Demand Modeling Gonçalo Homem de Almeida Correia Session 1 Multinomial Logit

Transcript of Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9...

Page 1: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 1

Phd Program in Transportation

Transport Demand Modeling

Gonçalo Homem de Almeida Correia

Session 1

Multinomial Logit

Page 2: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 2

Outline of the Module on Discrete Choice

Models (3 sessions)

Our objectives for the next three sessions:

� The Multinomial Logit (MNL)

� Building Stated Preference Experiments

� Nested Logit (NL) model - correlation among alternatives

� Mixed Logit (ML) model – accounting for user preference heterogeneity

Page 3: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 3

Standing on the Shoulder of Great

Researchers

Just to cite some names (there are more!):

� Moshe Ben-Akiva (USA)

� David Hensher (Australia)

� Daniel McFadden (USA) (Nobel Prize winner on Economics in 2000)

� Juan de Dios Ortúzar (Chile)

� Kenneth Train (USA)

Page 4: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 4

The “Utility” of Discrete Choice Models

(DCM’s)

� They are called discrete because they deal with the modeling of discrete choices, we either choose one option or the other, you can’t choose both.

� In Transportation some of its applications include, but are not limited to:

� Modeling the mode choice of travelers for different trip purposes, activity locations, or socio-demographic characteristics of the decision maker;

� Modeling home and office location in function of price and other site and decision maker characteristics;

� Modeling the decision to follow or not to follow an advise given by a Variable Message Sign on a Freeway;

� The decision to enter or not to enter in a roundabout as a function of the socio-demographic profile of the decision maker, or the type of vehicle.

Page 5: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 5

The Multinomial Logit (MNL) (I)� Discrete choice models are based on the theory of stochastic utility whereby a choice is made by a decision maker in order to maximize a utility function. This utility function is constructed as a combination of known explanatory variables, the systematic part of utility, and a random part which is unknown (Ben-Akiva and Lerman 1985):

��� = ��� + ���

� ��� represents the systematic part of utility given by the decision maker � to alternative �, and is generally considered as a linear-in-parameters of the form: ��� =�0� + ′��� . where ��� is a vector of attribute values for alternative � as viewed by decision maker �, ′ is a vector of weighs, and �0� is a constant that is specific to each alternative �. ��� represents the error between the systematic part of utility and the true utility given by user � to alternative �, you can view this error as the part of the utility which is unknown to the analyst (this is the most widely accepted theory on the

random part of utility).

Page 6: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 6

� Assumptions on the statistical distribution of the error component lead to different discrete choice models.

� Logit is derived on the assumption that the error terms are Extreme Value Distributed with constant mean and a scale parameter, i.e., their distribution is identical, and independent amongst alternatives. The distribution is also called Gumbel and type I extreme value.

The Multinomial Logit (MNL) (II)

� The variance of this distribution is ���, where � is the scale parameter.

� The mean is a value which is the same for all errors as the distributions are identically distributed, but as we will see the mean is irrelevant because only differences in utility matter for changing the probability of a choice.

Page 7: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 7

� The distribution of each individual error implies that any difference in

error terms �� = ��� − ��� is logistically distributed (Ben-Akiva and Lerman 1985). Under this assumption we have that the CDF (Cumulative

Distribution Function) of �� is given by:

�(�� ) = 11 + �−� ��

and the Probability Density Function (PDF) is given by:

�(�� ) = ��−� ��(1 + �−� �� )2

The Multinomial Logit (MNL) (III)

Page 8: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 8

The Multinomial Logit (MNL) (IV)

�� (�) = ���� ��!�"#(��� ≥ ��� ) =

= ���� ��!�"#(��� + ��� ≥ ��� + ��� ) =

= ���� ��!�"#(��� − ��� ≥ ��� − ��� ) =

= ���� ��!�"#(�� ≤ ��� − ��� ) =

= �(��� − ��� ) = 11 + �−�&��� −��� ' = �����

�� ��� + �����

�Thus for computing the probability of choosing an alternative against another (binary choice) we have the following:

�When calibrating a model we will have estimates of the β coefficientswhich multiply for each explanatory variable, thus we have:

�� (�) = �(��� − ��� ) = ��′ ���

�� ′ ��� + �� ′ ��� Impossible to

differentiate both

coefficients and the

scale parameter

Page 9: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 9

The Multinomial Logit (MNL) (V)

�Since we cannot distinguish both we may arbitrate � = 1, thus we have �() *+, = �() *-, = ./

012/ =./

0

�This allows simplifying the logit model and the expression for the utility of an alternative which is now the following:

�, � = ���� ��!�"# �+, $ �-, =�345

�345 + �365� This model can be expanded for multiple choices producing a

Multinomial Logit (MNL) model expressed in the following way (Ben-Akiva and Lerman, 1985):

�, � = ���� ��!�"# �+, $ �-, =�345

∑ �365-895Where :, is the choice set that the decision maker � faces. Ex: : �, <=>, ?� ��, …

Page 10: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 10

� Logit assumes that the error component of the alternatives are IID (Independent and Identically Distributed - homoscedacity) thus the matrix of variance-covariance of the Utilities of the alternatives is the following:

:�A[�] =

DEEEEEEEEFG

2

6 0 0 0 0

0G2

6 0 0 00 0 … … …

0 0 …G2

6 0

0 0 … 0G2

6 IJJJJJJJJK

Variances of the Utilities are only the result of the error terms interactions and not the systematic part of Utility which is not probabilistic

:�A(�� , �� /� ≠ �) = 0

Covariance of different utilities is 0 because the error terms are independent. Two independent variables have no covariance.

The Multinomial Logit (MNL) (VI)

Be careful: Hensher, Rose and Green (2005) normalize the variances and not the scale parameter, thus this matrix would have a diagonal just with ones, and the scale would have to be computed accordingly: � =

G√6

Page 11: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 11

� The IID property together with the Extreme Value Type I distribution for the errors imply the Logit’s Independence of Irrelevant Alternatives (IIA) assumption in which the ratio of two alternatives stays constant no matter what happens to the other alternatives:

��(�)��(�)

=

����∑ ��O�O∈:�

����∑ ��O�O∈:�

=��������

�This is a strong assumption in some cases. Let’s consider the famous case of the blue and the red buses:

The Multinomial Logit (MNL) (VII)

Page 12: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 12

�If in the choice between two modes of transportation: bus and automobile, the utilities of both modes are the same thus the probability of choice is 50% for each one which leads to a ratio between both alternatives of 1.

�Consider now that another alternative mode is introduced: a blue bus, which does not show any advantages comparing with the old buses (red buses). We now have three alternatives and the only way the ratio can be maintained is if the probability of the three alternatives is 33.333% (ratio=0.33/0.33=1).

�But this is illogical. The question we have to ask is if the blue bus and the red Bus are really two different alternatives. Is there any difference on the color of the bus in its attractiveness? The most logical outcome would be the automobile having a 50% modal share, and each of the Bus modes having 25% of the preferences which means that adding another alternative should change the ratio to 0.5/0.25=2 (IIA not respected).

The MNL model should not be applied to highly correlated alternatives i.e.

alternatives who share part of their utility!

The Multinomial Logit (MNL) (VIII)

Page 13: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 13

The Multinomial Logit (MNL) (IX)�Probability of choosing one alternative as a function of the difference of utilities

�For the binary choice we have: B

A

j

B

j

A

V

V

j

V

V

j

V

V

B

A

e

e

e

e

e

e

P

P==

BA

B

AVV

B

A

V

V

B

A

V

V

B

A VVP

Pee

P

P

e

e

P

P

e

e

P

PBA

B

A

B

A

−=

=−=

=

⇒= ln)ln()ln(lnlnln

BA

A

A VVP

P−=

−1ln

�Q − �R

�Q

Page 14: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 14

Utility functions specification (I)

� The utility functions specification is essential in discrete choice modeling. A decision on the choice model to use is not just deciding among Logit, Probit, etc … is deciding on how to build the utility functions which translate better the decision makers behavior when facing a decision amongst a choice set of alternatives Cn.

� In general we will distinguish between two types of explanatory variables:

� SDC variables – Socio-Demographic Characteristics of the individual; and

� Alternative attributes;

� Their only difference is that the latter vary among alternatives and the first don’t.

Page 15: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 15

� While the attributes of the alternatives can be part of the utility functions of every alternative, in the case of SDC variables this is not possible as the model would not be identifiable –meaning that one of the coefficients would not be estimable.

� Consider the following example for the two functions of the two modes: Car and Bus:

Utility functions specification (II)

�(: �) = �: � + �: � ?? × ??S � + �: � ?: × ?:S � + �: � (T� × (T�

�(<=>) = �<=> + �<=>?? × ??<=> + �<=> ?: × ?:<=> + �<=>(T� × (T�

� We are trying to understand the effect of travel time, travel cost and decision makers’ age in mode choice. In addition to that, we add a so called alternative specific constant coefficient to try to capture the mean unknown component of utility which is not being explained by the other variables.

Page 16: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 16

� This model cannot be estimated, and we will prove it. According to the IIA we have as we seen:

�� (: � )�� (<=> ) =

��: � �∑ ��O�O∈:�

��<=> �∑ ��O�O∈:�

= � � : � �� �<=> � =>

!� V�� (: � )�� (<=> )W = !� X� �: � �

� �<=> � Y =>

!� V�� (: � )�� (<=> )W = !�(��: � � ) − !�(��<=> � ) =>

!� V�� (: � )�� (<=> )W = �: �� − �<=> �

Utility functions specification (III)

Page 17: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 17

�: �� − �<=> �= �: � + �: � ?? × ??S � + �: � ?: × ?:S � + �: � (T� × (T� − V�<=> + �<=> ?? × ??<=> + �<=> ?: × ?:<=> + �<=> (T� × (T�W

= �: � − �<=> + �: � ?? × ??S � − �<=> ?? × ??<=> + �: � ?: × ?:S �− �<=> ?: × ?:<=> + (�: � (T� − �<=> (T� ) × (T�

Utility functions specification (IV)

� We cannot identify both coefficients in the two situations: one of them must be normalized.

� Imagine this as a multivariate regression: you cannot have an intersection with two components (� 9Z[ and � R\]), it is not possible to differentiate that because there would be infinite combinations which result in the same independent effect on utility.

Page 18: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 18

� The way we avoid this is by considering a reference alternative to which utilities are measured against:

�(: �) = �: � + �: � ?? × ??S � + �: � ?: × ?:S � + �: � (T� × (T�

�(<=>) = �<=> ?? × ??<=> + �<=>?: × ?:<=>

Utility functions specification (V)

This measures how the Age impacts on choosing Car in relation to the Bus alternative.

This is the alternative specific constant and it will tell us if there is a preference for car compared to Bus which is not explained by the variables we have selected

The reference alternative!

� The choice of which alternative to use as a reference is the analyst decision. Coefficients will change, however, modeled probabilities will be the same.

Page 19: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 19

Non-Linear effects of attributes (I)

� The utility functions of DCM’s are usually linear-in parameters, which is not the same as to say that the attributes can’t be changed in order to have a non-linear effect on the Utility.

� Many times we use the logarithm of time and cost variables:

� In this perspective the same variation of an attribute has a higher impact for lower values than for higher values. This often happens with time and cost variables.

∆_ ∆_

Page 20: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 20

� The Box-Cox transformation is another alternative. It is given by:

If λ ≠0 then Box(x) = (x λ-1)/ λ

If λ =0 then Box(x)=ln(x)

Non-Linear effects of attributes (II)

λ

� λ is not estimated, it must be chosen by the analyst;

Page 21: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 21

Estimating the Multinomial Logit (MNL)

� The estimation of DCMs is best done by maximum likelihood, using the likelihood function:

` _(a∗) = c c �� (�)#���∈:�

d

�=1

where #�� denotes the binary variable which is equal to 1 when respondent � chooses alternative �, and 0 otherwise.

� Using the logarithm of the Likelihood simplifies the maximization of the Likelihood function, and this does not change the solution:

` _(a) = e e #�� × !�(�� (�)� )�∈:�

d

�=1

(<1) Function of the parameters we want to estimate

Negative values because it is less than 1

(<<0)

Page 22: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 22

The Willingness to Pay (WTP)

� The willingness to pay is a key result from discrete choice models, it tells us how much people are willing to pay for a certain benefit.

� Typically in transportation, and specifically in mode choice we are interested for instance in how much people are willing to pay for a time saving, this is known as the Value of Travel Time Savings (VTTS); or for instance, how much people are willing to pay to have their number of transfers reduced in a public transportation trip.

� This can be easily computed through the weighs (coefficients) of the variables in the utility functions because utility does not have a scale and is compensatory. Let’s see the example for the VTTS:

�(: �) = �: � + �: � ?? × ??S � + �: � ?: × ?:S � + �: � (T� × (T�

�??f[€/h��] = �: � ?? [1/h��]�: � ?: [1/€] [€/h��]

Page 23: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 23

Forecasting – Aggregating Across

Alternatives (I)

� One of the advantages of using DCMs is to be able to forecast an expected demand or the utilization of a given service or infra-structure.

� This can be done using several techniques and has to obey certain rules in order to produce accurate results on the predictions we are searching for.

� The maximum Utility model that we specified is directed to the problem of predicting individual behavior, we used the subscript � in the utility equations in order to represent the Utility that user � gives to an alternative � and this value depends on variables related to the choice and also to the individual.

� However to predict the choice for a single individual is not very useful for problems in transportation and other engineering fields, typically we are estimating such model in order to help decide on an investment or on a planning move.

Page 24: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 24

� “Instead, most real world decisions are based (at least in part) on the forecast of some aggregate demand, such as the number of trips of various types made in total by some population or the amount of freight shipped between different city pairs” (Ben-Akiva and Lerman, 1985).

Forecasting – Aggregating Across

Alternatives (II)

� The first step in making aggregate forecasts based on disaggregate models is to define the population of interest.

� Sometimes this is easy to find, but in other times we might be interested in sub populations, based on any number of characteristics, e.g. income level or geographic area.

Page 25: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 25

Forecasting – Aggregating Across

Alternatives (III)

d?(�) = e �(�|j� )d?

�=1

� Assuming that the size of the population is known, we will designate it as T (Ben-

Akiva and Lerman, 1985) we may know how to address the problem of aggregating

across individuals in order to produce a share of the individuals who choose each

alternative. The deduction is straight forward: we denote NT as the number of

decision makers in T.

� We may write the probability that an individual n in T chooses an alternative i as

�(�|j� ), where j� is defined as the vector of all the attributes affecting the choice that appears in the DCM, regardless of which utility function they appear in, being

attributes of the alternatives or socio-demographic attributes. Hence if we knew the

values for all those attributes, to compute an aggregate forecast for the expected

number of individuals in T choosing any alternative i, denoted by d?(�), would simply be:

Page 26: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 26

� This is an expected value, thus, the true number of persons choosing i is a random variable. “In most real world forecasting situations, T is large enough so that the distinction between the actual share of the population using i and its expected value is negligible” (Ben-Akiva and Lerman, 1985).

� Another form of that equation is the one that estimates not the total number of choosers, but the share of persons choosing an alternative:

k(�) = 1dl

e � � j,mn

,o2= p � � j

� It is like a weighted probability of choosing an alternative.

Forecasting – Aggregating Across

Alternatives (IV)

Page 27: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 27

� The simplicity of both expressions hides the fact that most of the times having the vector of all choice-relevant attributes for the population of interest is very unrealistic.

� Moreover the value of P(i|Xn) is never fully known, only an estimate is available because the underlying parameters are unknown.

� For solving these two main problems there are several methods pointed to reduce data needs but at the expense of the accuracy of the estimates. The objective should be to maximize the accuracy and at the same time minimizing the cost of the forecasting process.

� Ben-Akiva and Lerman present them in detail in their Discrete Choice Analysis handbook (1985). We will concentrate in the two simplest models:

Forecasting – Aggregating Across

Alternatives (V)

Page 28: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 28

Forecasting – Aggregating Across

Alternatives (VI)

� The first and simplest model is the average individual procedure; the method builds a “representative individual” using his characteristics to represent the entire population. We define jq as the average attributes of the population and approximate k(�) as � � jq . However this is demonstrated to produce significant errors in the estimates, and off course, you should be able to produce an average for each explanatory variable in the utility functions, which can be very hard to obtain.

� The alternative for this aggregation method is using sample enumeration. This method uses a random sample of the population as “representative” of the entire population. The predicted share of the sample choosing alternative � is used as an estimate for k(�):

Page 29: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 29

where d> is the number of individuals in the sample.

kr (�) = 1d>

e �(�|j� )d

�=1

Forecasting – Aggregating Across

Alternatives (VII)

� The problem of this approach is the underlying hypothesis of having a

perfect random sample. The technique may be applied when the sample

is drawn non-randomly from the population but this has to be done

recurring to stratification, know the groups that result from this

stratification, apply the sample enumeration to each group and then

compute an estimate of k(�) as the weighted sum of the within-class forecasts.

Page 30: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 30

Forecasting – Aggregating Across

Alternatives (VIII)

� Usually we take the same sample used for calibrating the model coefficients for aggregating across alternatives.

� This technique solves the problem of not having the explanatory variables of the entire population to use for the forecast; however it introduces great dependence on the sample quality.

� This is even more important when we are using Stated Preference data to calibrate discrete choice models. As you will see in the next session, the alternatives in such experiments are synthetic so they are not revealed in reality and as such using sample enumeration produces unrealistic shares for each alternative.

Page 31: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 31

The asymptotic t Test (I)

� The asymptotic t Test, also known as the Wald-Statistic test, is used

to test the hypothesis that a parameter is equal to a pre-specified value.

The most generally used value is zero because this allows testing the

hypothesis that the weights on the utility function are null, which is the

same as to test if the corresponding variable is relevant or not for the

discrete model under analysis. This test is only valid asymptotically, i.e.

only for large samples, given that the variance-covariance matrix of the

parameters is estimated asymptotically.

Page 32: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 32

The test statistic for the relevance of a parameter is the following:

H0: βi = 0 H1: βi ≠ 0 Z = βi −0

uvar (βi )

The hypothesis test for the equality of two coefficients:

H0: βi = βj

H1: βi ≠ βj

Z = βi −βjzvar (βi −βj )

The asymptotic t Test (II)

Comparing this statistic with a significance level, one can

reject or not reject the hypothesis of both coefficients being

equal. This is very important, because if we are not able to

reject the hypothesis of two coefficients being equal it is

reasonable to estimate a new model enforcing the

constraint: βi = βj, resulting in one less parameter to estimate.

Page 33: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 33

rejection

� The critical values for the test statistic are percentiles of a

standardized normal distribution, which for two-tailed tests at the most

used levels of significance of 20%, 10% and 5% are ±1.281552, ±1.644854 and ±1.95996 respectively.

The asymptotic t Test (III)

Page 34: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 34

The likelihood ratio test (I)

�Under the null hypothesis that all the coefficients are zero, the statistic −2&a(0) −a(∗)', with a(0) being the log likelihood of a model with all coefficients equal to zero (equal shares amongst the alternatives) and a(∗) being a model with all coefficients to be estimated, is asymptotically �2(Chi-Squared) distributed with Knumber degrees of freedom equal to the number of coefficients. we call this test the

LL ratio-test because the difference between the logs of two values is

mathematically equivalent to the log of the ratio of the two values

�A more interesting test statistic compares not the fully restricted model but a model

constituted only for alternative-specific constants because this information can be

extracted from the sample. The statistic −2&a(S) − a(∗)' is also asymptotically �2 distributed with K-J+1 degrees of freedom, where K is the total number ofparameters to be estimated, J is the number of alternatives in the choice set and

a(S) is the likelihood of a model only with constants. A model only with ASC is the best model we are able to reach without any external information to the sample

proportions.

Page 35: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 35

� This test can be applied for comparing different models. The

general test statistic is −2&L(R) − L(U)', where L(R) is the likelihood of a restricted model and L(U) is the likelihood of an unrestricted model. This statistic is asymptotically �2 distributed with �� − �) degrees of freedom, where �� and �) are the number of estimated coefficients in both models. Good applications for this test

are, for example, testing the quality of two utility specifications: one

simply run the two models, compute the statistic and compare it with

the level of significance.

The likelihood ratio test (II)

�2�

Page 36: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 36

� When estimating more than one alternative model specification it is useful to

analyze the goodness of fit of each model and comparing them. Usually a

higher value of the Likelihood function is considered better.

� But It is more convenient to compare the value of the likelihood index:

�2 = 1 − a(∗)a(S). This is an informal goodness-of-fit index that measures the

fraction of an initial log likelihood value explained by the model. �2 is analogous to )2 used in regression, but it should be used with somewhat caution.

Goodness of fit - The pseudo-R2(I)

Page 37: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 37

� The pseudo-R2 cannot be viewed as the R2 in linear regression. The mapping of both produces a non-linear relationship, where apparently not so good values of the pseudo-R2 would correspond to fair or good values of the R2.

Goodness of fit - The pseudo-R2(II)

(Domencich and MacFadden, 1975)

Page 38: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 38

� However, as in the case with the R2, this measure will always increase with

the number of variables in the model, or at least stay the same value. For

this reason an adjusted measure is used to take this in consideration:

�̅2 = 1 − a(∗)−�a(0) just as the adjusted )2 in linear regression.

with �( ��=>"h��" � S"��) = �(� � h�"��>)

Goodness of fit - The pseudo-R2(III)

Page 39: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 39

� The folowing is the expression that gives us the elasticity of demand in relation to one of the explanatory variables:

� p�4�5�45 =

��45�45��4�5�4�5= ��45

��4�5 × �4�5�45

� This is interpreted as the elasticity of the probability of alternative � for decision maker � with respect to a marginal change (1%) in the O��attribute of the ��� alternative (i.e. j+��), as observed by decision maker �.

� Because elasticity is specific to each decision maker, the way we have to aggregate the elasticity across the population is the following:

� p�6�5�4 = ∑ ��45��6�5

�45�5��∑ ��45�5��

Elasticity (I)

Page 40: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 40

� Relationship between elasticity of demand of an alternative � in relation to the price of that alternative (j+��), and expected revenue change if there is a price increase or a price decrease on the alternative.

Elasticity (II)

In this table � is used as the index for decision maker

�, as used throughout this class

Page 41: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 41

A Modal Choice Revealed Preference Survey in

Australia - Lab

� This survey is part of the Book “Applied Choice Analysis: A Primer” by David Hensher, et al (2005). They have a Revealed and a Stated Preference Survey (will be explained in the next session).

� We start just with the Revealed Preference (RP) survey for understanding how to estimate MNL models in NLOGIT (Econometric Software Inc.).

� Each respondent in the sample answered a questionnaire about the trip they had the day before of the survey.

� For this Lab we have filtered the data, selecting just some of the available explanatory variables.

Page 42: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 42

Variable Meaning

ALTIJ Alternative Number: 1 = DRIVE ALONE, 2 = RIDE SHARE, 3 = BUS, 4 = TRAIN, 5 = WALK, 6 = BICYCLE;

CHOICE 1, chosen, 0 not chosen

CSET Number of alternatives in each comparison. In this case there are always 2.

MPTRFARE Cost of public transport ($AUS)

MPTRTIME Time on public transport (min)

WAITTIME Time waiting for public transport (min)

AUTOTIME Time inside the automobile (min)

VEHPRKCT Cost of parking in destination ($AUS)

VEHTOLCT Paid toll ($AUS)

NUMBVEHS (SDC) Number of vehicles in household

WKROCCUP (SDC) Occupation category:1 = Managers and Admin, 2 = Professionals, 3 = Para-professional, 4 = Tradespersons, 5 = Clerks, 6 = Sales, 7 = Plant operators, 8 = Laborers , 9 = Other

PERAGE (SDC) Age

A Modal Choice Revealed Preference Survey in

Australia - Lab

Page 43: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 43

� Variables ALTIJ, CHOICE and CSET, allow building the dependent variable of the DCM.

� ALTIJ will identify the mode of transportation that each line of data represents, CHOICE will tell which of the lines (modes) has been chosen from the Choice set, and finally CSET will tell how many lines (alternatives) are in each choice which respondents have answered.

� In this RP survey each choice is made between the mode of transportation that the user has selected the day before and in the questionnaire they were also asked to give attribute levels of a single alternative means of traveling to work as perceived by that respondent. This second mode was deemed the alternative mode.

A Modal Choice Revealed Preference Survey in

Australia - LabVariable (continue) Meaning

DISDWCBD (like an SDC) Distance to the Central Business District (CBD) (km)

TRIPTIME Trip time in Bicycle or walking (min)

Page 44: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 44

Data Sample

(…)

Page 45: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 45

Building the Utility Functions for an MNL

The first step on running an MNL is thinking of an initial structure for the Utility functions. My proposal is the following:

Drive alone Utility:

U(DA) =ASDR+TDRDA*AUTOTIME+COST*VEHPRKCT+COST*VEHTOLCT+VEHD*NUMBVEHS+AGE*PERAGE+MANAGE*MANAGERS/Ride Share Utility:

U(RS) = ASRS +COST*VEHPRKCT+COST*VEHTOLCT/Bus Utility:

U(BUS) = ASBU +COST*MPTRFARE + TPTBUS*MPTRTIME+TW*WAITTIME/Train Utility:

U(TRAIN) = ASTR + COST*MPTRFARE + TPTTRA*MPTRTIME+TW*WAITTIME+DISTAN*DISDWCBDWalk Utility:

U(WALK) = ASWA+TRPEDBI*TRIPTIME/Cycle Utility:

U(CYCLE) = TRPEDBI*TRIPTIME

New Variable!

Reference

Alternative

Page 46: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 46

MNL’s in Nlogit

� The command in Nlogit to create MNL models is the DISCRETECHOICE (or NLOGIT) command. Go to Model->Discrete Choice->Discrete Choice.

Instead of having the choice we could have a frequency of choices, which Nlogit transforms in probabilities.

If we want to give a higher weigh to some choices, for instance, due to sample stratification

Page 47: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 47

MNL’s in NlogitIt does not allow to specify different weighs for the same alternative attribute. For instant: time usually has a different weigh for different alternatives

Best Option! We have all the freedom we want

Page 48: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 48

The code in Nlogit

Will produce a new variable called “Prob” with the probability of the alternative being chosen in its choice set

Cross tabulation of true versus predicted choices

Verifies the data

Describes all utility functions and their coefficients estimates

Creates the new variable according to an existing one. We can’t introduce a categorical variable directly in the model.

�Run the model by selecting all text and pressing go!

Page 49: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 49

The output

It is the log likelihood ratio considering a base model of equal choice shares (50% -50%), it does not use the alternative specific constants model, this must be computed directly. It is almost impossible that we do not reject the hypothesis of this being a better model than one with equal shares

It is the Log Likelihood

Individuals, not data lines

Page 50: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 50

� NLOGIT;Lhs=CHOICE,CSET,ALTIJ;Choices=DA,RS,BUS,TRAIN,WALK,CYCLE;Rh2=ONE$

A model only with ASCs

−2(a(S) − a(∗)) =

−2(−392.51 − (−344.1845)) =

= 96.651

degrees of fredom=15-5=10

In Excel:=INV.CHI(0.05,10)=18.307

18.307

We reject the hypothesis that our model is the same as a base model with just the alternative specific constants.

Pseudo R2 (McFadden)= 1-(-344.1845/-392.51)=0.1231

5%

96,651(…)

Page 51: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 51

Crosstab of Actual vs Predicted

DA RS BUS TRAIN WALK CYCLE Total

DA 446 19 17 11 3 3 499

RS 11 79 16 21 3 4 135

BUS 20 15 53 7 0 0 95

TRAIN 15 18 9 33 2 0 78

WALK 3 2 0 2 9 4 20

CYCLE 4 2 0 4 3 15 27

Total 499 135 95 78 20 27 854

�The predicted choices are obtained by computing the probabilities and if � � > � � ∀ �* � we say that alternative � is chosen.

% :����S" �����S"���> = 635854 = 74% Good prediction ability.

Page 52: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 52

MNL Coefficients results

Number of vehicles is very positive for choosing to drive alone

Time walking and bicycling is very negative for those alternatives

As an experiment: Try considering one alternative specific coefficient for the travel time in bicycle and another for the time walking … see what happens. Determine the value of driving time alone.

Not being able to explain part of the disutility of these modes against the Bicycle

Page 53: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 53

Wald StatisticTest if we can reject the hypothesis of the cost parameter being zero:

WALD;FN1:COST-0$

This results in exactly the same value as in the previous table. Because we are testing the same hypothesis.

Test if we can reject the hypothesis of the travel time inside the bus and the

time waiting for the bus have the same weight in the Utility function:

WALD;FN1:TPTBUS-TW$

We reject the hypothesis that both are the same

Page 54: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 54

Aggregating across Individuals� With the Prob variable we may aggregate across individuals and obtain

the modal shares using Excel:

� See that the model is reproducing the shares in the sample. This must happen always.

Page 55: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 55

Elasticity� Add the following to the MNL instructions:

You get:

Driving Alone demand is inelastic regarding parking price: Means that we could increase the price of parking and still get an increase on revenue, despite the decrease on demand. But always remember that elasticity is point dependent.

(…)

� = 0.67%

Page 56: Multinomial Logit - ULisboa class 12_11.pdf · Phd in Transportation / Transport Demand Modelling 9 The Multinomial Logit (MNL) (V) Since we cannot distinguish both we may arbitrate

Phd in Transportation / Transport Demand Modelling 56

Bibliography� Ben-Akiva, Moshe and Lerman, Steven R (1985) “Discrete Choice Analysis:

Theory and Applications to Travel Demand”, MIT Press.

� Hensher, Rose and Greene (2005) “Applied Choice Analysis: A Primer” Cambridge.

� Correia G. (2009) PhD Thesis: Carpooling and Carpool Clubs Clarifying Concepts and Assessing Value Enhancement Possibilities. URL: http://dl.dropbox.com/u/791022/Carpooling%20and%20Carpool%20Clubs%20Clarifying%20Concepts%20and%20Assessing%20Value%20Enhancement%20Possibilities.pdf

� Ortúzar J. and Willumsen L. (2001) Modelling Transport. 3rd Edition. John Wiley and Sons. West Sussex, England.

� Train K. (2002) Discrete Choice Methods with Simulation. Cambridge University Press. Free to be access through: http://elsa.berkeley.edu/books/choice2.html

Case study data is free to be accessed from: http://www.cambridge.org/0521605776

(some changes were made to the data)