Binary Choice Models


Page 1: Binary Choice Models

1

Binary Choice Models

Page 2: Binary Choice Models

2

Topic Overview

• Introduction to binary choice models

• The Linear Probability model (LPM)

• The Probit model

• The Logit model

Page 3: Binary Choice Models

3

Introduction

• In some cases the outcome of interest (Y) is not quantitative, but a binary decision:

– Go to college or not
– Adopt a technology or not
– Join the union or not

• For example, how well do an individual’s socioeconomic characteristics explain his/her decision to join a trade union?

• Often such models are used to model decisions: to invest or not, to enter a market or not, to hire or not, to adopt a technology or not…

• Binary variables as dependent variables (Y) complicate the estimation process

Page 4: Binary Choice Models

4

Introduction

• Only suitable where we can plausibly narrow down the decision alternatives to two.

• Qualitative models where the choice is between two discrete, mutually exclusive and jointly exhaustive alternatives.

• Y, the dependent variable in these models, is binary or dichotomous; it can only take on the values 0 or 1.

• Also known as ‘rational choice’ models – as Y often represents a rational choice between two alternatives. The Xs are the factors that are expected to contribute to the selection of one outcome over another.

Page 5: Binary Choice Models

5

An Example

• The decision of farmers to adopt the latest technology: Yi = β1 + β2 Xi + ... + ui

• where Yi is a binary variable, representing two choices, e.g. to adopt (Y=1) the latest technology or not to adopt (Y=0)

• The decision is influenced by economic , structural, farm and farmer characteristics

• For example costs, farm size, age of the farmer, access to credit etc.

• So, we might for instance find that age of the farmer negatively affects the probability of adoption, while farm size has a positive effect

Page 6: Binary Choice Models

6

Alternative Models

• There are several ways to estimate a binary choice model:

1. The Linear Probability Model (LPM)

2. The Probit Model

3. The Logit Model

Page 7: Binary Choice Models

7

The Linear Probability Model

• Linear regression model with a binary dependent variable: Yi = β1 + β2X2i + β3X3i + … + ui

• The conditional expectation E(Yi|Xi) can be interpreted as the conditional probability that the event (Yi = 1) will occur given Xi:

E(Y|X1, X2,…, Xk) = P(Y=1|X1, X2,…, Xk)

• E(Yi|Xi) might express the probability of purchasing a durable

good (e.g. a car) for a given level of Xi (e.g. income).

• Estimated with OLS

Page 8: Binary Choice Models

8

The Linear Probability Model

• The conditional expectation of the model can be interpreted as the conditional probability of Yi, or:

E(Yi|Xi) = β1 + β2 Xi = Pi

[ui is omitted since we have assumed that E(ui)=0 ]

• Pi = probability that Yi = 1 and (1-Pi) = probability Yi =0

• Yi follows what is known as the Bernoulli probability distribution: Yi = 1 with probability Pi and Yi = 0 with probability (1 − Pi)

• The fact that Yi is a probabilistic term imposes a very important restriction in the values it can take: 0 ≤ E(Yi|Xi) ≤ 1

Page 9: Binary Choice Models

9

An Example of the LPM

• Estimate the determinants of trade union membership (variable union)

• normal OLS regression with union as our dependent variable;

• In Stata: regress union exp sex
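A minimal Stata sketch of this LPM regression (not from the original slides), assuming the course dataset with the variables union, exp and sex is already in memory:

* Linear Probability Model: OLS with the binary variable union as the regressand
regress union exp sex
* each slope estimates the change in P(union=1) for a one-unit change in that regressor, other variables held constant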

Page 10: Binary Choice Models

10

Stata Output and Interpretation

• The above can be interpreted as follows:

• "The slope coefficient measures the change in the average value of the regressand for a unit change in the value of a regressor, with all other variables held constant."

• In this case, holding other variables constant, an increase by one unit in exp (on-the-job experience) increases the probability of union membership by 0.004

Page 11: Binary Choice Models

12

Problems: LPM

• Simple model but there are important shortcomings:

1. Non-normality of the disturbances

2. Heteroskedasticity

3. Nonsense probabilities

4. Implausibility of linearity

Page 12: Binary Choice Models

13

Non-normality of the disturbances

• In the LPM the disturbances ui are: ui = Yi − β1 − β2Xi

• Just like Yi, ui also takes only two values.

• This makes the assumption of normality in the distribution of ui (necessary for inference) unattainable.

• In fact, the probability distribution of ui in the LPM is: ui = 1 − β1 − β2Xi (with probability Pi, when Yi = 1) and ui = −β1 − β2Xi (with probability 1 − Pi, when Yi = 0).

• Possible to overcome in large samples by appealing to the central limit theorem.

Page 13: Binary Choice Models

14

Heteroscedasticity

• The Bernoulli probability distribution implies, by definition, a non-constant variance.

• Specifically, the variance would be: var(ui) = Pi(1 − Pi), where Pi = E(Yi|Xi) = β1 + β2Xi

• Since the expected probability of the event differs from case to case, we can no longer assume a constant variance.

• Usual remedial measures may be employed to correct for heteroscedasticity (e.g. WLS)
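As a hedged illustration (not taken from the slides): the first line below uses heteroskedasticity-robust standard errors, a common alternative to the WLS remedy the slide mentions; the remaining lines sketch a feasible WLS based on var(ui) = Pi(1 − Pi), assuming the union/exp/sex data are loaded.

* Robust standard errors for the LPM
regress union exp sex, vce(robust)

* Feasible WLS: weight each observation by 1/[P_hat(1 - P_hat)]
regress union exp sex
predict p_hat, xb
replace p_hat = min(max(p_hat, 0.01), 0.99)   // keep the weights well-defined
generate w = 1/(p_hat*(1 - p_hat))
regress union exp sex [aweight = w]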

Page 14: Binary Choice Models

15

Nonsense Probabilities

• Due to its probabilistic nature: 0 ≤ E(Yi|Xi) ≤ 1

• In practice, though, the OLS fitted values of Yi may be greater than 1 or less than 0.

• We can still ‘constrain’ those estimates to the desired boundaries, but the adjustment is not very good.

If some of the estimated Ŷs are less than 0 (that is, negative), Ŷi is assumed to be zero for those cases; if they are greater than 1, they are assumed to be 1.
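A sketch of the crude adjustment just described (illustrative only; variable names as in the earlier union example):

regress union exp sex
predict p_hat, xb
replace p_hat = 0 if p_hat < 0                // negative fitted "probabilities" set to zero
replace p_hat = 1 if p_hat > 1 & p_hat < .    // fitted "probabilities" above one set to one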

Page 15: Binary Choice Models

16

Implausible Linearity

• The LPM assumes a linear relationship between the levels of the X variable(s) and the probability that Y=1.

• This linearity (a constant effect of X on the probability) is very implausible.

• Consider the case of a family's decision to own a house – would a given increase in income raise the probability by the same amount at all levels of income?

• It is more plausible to expect that the effect on the probability becomes progressively stronger or weaker at different levels of income…

All this indicates that the LPM is probably not a very good model.

Probit and logit models offer significant advantages and should be preferred

Page 16: Binary Choice Models

17

Probit and Logit Models

• Alternative models that are less problematic are the probit and the logit model

– The relationship between Pi and Xi is non-linear

– As Xi increases, the conditional probability of the event occurring, Pr(Yi=1|Xi), increases but never steps outside the 0–1 interval

– Due to this built-in non-linearity both use an alternative estimator to OLS; the Maximum-Likelihood (ML) method

Page 17: Binary Choice Models

18

Probit and Logit Models

• Cumulative distribution function (s-shaped).

• Normal distribution – probit; logistic distribution – logit.

• Unlike the linear probability model, the predicted probabilities are between 0 and 1.

Page 18: Binary Choice Models

19

The Probit Model

• The probit model can be derived from an underlying latent variable model that satisfies the classical linear assumptions.

• The outcome decision depends on an unobservable utility index:

Ii = β1 +β2 Xi + ui

• For example, decision Y to own a house (Y=1) or not (Y=0) depends on an unobservable utility index Ii, that is determined by Xi (e.g. income, number of children)

• The larger the value of Ii the greater the probability of Y =1 (e.g. owning a house)

Page 19: Binary Choice Models

20

The Probit Model

• The latent (unobservable) variable Ii is linked to the observed decision Yi by:

Yi = 1 if Ii > 0
Yi = 0 if Ii ≤ 0

• If a person's utility index Ii exceeds the threshold level I* (here assumed to be 0), Y = 1; if not, then Y = 0.

• It is assumed that the error term u is independent of X and follows a standard normal distribution.

• The error is symmetrically distributed about 0, which means that 1 − F(−z) = F(z).

Page 20: Binary Choice Models

21

The Probit Model

• Hence the normal distribution allows us to compute the probability that Y=1:

P(Yi = 1|Xi) = P(Ii > 0) = P(β1 + β2Xi + ui > 0) = P(ui > −β1 − β2Xi) = F(β1 + β2Xi)

• With F being the standard normal cumulative distribution function (CDF):

F(Ii) = P(Z ≤ Ii) = (1/√(2π)) ∫ from −∞ to Ii of e^(−Z²/2) dZ

• This ensures that the probability is strictly between 0 and 1.
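As an illustration (not from the slides), Stata's normal() function is the standard normal CDF, so a fitted probability can be computed directly from the probit estimates; the regressor values used below are hypothetical:

probit union exp sex
display normal(_b[_cons] + _b[exp]*10 + _b[sex]*1)   // P(Y=1) at exp = 10 and sex = 1
predict p_hat, pr                                    // or let Stata compute fitted probabilities for every observation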

Page 21: Binary Choice Models

22

The Normal CDF

• That is, in the probit model, Pi the conditional probability that Yi=1 (given Xi), follows the normal CDF.

• So if we plot the probabilities that Yi=1 for different (given) X values cumulatively we get:

[Figure: the standard normal CDF – Pi rises from 0 towards 1 as Xi increases from −∞ to +∞]

Page 22: Binary Choice Models

23

Stata Output

Page 23: Binary Choice Models

24

Interpreting the Results: Probit

• Interpreting the slope coefficients of the probit model is complicated

• Marginal effects: ∂Pi/∂Xi = φ(Zi)·βi

where φ(Zi) is the probability density function of the standard normal variable evaluated at Zi, and Zi = β1 + β2X2i + ... + βkXki

• The sign of the marginal effect is the same as the sign of βi.

• The magnitude of the change depends on the magnitude of βi and the value of Zi.

• All X variables are involved in computing the changes in probability.

• Marginal effects vary for different levels of X; it is customary to estimate them at the means of the variables (see the Stata sketch below).
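A hedged sketch of how these at-means marginal effects can be obtained in Stata (version 11 or later; the union dataset is assumed to be loaded):

probit union exp sex
margins, dydx(*) atmeans     // dP(Y=1)/dX for each regressor, evaluated at the sample means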

Page 24: Binary Choice Models

25

Interpreting the Results: Probit

[Figure: the S-shaped normal CDF again, Pi plotted against Xi over the range −∞ to +∞]

…it follows that the marginal effects of X on Y, vary for different levels of X

Low marginal effects at extreme values of X, high marginal effects at central values.

Page 25: Binary Choice Models

26

Stata Output

[Annotated Stata output: the marginal effect of exp is computed as .0064 × φ(−1.12·cons + .006·exp − .54·sex − .33·sth + .01·age), with the index evaluated at the variable means]

Page 26: Binary Choice Models

27

Interpreting the Results: Probit

• If X2 is a binary variable the marginal effect from changing from 0 to 1 is

• Again, this depends on all values of the other explanatory variables

ΔPi = F(β1 + β2·1) − F(β1 + β2·0)
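A sketch of this discrete-change calculation in Stata, assuming sex is coded 0/1; declaring it as a factor variable makes margins report F(·) at sex=1 minus F(·) at sex=0:

probit union exp i.sex
margins, dydx(sex) atmeans   // discrete change in P(Y=1) from sex=0 to sex=1, other regressors at their means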

Page 27: Binary Choice Models

28

The Logit Model

• The logit model is similar to the probit model – the key difference is that it is based on the logistic CDF rather than the normal CDF.

• If the utility index exceeds the threshold level I*, Y=1; otherwise Y=0.

• Assuming F to be a logistic CDF:

F(Ii) = 1/(1 + e^(−Zi)) = e^(Zi)/(1 + e^(Zi))

• where Zi = β1 + β2Xi

• As before, Yi = 1 if Ii > 0 and Yi = 0 if Ii ≤ 0, so that:

P(Yi = 1|Xi) = P(Ii > 0) = P(ui > −β1 − β2Xi) = F(β1 + β2Xi)

Page 28: Binary Choice Models

29

The Logistic CDF

[Figure: the logistic CDF – Pi rises from 0 to 1 as Xi increases from −∞ to +∞]

Page 29: Binary Choice Models

30

Interpreting the Results: Logit

• The ratio of the two probabilities is the odds ratio in favor of the outcome:

• The logit model produces easily communicable odds ratios: the multiplicative effect of a one-unit increase in each independent variable on the odds that Y=1.

• The ratio P/(1-P) is the odds ratio in favour of owning a house – ratio of the probability that a family will own a house to the probability that it will not own a house

Pi/(1 − Pi) = [1/(1 + e^(−Zi))] / [1/(1 + e^(Zi))] = e^(Zi)

Page 30: Binary Choice Models

31

Interpreting the Results: Logit

• Marginal effects can be calculated in the same way as for the probit model

• Also possible to calculate the odds ratios

• odds ratio = eβ

• where e (the base of the natural logarithm) equals approximately 2.71828

• If eβ is greater than 1, a one-unit increase in X multiplies the odds by eβ, so the odds rise.

• If eβ is less than 1, the odds are multiplied by eβ, so they fall.

• Positive effects are greater than 1, while negative effects are between 0 and 1
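A minimal sketch of obtaining odds ratios in Stata (assuming the union data are loaded); the or option only changes the display, so the raw coefficient remains stored and can be exponentiated by hand:

logit union exp sex, or       // report exp(b) (odds ratios) instead of raw coefficients
display exp(_b[sex])          // the same odds ratio computed from the stored coefficient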

Page 31: Binary Choice Models

32

Stata Output

eβ = 2.71828^(−.9625674) = 0.3819

"Holding other regressors constant, the odds of union membership for women (sex=1) are about 0.38 times those for men – i.e. roughly 62% lower (about 2.6 times smaller)."

Page 32: Binary Choice Models

33

Estimation: Probit and Logit

• Estimation using OLS is not possible due to non-linearity not only in the variables but also in the parameters (the betas).

• Maximum Likelihood is the suitable method: it involves maximising a likelihood function in such a way that the resulting betas take those values that maximise the probability of observing the given Y’s.

• For the precise mechanics, see GUJ, Appendix 15A, p. 633.

• In practice, software (in our case Stata) does all the hard work for us.

• Command syntax in Stata: probit/logit <Y variable> <X variables>

• e.g. probit union exp sex

Page 33: Binary Choice Models

34

Stata Output

Page 34: Binary Choice Models

35

Inference: Probit and Logit

• Likelihood ratio (LR) statistic:

– Tests the null hypothesis that all slope coefficients are jointly zero (equivalent to the F-test in the linear regression model).

– The LR statistic follows the chi-square distribution (χ2) with df equal to the number of explanatory variables (constant not included), e.g. LR chi2(3) = 27.55.

• Wald statistic:

– Tests the null hypothesis that an individual β = 0 (equivalent to the t-statistic).

– Inference is based on the normal table (in large samples the t-distribution converges to the normal distribution).

• Stata reports a p-value for each of these tests (see the sketch below).
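A sketch of both tests in Stata (dataset assumed loaded): test gives the Wald test of the listed coefficients, and lrtest compares the full model with a constant-only model, reproducing the reported LR chi2:

probit union exp sex
estimates store full
test exp sex                  // Wald test: both slope coefficients jointly zero
probit union                  // constant-only model
estimates store null
lrtest full null              // likelihood-ratio test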

Page 35: Binary Choice Models

36

Stata Output

Page 36: Binary Choice Models

37

Goodness of Fit: Probit and Logit

• Conventional R2 is not very meaningful in probit or logit models.

• Many alternative measures have been proposed; the most widely used are the Count R2 and the McFadden R2.

• Count R2: the proportion of correct predictions – the number of observations whose outcome is correctly predicted (using a 0.5 cut-off on the fitted probability) divided by the total number of observations.

• McFadden R2 (Pseudo R2): calculated as 1 − (log L / log L0), where log L is the log-likelihood of the fitted model and log L0 that of the constant-only model.

• The expected signs and significance of the coefficients also matter.

Page 37: Binary Choice Models

38

Example in Stata

After estimating either a probit or a logit, type fitstat to obtain Goodness-of-Fit statistics:
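As a rough, illustrative complement to fitstat (a user-written command), the Count R2 can also be computed by hand along these lines, assuming the union data are loaded:

probit union exp sex
predict p_hat, pr
generate byte y_hat = p_hat >= 0.5    // classify using a 0.5 cut-off
count if y_hat == union
display r(N)/_N                       // share of correct predictions = Count R2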

Page 38: Binary Choice Models

39

Probit or Logit?

• Respective CDFs are almost identical:

Page 39: Binary Choice Models

40

Probit and Logit

• The two models can be used interchangeably: there are no good theoretical reasons to prefer one over the other.

• Their results should be qualitatively identical; i.e. we should get the same coefficient signs regardless of whether we use probit or logit.

• “ … if you multiply the probit coefficient by about 1.81 (which is approximately = π/√3), you will get approximately the logit coefficient (…) Alternatively, if you multiply a logit coefficient by 0.55 (= 1/1.81), you will get the probit coefficient”

• Sometimes the logit is preferred due to the easy interpretation of its coefficients through odds ratios

• Sometimes the probit is preferred due to its normal distribution assumption

• You can begin by running a logit and performing the tests; then, as a robustness check, also try a probit and compare the output (see the sketch below).
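A sketch of that comparison in Stata (dataset assumed loaded):

logit union exp sex
estimates store m_logit
probit union exp sex
estimates store m_probit
estimates table m_logit m_probit, b(%9.3f) se   // logit coefficients should be roughly 1.81 times the probit ones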

Page 40: Binary Choice Models

41

Example: Binary Choice – Discussion Group Membership

                  Probit                          Logit
Variable          Coefficients     Marginal       Coefficients      Marginal     Odds ratio
                  (s.e.)           effects        (s.e.)            effects
BMW               -.393* (.221)    -.1228         -.627* (.377)     -.114        .5340
SW                -.516** (.237)   -.155          -.861** (.406)    -.1504       .4227
East              -.449** (.209)   -.1404         -.762** (.355)    -.1395       .4665
Herd size         .017*** (.003)   .0056          .0284*** (.005)   .00568       1.028
LU/ha             .219 (.189)      .0734          .3402 (.333)      .0679        1.405
Age               -.008** (.008)   -.0028         -.015** (.013)    -.0030       .9848
job               -.353 (.243)     -.109          -.621 (.441)      -.1111       .5374
cons              -1.21** (.559)                  -.621** (.946)

Probit: LR chi2(7) = 72.48, Pseudo R2 = 0.1810    Logit: LR chi2(7) = 71.52, Pseudo R2 = 0.1786

(Standard errors in parentheses.)