Censoring

Limited Dependent Variable Models

ECON 6002 Econometrics, Memorial University of Newfoundland

Adapted from Vera Tabakova’s notes

Slide 16-2, Principles of Econometrics, 3rd Edition

Censoring, Truncation, sample selection and related models

We now consider two closely related models:

• regression when the dependent variable of interest is incompletely observed (due to censoring or truncation)

• regression when the dependent variable is completely observed but is observed in a selected sample that is not representative of the population

In both cases OLS regression yields inconsistent estimates because the observed sample is not representative of the population

The first-generation estimation methods require strong distributional assumptions and even seemingly minor departures from those assumptions, such as heteroskedasticity, can lead to inconsistency

16.7 Limited Dependent Variables

16.7.1 Censored Data

Figure 16.3 Histogram of Wife’s Hours of Work in 1975

Having censored data means that a substantial fraction of the observations on the dependent variable take a limit value. The regression function is no longer given by (16.30):

$$E(y \mid x) = \beta_1 + \beta_2 x \qquad (16.30)$$

The least squares estimators of the regression parameters obtained by running a regression of y on x are biased and inconsistent; least squares estimation fails.

Censoring versus Truncation

Censoring occurs when some observations of the dependent variable are recorded at a limit value, regardless of what their actual value might be

For instance, anyone earning $1 million or more per year might be recorded in your dataset at the upper limit of $1 million

With truncation, we observe the regressors only for those observations whose dependent variable falls in a certain range (usually positive values rather than zero)

With censoring, we observe in principle the regressors for everyone, but we do not observe the actual value of the dependent variable for those whose value lies beyond the limit
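The distinction can be sketched in a few lines of Python. The income figures below are hypothetical and purely illustrative; only the $1 million recording limit comes from the example above.

```python
# Hypothetical annual incomes in dollars (illustrative values only).
incomes = [42_000, 85_000, 310_000, 1_250_000, 2_400_000, 67_000]
LIMIT = 1_000_000  # upper recording limit from the example in the text

# Censoring: everyone stays in the data, but incomes at or above the
# limit are recorded as the limit itself.
censored = [min(income, LIMIT) for income in incomes]

# Truncation: people whose income reaches the limit are dropped entirely,
# so neither their income nor their regressors are observed.
truncated = [income for income in incomes if income < LIMIT]

print("original sample size: ", len(incomes))
print("censored sample size: ", len(censored))
print("truncated sample size:", len(truncated))
```

The censored sample keeps its full size with two values piled up at the limit, while the truncated sample loses those two people altogether.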

16.7.2 A Monte Carlo Experiment

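We give the parameters the specific values $\beta_1 = -9$ and $\beta_2 = 1$, so that the latent variable is

$$y_i^* = \beta_1 + \beta_2 x_i + e_i = -9 + x_i + e_i \qquad (16.31)$$

Assume $e_i \sim N(0, \sigma^2 = 16)$. The observed $y_i$ is censored at zero:

$$y_i = \begin{cases} 0 & \text{if } y_i^* \le 0; \\ y_i^* & \text{if } y_i^* > 0. \end{cases}$$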

• Create N = 200 random values of $x_i$ that are spread evenly (or uniformly) over the interval [0, 20]. These we will keep fixed in further simulations.

• Obtain N = 200 random values $e_i$ from a normal distribution with mean 0 and variance 16.

• Create N = 200 values of the latent variable $y_i^*$ from (16.31).

• Obtain N = 200 values of the observed $y_i$ using

$$y_i = \begin{cases} 0 & \text{if } y_i^* \le 0 \\ y_i^* & \text{if } y_i^* > 0 \end{cases}$$
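The four steps above can be sketched in plain Python. This is a minimal illustration rather than the code behind the slides: the seed is arbitrary, the helper `ols` is a hypothetical name, and the numbers it prints will differ from the textbook's because the random draws differ. It assumes the latent model $y_i^* = -9 + x_i + e_i$ with error variance 16.

```python
import random

def ols(x, y):
    """Simple-regression least squares; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(12345)             # arbitrary seed (an assumption)
N = 200
BETA1, BETA2, SIGMA = -9.0, 1.0, 4.0   # variance 16 means sigma = 4

x = [rng.uniform(0.0, 20.0) for _ in range(N)]               # step 1
e = [rng.gauss(0.0, SIGMA) for _ in range(N)]                # step 2
y_star = [BETA1 + BETA2 * xi + ei for xi, ei in zip(x, e)]   # step 3
y = [max(0.0, ys) for ys in y_star]                          # step 4

share_censored = sum(1 for yi in y if yi == 0.0) / N

# Least squares on the full censored sample and on the positive y only.
b_all = ols(x, y)
positive = [(xi, yi) for xi, yi in zip(x, y) if yi > 0.0]
b_pos = ols([p[0] for p in positive], [p[1] for p in positive])

print(f"share censored at zero: {share_censored:.2f}")
print(f"OLS, all observations:  intercept {b_all[0]:.3f}, slope {b_all[1]:.3f}")
print(f"OLS, positive y only:   intercept {b_pos[0]:.3f}, slope {b_pos[1]:.3f}")
```

With a large share of the sample censored at zero, the least squares slope on the full sample falls well short of the latent slope of 1.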

Figure 16.4 Uncensored Sample Data and Regression Function

Figure 16.5 Censored Sample Data, and Latent Regression Function and Least Squares Fitted Line

Least squares using all observations, zeros included:

$$\begin{aligned} \hat{y}_i &= -2.1477 + .5161\,x_i \\ (\text{se}) &\quad\; (.3706) \quad (.0326) \end{aligned} \qquad (16.32a)$$

Least squares using only the observations with $y_i > 0$:

$$\begin{aligned} \hat{y}_i &= -3.1399 + .6388\,x_i \\ (\text{se}) &\quad\; (1.2055) \quad (.0827) \end{aligned} \qquad (16.32b)$$

The Monte Carlo average of an estimator $b_k$ over NSAM samples is:

$$E_{MC}(b_k) = \frac{1}{NSAM} \sum_{m=1}^{NSAM} b_{k(m)} \qquad (16.33)$$
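The Monte Carlo average in (16.33) can be sketched as follows: the x values are drawn once and held fixed, a fresh set of errors is drawn for each of NSAM samples, and the least squares estimates are averaged. The seed, the choice NSAM = 500, and the helper name `ols` are assumptions made for illustration; the latent model is taken to be $y_i^* = -9 + x_i + e_i$ with error variance 16.

```python
import random

def ols(x, y):
    """Simple-regression least squares; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(2024)              # arbitrary seed (an assumption)
N, NSAM = 200, 500                     # NSAM chosen for speed (an assumption)
BETA1, BETA2, SIGMA = -9.0, 1.0, 4.0

# The x values are drawn once and kept fixed across all samples.
x = [rng.uniform(0.0, 20.0) for _ in range(N)]

estimates = []
for _ in range(NSAM):
    # New errors each round; the observed y is censored at zero.
    y = [max(0.0, BETA1 + BETA2 * xi + rng.gauss(0.0, SIGMA)) for xi in x]
    estimates.append(ols(x, y))

# Equation (16.33): average each estimate over the NSAM samples.
mc_b1 = sum(b[0] for b in estimates) / NSAM
mc_b2 = sum(b[1] for b in estimates) / NSAM

print(f"Monte Carlo average intercept: {mc_b1:.3f} (latent value {BETA1})")
print(f"Monte Carlo average slope:     {mc_b2:.3f} (latent value {BETA2})")
```

If least squares were unbiased here, the averages would settle near -9 and 1; instead they stay far from both, which is the sense in which least squares fails with censored data.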

16.7.3 Maximum Likelihood Estimation

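The maximum likelihood procedure is called Tobit in honor of James Tobin, winner of the 1981 Nobel Prize in Economics, who first studied this model.

The probit probability that $y_i = 0$ is:

$$P(y_i = 0) = P[y_i^* \le 0] = 1 - \Phi\left[\frac{\beta_1 + \beta_2 x_i}{\sigma}\right]$$

The likelihood function is:

$$L(\beta_1, \beta_2, \sigma) = \prod_{y_i = 0} \left\{ 1 - \Phi\left(\frac{\beta_1 + \beta_2 x_i}{\sigma}\right) \right\} \times \prod_{y_i > 0} \left\{ (2\pi\sigma^2)^{-1/2} \exp\left( -\frac{1}{2\sigma^2} (y_i - \beta_1 - \beta_2 x_i)^2 \right) \right\}$$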
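To make the likelihood concrete, here is a rough Python sketch of the Tobit log-likelihood for a sample censored at zero. It assumes the Monte Carlo design of Section 16.7.2 (latent intercept -9, slope 1, error variance 16) to generate data, and it maximizes the function with a deliberately crude grid search standing in for a proper numerical optimizer; the grid ranges, the seed, and the function name `tobit_loglik` are all assumptions for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def tobit_loglik(b1, b2, sigma, x, y):
    """Log of the Tobit likelihood for data censored from below at zero."""
    total = 0.0
    for xi, yi in zip(x, y):
        index = b1 + b2 * xi
        if yi > 0.0:
            # Uncensored observation: log of the normal density.
            resid = (yi - index) / sigma
            total += -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * resid ** 2
        else:
            # Censored observation: log[1 - Phi(index/sigma)] = log Phi(-index/sigma).
            total += math.log(max(STD_NORMAL.cdf(-index / sigma), 1e-300))
    return total

# Simulated censored sample (arbitrary seed, an assumption).
rng = random.Random(7)
x = [rng.uniform(0.0, 20.0) for _ in range(200)]
y = [max(0.0, -9.0 + xi + rng.gauss(0.0, 4.0)) for xi in x]

# Crude grid search over (b1, b2, sigma); a stand-in for a real optimizer.
b1_grid = [-14.0 + 0.5 * k for k in range(21)]    # -14.0 to -4.0
b2_grid = [0.5 + 0.05 * k for k in range(21)]     # 0.5 to 1.5
sigma_grid = [2.0 + 0.5 * k for k in range(9)]    # 2.0 to 6.0

best = None
for b1 in b1_grid:
    for b2 in b2_grid:
        for s in sigma_grid:
            ll = tobit_loglik(b1, b2, s, x, y)
            if best is None or ll > best[0]:
                best = (ll, b1, b2, s)

best_ll, best_b1, best_b2, best_sigma = best
print(f"grid maximizer: b1 = {best_b1}, b2 = {best_b2:.2f}, sigma = {best_sigma}")
print(f"log-likelihood at maximizer: {best_ll:.2f}")
```

Statistical packages maximize the same function with gradient-based methods and also report the covariance matrix; the grid is used here only so that the sketch needs nothing beyond the standard library.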

The maximum likelihood estimator is consistent and asymptotically normal, with a known covariance matrix.

Using the artificial data the fitted values are:

$$\begin{aligned} \tilde{y}_i &= -10.2773 + 1.0487\,x_i \\ (\text{se}) &\quad\; (1.0970) \quad (.0790) \end{aligned} \qquad (16.34)$$

16.7.4 Tobit Model Interpretation

The marginal effect of x on the observed y is the latent slope scaled by a cdf value:

$$\frac{\partial E(y \mid x)}{\partial x} = \beta_2 \Phi\left(\frac{\beta_1 + \beta_2 x}{\sigma}\right) \qquad (16.35)$$

Because the cdf values are positive, the sign of the coefficient does tell the direction of the marginal effect, just not its magnitude. If β2 > 0, as x increases the cdf function approaches 1, and the slope of the regression function approaches that of the latent variable model.
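As a small illustration of (16.35), the sketch below evaluates the marginal effect at a few x values. The parameter values are those of the Monte Carlo design in Section 16.7.2 (intercept -9, slope 1, σ = 4), used here as an assumption; the function name is hypothetical.

```python
from statistics import NormalDist

def marginal_effect_observed(x, beta1, beta2, sigma):
    """Slope of E(y|x) in the Tobit model: beta2 times the normal cdf."""
    z = (beta1 + beta2 * x) / sigma
    return beta2 * NormalDist().cdf(z)

# Monte Carlo design values, assumed for illustration.
BETA1, BETA2, SIGMA = -9.0, 1.0, 4.0

for x in (0.0, 9.0, 20.0):
    effect = marginal_effect_observed(x, BETA1, BETA2, SIGMA)
    print(f"x = {x:4.1f}: marginal effect on E(y|x) = {effect:.4f}")
```

At x = 9 the index is zero, so the cdf is one half and the marginal effect is half the latent slope; for large x the effect approaches the latent slope itself.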

Figure 16.6 Censored Sample Data, and Regression Functions for Observed and Positive y values (curves shown: uncensored mean, truncated mean, censored mean)

16.7.5 An Example

$$HOURS = \beta_1 + \beta_2 EDUC + \beta_3 EXPER + \beta_4 AGE + \beta_5 KIDSL6 + e \qquad (16.36)$$

$$\frac{\partial E(HOURS)}{\partial EDUC} = \hat{\beta}_2 \hat{\Phi} = 73.29 \times .3638 = 26.34$$

(The slide also shows 26.66, which is the product of the two rounded figures above.)

This is the marginal effect on the observed hours, while 73.29 is the effect on the underlying “unconditional” hours*

*NB: in all cases the expectation is conditional on the values of the regressors, so do not get confused by the terminology here
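A quick check of the arithmetic behind the scaled marginal effect, using only the two figures reported on the slide (the variable names are hypothetical):

```python
beta_educ = 73.29    # Tobit coefficient on EDUC, as reported on the slide
cdf_value = 0.3638   # estimated cdf scale factor, as reported on the slide

# Marginal effect of EDUC on observed hours: coefficient times cdf value.
effect_on_observed_hours = beta_educ * cdf_value
print(round(effect_on_observed_hours, 2))
```

The effect on observed hours is a little over a third of the effect on latent hours, because only the cdf share of the latent slope carries over to the observed outcome.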

Postestimation and interpretation

• Estimating the model by OLS with the zero observations in the model would reduce all of the slope coefficients substantially

• Eliminating the zero observations as in the OLS regression just shown even reverses the sign of the effect of years of schooling (though it is a non-significant effect)

• For women already in the labor force, more schooling has no statistically significant effect on hours worked

• If you consider the entire population of women, however, more schooling does increase hours, but we can now see that it is likely by encouraging more women into the labor force, not by encouraging those already in the market to work more hours

For Stata commands that help with the complex marginal effects calculations in this chapter, see:

http://www.stata.com/support/faqs/statistics/mfx-after-ologit/#intreg

There are several marginal effects of potential interest after -tobit-:

- the marginal effect on the expected value of the latent dependent variable (on E(y*), simply given by the Tobit estimate)

- the marginal effect on the expected value of the dependent variable conditional on its being greater than the lower limit (on E(y|x, y>0)=E(y*|x, y>0))

- the marginal effect on the expected value of the observed (that is zeros included) dependent variable (on E(y|x), given by Expression 16.35)

- the marginal effect on the probability of the dependent variable exceeding the lower limit

By default, Stata reports the marginal effects on the latent variable, which are exactly the same as the coefficients estimated by -tobit-. You will have to specify the -predict()- option in -mfx- to get the other marginal effects. See -help mfx- and -help tobit postestimation-.

- the marginal effect on the expected value of the latent dependent variable (on E(y*), simply given by the Tobit estimate)

- the marginal effect on the expected value of the dependent variable conditional* on its being uncensored, that is, greater than the lower limit (on E(y|x, y>0)=E(y*|x, y>0)):

mfx compute, predict(e(0,.))
mfx compute, predict(e(a,b))

*NB: in all cases the expectation is conditional on the values of the regressors, so do not get confused by the terminology here

- the marginal effect on the expected value of the observed (that is, zeros included) dependent variable (on E(y|x), given by Expression 16.35):

mfx compute, predict(ys(0,.))
mfx compute, predict(ys(a,b))

- the marginal effect on the probability of the dependent variable exceeding the lower limit:

mfx compute, predict(p(0,.))
mfx compute, predict(p(a,b))
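For a Tobit model with lower limit zero and a single regressor, the four marginal effects listed above have closed forms. The Python sketch below evaluates them at one x value; it is an illustration of the standard formulas, not a reproduction of what -mfx- does internally. The parameter values are the Monte Carlo design values from Section 16.7.2, used as an assumption, and the function name is hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution

def tobit_marginal_effects(x, beta1, beta2, sigma):
    """Four marginal effects of x in a Tobit model censored from below at 0."""
    z = (beta1 + beta2 * x) / sigma
    cdf, pdf = PHI.cdf(z), PHI.pdf(z)
    lam = pdf / cdf  # inverse Mills ratio
    return {
        # on E(y*|x): the Tobit coefficient itself
        "latent": beta2,
        # on E(y|x, y>0): coefficient scaled by 1 - lam * (z + lam)
        "conditional": beta2 * (1.0 - lam * (z + lam)),
        # on E(y|x), zeros included: Expression 16.35
        "observed": beta2 * cdf,
        # on P(y > 0 | x)
        "probability": beta2 * pdf / sigma,
    }

# Monte Carlo design values, assumed for illustration.
effects = tobit_marginal_effects(10.0, -9.0, 1.0, 4.0)
for name, value in effects.items():
    print(f"{name:12s} {value:.4f}")
```

The first three effects are ordered: the effect on the latent variable is the largest, and both the conditional and the observed effects are fractions of it.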

Interval regression

Interval data are data recorded in intervals rather than as a continuous variable

Survey data are often collected in this way to make it easier for the respondent and to provide greater anonymity in responses to more personal questions, such as those on income and age

Income is often reported in intervals of $10,000 and then topcoded at a figure like $100,000 or $130,000

In contingent valuation studies, questions that elicit willingness to pay sometimes ask respondents to choose an interval

Such data are then censored at multiple points, with the observed data y being only the particular interval in which the unobserved y* lies

In these cases you have a multi-censored dependent variable

Stata’s intreg will help with this model

In contingent valuation studies, double-bounded dichotomous-choice questions are sometimes used to elicit willingness to pay

In these cases you have a doubly-censored dependent variable with two variable limits

You are probably guessing that another (less flexible) way to model these cases is by using an ordered regression model

The ordered probit in particular would be quite close to the interval regression model

Example of an interval regression with intreg: http://www.ats.ucla.edu/stat/stata/dae/intreg.htm

The syntax of intreg is:

intreg depvar1 depvar2 [indepvars] [if] [in] [weight] [, options]

By choosing depvar1 and depvar2 appropriately, you can also fit other models:

Type of data                       depvar1   depvar2
----------------------------------------------------
point data            a = [a,a]    a         a
interval data             [a,b]    a         b
left-censored data    (-inf,b]     .         b
right-censored data   [a,inf)      a         .
----------------------------------------------------
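The four data types in the table correspond to four kinds of likelihood contribution under a normal error assumption. The Python sketch below shows one way to write them; it is an illustration of the idea, not a description of intreg's internals. `None` stands in for Stata's missing value `.`, and the function name and the mean and sigma values in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
cdf = STD_NORMAL.cdf

def interval_loglik_term(lower, upper, mean, sigma):
    """Log-likelihood contribution of one observation (None = missing bound)."""
    if lower is not None and upper is not None:
        if lower == upper:
            # Point data: log of the normal density at the observed value.
            return math.log(STD_NORMAL.pdf((lower - mean) / sigma) / sigma)
        # Interval data: probability that y* falls in [lower, upper].
        return math.log(cdf((upper - mean) / sigma) - cdf((lower - mean) / sigma))
    if upper is not None:
        # Left-censored data: probability that y* is at most upper.
        return math.log(cdf((upper - mean) / sigma))
    if lower is not None:
        # Right-censored data: probability that y* is at least lower.
        return math.log(1.0 - cdf((lower - mean) / sigma))
    raise ValueError("at least one bound must be observed")

# Hypothetical conditional mean and standard deviation for one respondent.
mean, sigma = 12.0, 5.0
left = interval_loglik_term(None, 10.0, mean, sigma)
middle = interval_loglik_term(10.0, 20.0, mean, sigma)
right = interval_loglik_term(20.0, None, mean, sigma)
print(math.exp(left) + math.exp(middle) + math.exp(right))
```

The three interval probabilities cover the whole real line, so they sum to one; a Tobit model censored at zero is the special case that mixes point data with left-censored data at zero.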

Keywords

• binary choice models
• censored data
• latent variables
• likelihood function
• limited dependent variables
• log-likelihood function
• marginal effect
• maximum likelihood estimation
• multinomial choice models
• ordered choice models
• ordered probit
• ordinal variables
• probit
• tobit model
• truncated data

Further models

Survival analysis (time-to-event data analysis)

References

Hoffmann, 2004 for all topics

Long, S. and J. Freese for all topics

Agresti, A. (2001) Categorical Data Analysis (2nd ed). New York: Wiley.