    Journal of Econometrics 12 (1980) 209-217. © North-Holland Publishing Company

    PARTIAL OBSERVABILITY IN BIVARIATE PROBIT MODELS

    Dale J. POIRIER*
    University of Toronto, Toronto, Ont. M5S 1A1, Canada

    Received August 1978, final version received February 1979

    This study investigates random utility models in which the observed binary outcome does not reflect the binary choice of a single decision-maker, but rather the joint unobserved binary choices of two decision-makers. Under the usual normality assumptions, the model that arises for the observed binary outcome is not a univariate probit model, but rather a bivariate probit model in which only one of the four possible outcomes is observed. Estimation and identification issues are discussed, and the implications for sample selectivity problems are noted.

    Often it is desirable to provide a utility maximizing rationalization for binary choice problems where the observed binary outcome does not reflect the binary choice of a single decision-maker, but rather the binary joint choices of two decision-makers. One example is provided by Gunderson (1974), who discussed alternative statistical models for estimating the probability that an on-the-job trainee will be retained by the sponsoring company after training. In this situation the employer must decide whether or not to make a job offer, and the applicant must decide whether or not to seek a job offer. Each individual's decision is not observed, but rather only whether or not the trainee continues working after the completion of training. In such a case, under the usual normality assumptions, the correct choice of distribution will not be a univariate probit model (even if the two decisions are statistically independent).

    To see this, consider two individuals ($j = 1, 2$), each faced with a binary choice $y_j = 0, 1$. Let $w_{jm}$ be a fixed vector of characteristics of individual $j$ and choice $y_j = m$, where $m = 0, 1$. Suppose the two individuals have utility

    *The author wishes to acknowledge the generous financial support of the University of Toronto which aided in the completion of this study. Thanks are also owed to Takeshi Amemiya, Cheng Hsiao, Roger Klein, Roger Koenker, Paul Ruud, Adonis Yatchew, and three anonymous referees for their valuable comments on preliminary drafts. Any errors in this final version are of course the sole responsibility of the author. An earlier version of this paper appeared as Working Paper No. 7802 of the Institute of Policy Analysis, University of Toronto.

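    functions of the form

    $$U_{1m} = g_{1m}(w_{1m}, y_2^*) + \eta_{1m}, \qquad m = 0, 1,$$
    $$U_{2m} = g_{2m}(w_{2m}, y_1^*) + \eta_{2m}, \qquad m = 0, 1,$$

    where, for $j = 1, 2$, $g_{jm}(\cdot, \cdot)$ is a non-stochastic scale function, $\eta_{jm}$ is a random disturbance term measuring unobserved attributes, and the utility differential

    $$y_j^* = U_{j1} - U_{j0}, \qquad j = 1, 2,$$

    represents individual $j$'s sentiment toward $y_j = 1$. Further suppose

    $$g_{11}(w_{11}, y_2^*) - g_{10}(w_{10}, y_2^*) = \gamma_1 y_2^* + x\delta_1,$$
    $$g_{21}(w_{21}, y_1^*) - g_{20}(w_{20}, y_1^*) = \gamma_2 y_1^* + x\delta_2,$$
    $$\eta_{11} - \eta_{10} = \varepsilon_1, \qquad \eta_{21} - \eta_{20} = \varepsilon_2,$$

    where $x$ is a $k$-dimensional row vector of explanatory variables, $\delta_1$ and $\delta_2$ are $k$-dimensional column vectors of unknown coefficients, $\gamma_1$ and $\gamma_2$ are unknown parameters, and $\varepsilon \equiv [\varepsilon_1, \varepsilon_2]' \sim N(0, \Omega)$ with

    $$\Omega = \begin{bmatrix} \omega_{11} & \omega_{12} \\ \omega_{12} & \omega_{22} \end{bmatrix}.$$

    Then it is easy to show that

    $$y_1^* = \gamma_1 y_2^* + x\delta_1 + \varepsilon_1, \tag{1}$$
    $$y_2^* = \gamma_2 y_1^* + x\delta_2 + \varepsilon_2, \tag{2}$$

    and that, according to the random utility maximization hypothesis, individual $j$ will select

    $$y_j = 1 \quad \text{iff} \quad y_j^* > 0, \quad \text{i.e.,} \quad U_{j1} > U_{j0}.$$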

    The preceding specification permits (but does not require) interdependency between the utility functions of the two individuals in the sense that the utility of each individual is a function of the sentiment of the other individual, and in the sense that the two random components $\varepsilon_1$ and $\varepsilon_2$ may

    be correlated. Structural equations (1) and (2) are a special case of the model suggested by Heckman (1976, 1978).

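    The reduced form equations corresponding to (1) and (2) are

    $$y_1^* = x\beta_1 + u_1, \tag{3}$$
    $$y_2^* = x\beta_2 + u_2, \tag{4}$$

    where

    $$\beta_1 = \frac{\delta_1 + \gamma_1\delta_2}{1 - \gamma_1\gamma_2}, \qquad \beta_2 = \frac{\delta_2 + \gamma_2\delta_1}{1 - \gamma_1\gamma_2},$$
    $$u_1 = \frac{\varepsilon_1 + \gamma_1\varepsilon_2}{1 - \gamma_1\gamma_2}, \qquad u_2 = \frac{\varepsilon_2 + \gamma_2\varepsilon_1}{1 - \gamma_1\gamma_2}.$$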

    Thus the reduced forms for the individual decisions correspond to univariate probit models, and the reduced form of the two decisions taken together corresponds to a bivariate probit model.
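The reduced form can be checked by direct substitution. The sketch below assumes the structural equations take the form $y_1^* = \gamma_1 y_2^* + x\delta_1 + \varepsilon_1$ and $y_2^* = \gamma_2 y_1^* + x\delta_2 + \varepsilon_2$, as implied by the utility differentials, and that $\gamma_1\gamma_2 \neq 1$.

```latex
\begin{aligned}
y_1^* &= \gamma_1\,(\gamma_2 y_1^* + x\delta_2 + \varepsilon_2) + x\delta_1 + \varepsilon_1 \\
(1 - \gamma_1\gamma_2)\,y_1^* &= x(\delta_1 + \gamma_1\delta_2) + (\varepsilon_1 + \gamma_1\varepsilon_2) \\
y_1^* &= x\,\frac{\delta_1 + \gamma_1\delta_2}{1 - \gamma_1\gamma_2}
        + \frac{\varepsilon_1 + \gamma_1\varepsilon_2}{1 - \gamma_1\gamma_2}
       = x\beta_1 + u_1 .
\end{aligned}
```

The expression for $y_2^*$ follows by interchanging the subscripts 1 and 2. Since $u_1$ and $u_2$ are linear combinations of the jointly normal $\varepsilon_1$ and $\varepsilon_2$, they are themselves bivariate normal, which is what yields the bivariate probit form.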

    Given a random sample $(y_{1i}, y_{2i} \mid x_i)$ ($i = 1, 2, \ldots, n$), estimation of the parameters of structural equations (1) and (2), based on reduced form equations (3) and (4), has been discussed by Heckman (1976, 1978) and Amemiya (1978). Provided that the usual identification conditions hold (e.g., each structural equation excludes at least one exogenous variable appearing in the other structural equation) and subject to a normalization rule such as $\mathrm{var}(u_1) = \mathrm{var}(u_2) = 1$ or $\omega_{11} = \omega_{22} = 1$, all structural coefficients are identified.

    In some instances, however, the choices $y_1$ and $y_2$ will be only partially observed. Specifically, consider the case where the only information on the two dichotomies is whether or not both equal unity. The study of Gunderson (1974) is one example. Another example of such a case is a two-member committee voting anonymously under a unanimity rule. An outsider only observes whether a motion passes (i.e., both members vote aye) or whether it fails (i.e., at least one member votes nay). Partial observability of this type can be represented by the single binary random variable

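    $$z_i = y_{1i}\,y_{2i}, \qquad i = 1, 2, \ldots, n. \tag{5}$$

    Since $z_i = 1$ iff $y_{1i} = y_{2i} = 1$, the distribution of $z_i$ is given by

    $$p_i = \Pr(z_i = 1) = \Pr(y_{1i} = 1 \text{ and } y_{2i} = 1) = F(x_i\beta_1, x_i\beta_2; \rho),$$
    $$1 - p_i = \Pr(z_i = 0) = \Pr(y_{1i} = 0 \text{ or } y_{2i} = 0) = 1 - F(x_i\beta_1, x_i\beta_2; \rho),$$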

    $^1$ An alternative specification of (1) and (2) would introduce the actual decision of one individual into the other individual's structural equation. While such a specification has its merits (it seems attractive if there is a temporal ordering of the decisions so that, say, individual 1 makes a decision conditional on the decision of individual 2), it will not be considered here since its reduced form does not necessarily reduce to the usual bivariate probit model, and as the title of this study suggests, the bivariate probit model is the principal concern here. See Heckman (1976, 1978) for a thorough discussion of this alternative.

    where the variances of $u_1$ and $u_2$ have been normalized to equal unity, $\rho$ is the correlation between $u_1$ and $u_2$, and $F(\cdot, \cdot\,; \cdot)$ denotes the bivariate standard normal distribution. The log-likelihood function of the sample is

    $$L(\beta_1, \beta_2, \rho) = \sum_{i=1}^{n} \Big\{ z_i \ln[F(x_i\beta_1, x_i\beta_2; \rho)] + (1 - z_i) \ln[1 - F(x_i\beta_1, x_i\beta_2; \rho)] \Big\}. \tag{6}$$
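As an illustration of how (6) might be evaluated, the following is a minimal Python sketch using only the standard library. The function names `bvn_cdf` and `log_likelihood`, the Simpson's-rule approximation to $F$, and the small data set are all assumptions made here for illustration; none of them comes from the paper, and a production implementation would more likely rely on a dedicated bivariate normal routine.

```python
import math
from statistics import NormalDist

N = NormalDist()  # standard normal: N.pdf is phi, N.cdf is Phi


def bvn_cdf(a, b, rho, steps=400, lower=-8.0):
    """Approximate F(a, b; rho) for |rho| < 1 by Simpson's rule applied to
    the integral of phi(t) * Phi((b - rho*t) / sqrt(1 - rho^2)) over [lower, a]."""
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / steps  # steps must be even for Simpson's rule

    def g(t):
        return N.pdf(t) * N.cdf((b - rho * t) / s)

    total = g(lower) + g(a)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * g(lower + k * h)
    return total * h / 3.0


def log_likelihood(beta1, beta2, rho, X, z):
    """Log-likelihood (6) when only z_i = y_1i * y_2i is observed."""
    ll = 0.0
    for x_i, z_i in zip(X, z):
        a = sum(xv * bv for xv, bv in zip(x_i, beta1))  # x_i beta_1
        b = sum(xv * bv for xv, bv in zip(x_i, beta2))  # x_i beta_2
        p = bvn_cdf(a, b, rho)                          # p_i = Pr(z_i = 1)
        ll += z_i * math.log(p) + (1 - z_i) * math.log(1.0 - p)
    return ll


# Hypothetical data: a constant and one regressor, three observations.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
z = [0, 1, 1]
print(log_likelihood([0.2, 0.5], [-0.1, 0.3], 0.4, X, z))
```

The sketch only evaluates the objective; maximizing it over $(\beta_1, \beta_2, \rho)$ would require a numerical optimizer, which is left out here.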

    Letting $\theta = [\beta_1', \beta_2', \rho]'$, the information matrix corresponding to (6) can be expressed as

    $$C'C, \tag{7}$$

    where $C$ is an $n \times (2k+1)$ matrix with $i$th row equalling

    $$c_i = [p_i(1 - p_i)]^{-1/2}\,[\phi(a_i)\Phi(A_i)x_i,\ \phi(b_i)\Phi(B_i)x_i,\ f(a_i, b_i; \rho)],$$

    and where $\phi(\cdot)$ and $f(\cdot, \cdot\,; \cdot)$ denote, respectively, the univariate and bivariate standard normal densities, $\Phi(\cdot)$ denotes the univariate standard normal distribution, and

    $$a_i = x_i\beta_1, \qquad A_i = (1 - \rho^2)^{-1/2}(b_i - \rho a_i),$$
    $$b_i = x_i\beta_2, \qquad B_i = (1 - \rho^2)^{-1/2}(a_i - \rho b_i).$$
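To see where the rows of $C$ come from, the following sketch differentiates $p_i = F(a_i, b_i; \rho)$ and applies the usual information matrix formula for a binary outcome. The gradient notation $\nabla_\theta p_i$ is introduced here for illustration only.

```latex
\begin{aligned}
\frac{\partial F(a_i, b_i; \rho)}{\partial a_i} &= \phi(a_i)\,\Phi(A_i), \qquad
\frac{\partial F(a_i, b_i; \rho)}{\partial b_i} = \phi(b_i)\,\Phi(B_i), \qquad
\frac{\partial F(a_i, b_i; \rho)}{\partial \rho} = f(a_i, b_i; \rho), \\
\nabla_\theta p_i &= [\phi(a_i)\Phi(A_i)x_i,\ \phi(b_i)\Phi(B_i)x_i,\ f(a_i, b_i; \rho)]', \\
-E\!\left[\frac{\partial^2 L}{\partial\theta\,\partial\theta'}\right]
  &= \sum_{i=1}^{n} \frac{\nabla_\theta p_i\,\nabla_\theta p_i'}{p_i(1 - p_i)}
   = \sum_{i=1}^{n} c_i' c_i .
\end{aligned}
```

Each $c_i$ is therefore the gradient of $p_i$ scaled by $[p_i(1 - p_i)]^{-1/2}$, so the information matrix is non-singular exactly when these $n$ scaled gradients span all $2k+1$ dimensions.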

    The consequences of partial observability of the type described by (5) are essentially two-fold. First, the maximum likelihood estimators obtained from (6) will be inefficient compared to those obtained in the case of fully observed choices. Unfortunately, quantifying the efficiency lost is not possible without reference to a particular data set. Second, identification problems arise which require careful examination.

    To see these problems, consider the following. The parameter vector $\theta$ is said to be globally identified if there is no other vector $\tilde\theta$ which is observationally equivalent to it, i.e., if there does not exist $\tilde\theta \neq \theta$ such that $\tilde\theta$ implies the same probability distribution for $z_i$ as does $\theta$. If such is true for an open neighborhood around $\theta$, but not necessarily outside this neighborhood, then $\theta$ is said to be locally identified. Rothenberg (1971, Theorem 1) shows that $\theta$ will be locally identified if and only if information matrix (7) is non-singular, or equivalently, if and only if the rank of $C$ is $2k+1$. In the presence of constraints of the form

    $$\psi_i(\theta) = 0, \qquad i = 1, 2, \ldots, K,$$

    where each $\psi_i$ is a known function possessing continuous partial derivatives, Rothenberg (1971, Theorem 2) shows that $\theta$ is locally identifiable if and only if

    (8)

    has rank $2k+1$, where $\partial\psi/\partial\theta$ is a $K \times (2k+1)$ matrix with the element in row $i$ and column $j$ equalling $\partial\psi_i/\partial\theta_j$.$^2$

    In applying these results to the present context it is useful to distinguish between two cases: $\beta_1 = \beta_2$ and $\beta_1 \neq \beta_2$. If $\beta_1 = \beta_2$, then information matrix (7) is singular, but the augmented matrix in (8) equals

    where $I_k$ is a $k \times k$ identity matrix and $0_k$ is a $k \times 1$ zero vector, and this augmented matrix has rank equal to $2k+1$. Thus, provided that the constraints are taken into account, $\theta$ will be identified.

    If $\beta_1 \neq \beta_2$, then information matrix (7) will be non-singular except in peculiar cases such as described later, and hence, except in such peculiar cases, $\theta$ will be locally identified. The parameter vector $\theta$ is not, in general, globally identified since it is possible to interchange $\beta_1$ and $\beta_2$ and obtain an observationally equivalent model. For global identification it is necessary to be able to distinguish at least one of the elements in $\beta_1$ from its corresponding element in $\beta_2$. This is essentially a labelling problem reminiscent of a mixture of normals problem, and it can be overcome by numerous linear, non-linear, or inequality restrictions. In particular, if the structural latent variable model given by (1) and (2) is over-identified in the usual sense, then this will imply restrictions on the reduced form parameters $\beta_1$ and $\beta_2$, and thus $\beta_1$ and $\beta_2$ will be globally identified.
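The labelling problem can be made explicit in a few lines. The sketch below relies only on the fact that, with unit variances and correlation $\rho$, the pairs $(u_1, u_2)$ and $(u_2, u_1)$ have the same joint distribution; writing $p_i$ and $L$ as functions of the parameters is a notational convenience adopted here.

```latex
\begin{aligned}
F(a, b; \rho) &= \Pr(u_1 \le a,\ u_2 \le b) = \Pr(u_2 \le a,\ u_1 \le b) = F(b, a; \rho), \\
p_i(\beta_1, \beta_2, \rho) &= F(x_i\beta_1, x_i\beta_2; \rho)
  = F(x_i\beta_2, x_i\beta_1; \rho) = p_i(\beta_2, \beta_1, \rho), \\
L(\beta_1, \beta_2, \rho) &= L(\beta_2, \beta_1, \rho).
\end{aligned}
```

Any maximizer of (6) with $\beta_1 \neq \beta_2$ therefore has a mirror image with the two coefficient vectors swapped, and only a restriction that treats the two equations asymmetrically can tell them apart.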

    The peculiar cases referred to earlier are somewhat problem specific, but in general involve specific exogenous variable configurations. For example, consider the case in which $k = 2$, $x_i = [x_{i1}, x_{i2}]$, $x_{i1} = 1$, and $x_{i2}$ is a dummy variable. Order the observations so that $x_{i2} = 0$ ($i = 1, 2, \ldots, r$) and $x_{i2} = 1$ ($i = r+1, r+2, \ldots, n$). Letting $\beta_1 = [\beta_{11}, \beta_{12}]'$ and $\beta_2 = [\beta_{21}, \beta_{22}]'$, suppose that

    $^2$ Both theorems cited here also require certain regularity conditions to hold. For example, $\theta$ must be a regular point of (7) or (8) in the sense that there exists an open neighbourhood around $\theta$ in which these matrices have constant rank. See Rothenberg (1971) for a detailed discussion.

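    the following two restrictions are known to be valid: $\beta_{12} = 0$ and $\rho = 0$. Then the augmented matrix in (8) can be written as

    $$S = R'R,$$

    where

    $$R = \begin{bmatrix} C_1 \\ C_2 \\ 0 \ \ 1 \ \ 0 \ \ 0 \ \ 0 \\ 0 \ \ 0 \ \ 0 \ \ 0 \ \ 1 \end{bmatrix},$$

    where $C_1$ is an $r \times 5$ matrix in which the elements in row $i$ are given by

    $$c_{i1} = [p_i(1 - p_i)]^{-1/2}\,[\phi(\beta_{11})\Phi(\beta_{21}),\ 0,\ \phi(\beta_{21})\Phi(\beta_{11}),\ 0,\ \phi(\beta_{11})\phi(\beta_{21})],$$

    and where $C_2$ is an $(n - r) \times 5$ matrix in which the elements in row $i$ are given by

    $$c_{i2} = [p_i(1 - p_i)]^{-1/2}\,[\phi(\beta_{11})\Phi(\beta_{21} + \beta_{22}),\ \phi(\beta_{11})\Phi(\beta_{21} + \beta_{22}),\ \phi(\beta_{21} + \beta_{22})\Phi(\beta_{11}),\ \phi(\beta_{21} + \beta_{22})\Phi(\beta_{11}),\ \phi(\beta_{11})\phi(\beta_{21} + \beta_{22})].$$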

    Despite the addition of the two restrictions, the parameter vector $\theta$ is not locally identified since $Rd = 0$, where $d$ is a $5 \times 1$ vector with elements

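    $$d_1 = \frac{\phi(\beta_{21})}{\Phi(\beta_{21})}, \qquad d_2 = 0, \qquad d_3 = -\frac{\phi(\beta_{11})}{\Phi(\beta_{11})},$$
    $$d_4 = -\frac{\phi(\beta_{11})}{\Phi(\beta_{11})}\left[\frac{\phi(\beta_{21})\,\Phi(\beta_{21} + \beta_{22})}{\Phi(\beta_{21})\,\phi(\beta_{21} + \beta_{22})} - 1\right], \qquad d_5 = 0.$$

    If, however, $x_{i2}$ took on three distinct values rather than two, then $\theta$ would be locally identified. Of course, if the additional restrictions were dropped, say $\beta_{12} \neq 0$, then further variations in $x_{i1}$ would be required for local identification.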
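The dummy-variable example can be checked numerically. The sketch below is a standard-library Python illustration under the stated restrictions $\beta_{12} = 0$ and $\rho = 0$, in which case $F(a, b; 0) = \Phi(a)\Phi(b)$. The parameter values, the helper names (`c_row`, `build_R`, `null_vector`, `rank`), and the choice of sample points are hypothetical, and the rank is computed by a simple Gaussian elimination written for this purpose.

```python
from statistics import NormalDist

N = NormalDist()  # standard normal: N.pdf is phi, N.cdf is Phi


def c_row(x2, b11, b21, b22):
    """Row of C for x_i = [1, x2] when beta_12 = 0 and rho = 0."""
    a = b11                    # x_i beta_1 (beta_12 = 0)
    b = b21 + b22 * x2         # x_i beta_2
    p = N.cdf(a) * N.cdf(b)    # F(a, b; 0) factorizes
    w = (p * (1.0 - p)) ** -0.5
    ga = N.pdf(a) * N.cdf(b)   # phi(a_i) Phi(A_i) with A_i = b_i
    gb = N.pdf(b) * N.cdf(a)   # phi(b_i) Phi(B_i) with B_i = a_i
    return [w * ga, w * ga * x2, w * gb, w * gb * x2, w * N.pdf(a) * N.pdf(b)]


def build_R(x2_values, b11, b21, b22):
    """Stack the rows of C on the derivatives of the two restrictions."""
    rows = [c_row(v, b11, b21, b22) for v in x2_values]
    rows.append([0.0, 1.0, 0.0, 0.0, 0.0])  # derivative of beta_12 = 0
    rows.append([0.0, 0.0, 0.0, 0.0, 1.0])  # derivative of rho = 0
    return rows


def null_vector(b11, b21, b22):
    """The vector d from the text, for which R d = 0 in the two-value case."""
    m11 = N.pdf(b11) / N.cdf(b11)
    m21 = N.pdf(b21) / N.cdf(b21)
    b = b21 + b22
    d4 = -m11 * (N.pdf(b21) * N.cdf(b) / (N.cdf(b21) * N.pdf(b)) - 1.0)
    return [m21, 0.0, -m11, d4, 0.0]


def rank(rows, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [r[:] for r in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = max(range(rk, len(m)), key=lambda i: abs(m[i][col]), default=None)
        if pivot is None or abs(m[pivot][col]) < tol:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            f = m[i][col] / m[rk][col]
            m[i] = [u - f * v for u, v in zip(m[i], m[rk])]
        rk += 1
    return rk


b11, b21, b22 = 0.3, -0.2, 0.8                          # hypothetical values
R2 = build_R([0, 0, 0, 1, 1, 1], b11, b21, b22)         # x_i2 takes two values
R3 = build_R([0, 0, 1, 1, 2, 2], b11, b21, b22)         # x_i2 takes three values
d = null_vector(b11, b21, b22)
Rd = [sum(r * v for r, v in zip(row, d)) for row in R2]

print(max(abs(v) for v in Rd))   # numerically zero in the two-value case
print(rank(R2), rank(R3))        # rank deficiency disappears with a third value
```

Because $R'Rd = 0$ whenever $Rd = 0$, a rank of $R$ below five is enough to show that $S = R'R$ is singular at these parameter values.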

    In summary then, identification in partially observed bivariate probit models is a somewhat tricky problem. It appears that in general all one can do is check to see whether the reduced form parameters are locally identified according to whether $C$ in (7) or $S$ in (8) have ranks equalling $2k+1$. Even if no restrictions are available, $\theta$ may be locally identified provided the exogenous variables exhibit sufficient variation over the sample, although the previously mentioned labelling problem will arise. On the other hand, as the peculiar case considered earlier suggests, even when restrictions are available, $\theta$ may not be locally identified if the exogenous variables do not exhibit sufficient variation over the sample. The intuition gained from the example suggests that the exogenous variables must take on at least as many distinct data configurations as there are unknown parameters.$^3$ When $x_{i2}$ took on only two values, the three unknown parameters were not identified, but when $x_{i2}$ took on three distinct values, all parameters were identified. If the exogenous variables can take on a continuum of values, then no peculiar identification problem will arise. The inherent non-linearity of (7) and (8) in terms of both parameters and exogenous variables is what provides the local identification.

    If the reduced form parameters are locally identified when only $z$ is observed, then the structural parameters in (1) and (2) will be locally identified provided the usual simultaneous equation identification tests (treating $y_1^*$ and $y_2^*$ as observed) are satisfied. Estimation in such cases can proceed in the fashion described by Heckman (1976, 1978) and Amemiya (1978).

    Finally, the results here have implications not only for estimation of random utility models involving binary outcomes depending on binary

    $^3$ Mathematical confirmation of this intuition has been provided by Takeshi Amemiya

    choices of more than one decision-maker, but they also have important implications for sample selectivity problems [see Heckman (1979)]. To see this, consider once again the job trainee model of Gunderson (1974) and suppose one adds an earnings equation,

    (9)

    where $y_3$ denotes earnings. Suppose that $u = [u_1, u_2, u_3]'$ has a trivariate normal distribution with mean zero and covariance matrix

    $$\begin{bmatrix} 1 & \rho & \sigma_{13} \\ \rho & 1 & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_{33} \end{bmatrix}.$$

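    Then, given that one only observes earnings for those trainees who continue to work with the firm after termination of the training program, the conditional mean of the disturbance term in the earnings equation will be given by

    $$\begin{aligned}
    E(u_3 \mid y_1^* > 0,\ y_2^* > 0) &= E(u_3 \mid u_1 > -x\beta_1,\ u_2 > -x\beta_2) \\
    &= \sigma_{13}\left[\frac{\phi(x\beta_1)\,\Phi\big(x(\beta_2 - \rho\beta_1)/(1 - \rho^2)^{1/2}\big)}{F(x\beta_1, x\beta_2; \rho)}\right]
     + \sigma_{23}\left[\frac{\phi(x\beta_2)\,\Phi\big(x(\beta_1 - \rho\beta_2)/(1 - \rho^2)^{1/2}\big)}{F(x\beta_1, x\beta_2; \rho)}\right].
    \end{aligned} \tag{10}$$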

    Given consistent estimators of the reduced form parameters, consistent estimators of the bracketed expressions in (10) may be obtained, and these may be added to (9) as sample selectivity regressors which asymptotically purge the earnings equation of its selectivity bias (assuming $\sigma_{13} \neq 0$ or $\sigma_{23} \neq 0$). What is important to note here is that if $z$ had been modelled incorrectly as a univariate probit model, then incorrect sample selectivity regressors for the earnings equation would be generated. Thus it is important in selectivity problems to determine whether inclusion in the sample is the result of a single binary decision, or if it is the result of a single binary variable arising from more than one binary decision.
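The two bracketed expressions in (10) are straightforward to compute once reduced form estimates are in hand. The following standard-library Python sketch shows one way to do so; the function names `bvn_cdf` and `selectivity_regressors`, the Simpson's-rule approximation to $F$, and the parameter values are assumptions made for illustration and are not part of the paper.

```python
import math
from statistics import NormalDist

N = NormalDist()  # standard normal: N.pdf is phi, N.cdf is Phi


def bvn_cdf(a, b, rho, steps=400, lower=-8.0):
    """Approximate F(a, b; rho) for |rho| < 1 by Simpson's rule applied to
    the integral of phi(t) * Phi((b - rho*t) / sqrt(1 - rho^2)) over [lower, a]."""
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / steps  # steps must be even for Simpson's rule

    def g(t):
        return N.pdf(t) * N.cdf((b - rho * t) / s)

    total = g(lower) + g(a)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * g(lower + k * h)
    return total * h / 3.0


def selectivity_regressors(x, beta1, beta2, rho):
    """Return the two bracketed terms of (10) for one observation x."""
    a = sum(xv * bv for xv, bv in zip(x, beta1))  # x beta_1
    b = sum(xv * bv for xv, bv in zip(x, beta2))  # x beta_2
    s = math.sqrt(1.0 - rho * rho)
    F = bvn_cdf(a, b, rho)                        # Pr(z = 1)
    lam1 = N.pdf(a) * N.cdf((b - rho * a) / s) / F
    lam2 = N.pdf(b) * N.cdf((a - rho * b) / s) / F
    return lam1, lam2


# Hypothetical reduced form values for a single observation.
x = [1.0, 0.5]
beta1, beta2 = [0.2, 0.5], [-0.1, 0.3]
print(selectivity_regressors(x, beta1, beta2, 0.4))
print(selectivity_regressors(x, beta1, beta2, 0.0))
```

When $\rho = 0$ the first term collapses to $\phi(x\beta_1)/\Phi(x\beta_1)$ and the second to $\phi(x\beta_2)/\Phi(x\beta_2)$, so even under independence two regressors are needed rather than the single one a univariate probit model for $z$ would supply.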

    References

    Amemiya, T., 1978, The estimation of a simultaneous equation generalized probit model, Econometrica 46, 1193-1205.

    Gunderson, M., 1974, Retention of trainees: A study with dichotomous dependent variables, Journal of Econometrics 2, 79-93.

    Heckman, J.J., 1976, Simultaneous equation models with continuous and discrete endogenous variables and structural shifts, in: S.M. Goldfeld and R.E. Quandt, eds., Studies in nonlinear estimation (Ballinger, Cambridge, MA) 235-272.

    Heckman, J.J., 1978, Dummy endogenous variables in a simultaneous equation system, Econometrica 46, 931-959.

    Heckman, J.J., 1979, Sample selection bias as a specification error, Econometrica 47, 153-161.

    Rothenberg, T.J., 1971, Identification in parametric models, Econometrica 39, 577-591.