Journal of Econometrics 136 (2007) 31–64
Estimation and inference in two-stage, semi-parametric models of production processes
Léopold Simar (a), Paul W. Wilson (b, *)
(a) Institut de Statistique, Université Catholique de Louvain, Voie du Roman Pays 20, Louvain-la-Neuve, Belgium
(b) Department of Economics, University of Texas, Austin, TX 78712, USA
Available online 9 September 2005
Abstract
Many papers have regressed non-parametric estimates of productive efficiency on
environmental variables in two-stage procedures to account for exogenous factors that might
affect firms’ performance. None of these have described a coherent data-generating process
(DGP). Moreover, conventional approaches to inference employed in these papers are invalid
due to complicated, unknown serial correlation among the estimated efficiencies. We first
describe a sensible DGP for such models. We propose single and double bootstrap procedures;
both permit valid inference, and the double bootstrap procedure improves statistical efficiency
in the second-stage regression. We examine the statistical performance of our estimators using
Monte Carlo experiments.
© 2005 Elsevier B.V. All rights reserved.
JEL classification: C1; C44; C61
Keywords: Data envelopment analysis; DEA; Bootstrap; Technical efficiency; Nonparametric; Two-stage estimation
0304-4076/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.jeconom.2005.07.009
*Corresponding author. Present address: The John E. Walker Department of Economics, 222 Sirrine Hall, Clemson University, Clemson, SC 29634-1309, USA.
E-mail addresses: [email protected] (L. Simar), [email protected] (P.W. Wilson).
1. Introduction
Linear-programming-based measures of efficiency along the lines of Charnes et al. (1978, 1979) and Färe et al. (1985) are widely used in the analysis of efficiency of production. These methods are based on definitions of technical and allocative efficiency in production provided by Farrell (1957). Among this literature, those approaches which incorporate convexity assumptions are known as data envelopment analysis (DEA).
DEA measures efficiency relative to a non-parametric, maximum likelihood estimate of an unobserved true frontier, conditional on observed data resulting from an underlying data-generating process (DGP). These methods have been widely applied to examine technical and allocative efficiency in a variety of industries; see Gattoufi et al. (2004) for a comprehensive bibliography. Many of these studies have used a two-stage approach, where efficiency is estimated in the first stage, and then the estimated efficiencies (or, in a few cases, ratios of estimated efficiencies, Malmquist indices, etc.) are regressed on covariates (typically different from those used in the first stage) that are viewed as representing environmental variables. This approach is advocated by Chilingerian and Sherman (2004), Ray (2004), and Ruggiero (2004); published examples include Byrnes et al. (1988), Ray (1988, 1991), Nyman and Bricker (1989), Aly et al. (1990), McCarty and Yaisawarng (1993), Rhodes and Southwick (1993), Banker and Johnston (1994), Chirkos and Sears (1994), Dusansky and Wilson (1994), Kooreman (1994), Lovell et al. (1994), Sexton et al. (1994), Chilingerian (1995), Rosko et al. (1995), Arnold et al. (1996), De Borger and Kerstens (1996), Gonzalez and Barber (1996), Luoma et al. (1996), Carrington et al. (1997), Gillen and Lall (1997), Burgess and Wilson (1998), Kirjavainen and Loikkanen (1998), McMillan and Datta (1998), Puig-Junoy (1998), Dietsch and Weill (1999), Fried et al. (1999a), Garden and Ralston (1999), Cheng et al. (2000), Resende (2000), Worthington and Dollery (2000), Chakraborty et al. (2001), Mukherjee et al. (2001), Raczka (2001), Ralston et al. (2001), Isik and Hassan (2002), O'Donnell and van der Westhuizen (2002), Otsuki et al. (2002), Stanton (2002), Binam et al. (2003), Chu et al. (2003), Wang et al. (2003), Barros (2004), Okeahalam (2004), and Turner et al. (2004).
In a slight variation on this approach, Fried et al. (1993, 1999b, 2002) regressed radial and non-radial slacks on environmental variables. Since their dependent variables are functions of estimated efficiencies, the problems discussed below apply here as well. In addition, Internet search engines such as google.com reveal hundreds of unpublished working papers that use the two-stage approach.1
As far as we have been able to determine, none of the studies that employ this two-stage approach have described the underlying DGP. Since the DGP has not been described, there is some doubt about what is being estimated in the two-stage approaches. Among the studies that regress DEA estimates of efficiency on some
1 Internet searches on October 12, 2004 on google.com returned about 801 hits for the phrases "data envelopment analysis" and "two-stage"; about 531 hits were obtained using "data envelopment analysis" and "tobit". The vast majority of these appear to be working papers.
covariates in the second stage, most have specified a censored (tobit) model for the second stage, but several (e.g., Aly et al., 1990; Chirkos and Sears, 1994; Dietsch and Weill, 1999; Ray, 1991; Sexton et al., 1994; Stanton, 2002) have estimated a linear model by ordinary least squares (OLS). Authors have argued that DEA efficiency estimates are somehow censored since there are typically numerous estimates equal to one, but no coherent account of how the censoring arises has been offered. Others have used OLS in the second-stage regression after transforming the DEA estimates of efficiency using log, logistic, or log-normal transformations, and in some cases adding or subtracting an arbitrary constant to avoid division by zero or taking the log of zero (e.g., Byrnes et al., 1988; Nyman and Bricker, 1989; Ray, 1991; Puig-Junoy, 1998). Lovell et al. (1994) and Burgess and Wilson (1998) avoided boundary problems in their second-stage regressions by using in their first-stage estimation a leave-one-out estimator of efficiency originally suggested by Andersen and Petersen (1993). Unfortunately, however, it is difficult to give a statistical interpretation to this estimator, even if the second-stage regressions are ignored. Still others have regressed ratios of efficiency estimates, Malmquist indices, or differences in efficiency estimates in the second stage; these have avoided boundary problems, but have still not provided a coherent description of a DGP that would make such regressions sensible.
A more serious problem in all of the two-stage studies that we have found arises from the fact that DEA efficiency estimates are serially correlated.2 Consequently, standard approaches to inference (used in all but two of the studies we have seen that employ the two-stage approach) are invalid. The two exceptions are Xue and Harker (1999) and Hirschberg and Lloyd (2002); both recognize that DEA efficiency estimates are serially correlated, but both use a naive bootstrap method based on resampling from an empirical distribution in their attempts to correct for the serial correlation. Unfortunately, the naive bootstrap is inconsistent in the context of non-parametric efficiency estimation, as demonstrated by Simar and Wilson (1999a, b), and so these approaches make little sense. Moreover, neither of these studies describes a DGP for which their second-stage regressions would be appropriate, and so again it is unclear what is being estimated in these studies.
This paper describes a DGP that is logically consistent with regression of non-parametric DEA efficiency estimates on covariates in a second stage. In addition, we demonstrate that while conventional inference methods are inconsistent in the second-stage regression, consistent inference is both possible and feasible.
2 The correlation arises in finite samples from the fact that perturbations of observations lying on the estimated frontier will in many, and perhaps all, cases cause changes in efficiencies estimated for other observations. A similar problem arises in OLS regression, where estimated residuals are serially correlated in finite samples even when the underlying true residuals are not (see Maddala, 1988, for discussion). However, in the regression case, the correlation disappears more quickly than in the DEA context, where convergence rates are much slower in higher dimensions.
2. A statistical model
Let $x \in \mathbb{R}^p_+$ denote a $(1 \times p)$ vector of inputs, $y \in \mathbb{R}^q_+$ denote a $(1 \times q)$ vector of outputs, and $z \in \mathbb{R}^r$ denote a $(1 \times r)$ vector of environmental variables. Elements of $z$ may be either continuous or discrete; we will elaborate further on this later in the paper. How one might decide whether a particular variable is an environmental variable rather than an input or output of the production process is not entirely clear; our aim is not to provide answers to such questions, but rather to rationalize the two-stage analysis that is performed once this decision has been made.3
Studies that have used the two-stage approach have assumed, either explicitly or implicitly, that firms face certain environmental variables $z$, and that these constrain their choices of inputs $x$ and outputs $y$. In the real world, the analyst is confronted with a set of observations $\mathcal{S}_n = \{(x_i, y_i, z_i)\}_{i=1}^n$.
Assumption A1. The sample observations $(x_i, y_i, z_i)$ in $\mathcal{S}_n$ are realizations of identically, independently distributed random variables with probability density function $f(x, y, z)$ which has support over $\mathcal{P} \times \mathbb{R}^r$, where $\mathcal{P} \subset \mathbb{R}^{p+q}_+$ is a production set defined by
$$\mathcal{P} = \{(x, y) \mid x \text{ can produce } y\}. \quad (1)$$
In any interesting case, $z$ is not independent with respect to $(x, y)$, i.e., $f(x, y \mid z) \neq f(x, y)$; independence between $z$ and $(x, y)$ can be tested using the methods surveyed by Wilson (2003). Otherwise, there would be no motivation for the second-stage regression. Assumption A1 means that the constraints on firms' choices of inputs $x$ and outputs $y$ due to the environmental variables that firms face operate through the dependence of $(x, y)$ on $z$ in $f(x, y, z)$.4
The boundary of $\mathcal{P}$ is sometimes referred to as the technology or the production frontier, and is given by the intersection of $\mathcal{P}$ and the closure of its complement. Firms which are technically inefficient operate at points in the interior of $\mathcal{P}$, while those that are technically efficient operate somewhere along the technology defined by the boundary of $\mathcal{P}$.
3 Note that one can test whether a particular variable is an input, or an output, using the methods described in Simar and Wilson (2001a).
4 Coelli et al. (1998, pp. 166–171) discuss several alternative formulations where the production set is made to depend on the environmental variables $z$ in various ways, in contrast to the formulation in (1) and (2). In each of these alternative formulations, the definition of the production set would involve conditioning on $z$, and an additional set of constraints involving $z$ would be added to the linear program (10) below that is used to estimate $\delta_0$ in (2). While these approaches might be sensible for some situations, they were not used in the studies cited earlier in Section 1; moreover, these alternative approaches leave no role for a second-stage regression, and hence are not the focus of this paper. In our formulation, the environmental variables $z$ influence the mean and variance of the inefficiency process, but not the boundary of its support; this is consistent with the formulations in the studies we have cited in Section 1, where in each case the environmental variables appear only in the second-stage regressions. This is also similar to the idea behind parametric, stochastic frontier models where the mean of the inefficiency term is parameterized in terms of some covariates, as we discuss later in Section 5.
Various measures of technical efficiency are possible. Define a measure $\delta$ for some point $(x_0, y_0) \in \mathbb{R}^{p+q}_+$ such that
$$\delta_0 = \delta(x_0, y_0 \mid \mathcal{P}) \equiv \sup\{\delta \mid (x_0, \delta y_0) \in \mathcal{P},\ \delta > 0\}. \quad (2)$$
This is simply the Farrell (1957) measure of output technical efficiency, which is the reciprocal of the Shephard (1970) output distance function. For $(x_0, y_0) \in \mathcal{P}$, $\delta(x_0, y_0 \mid \mathcal{P}) \geq 1$. Note that $\delta$ provides a measure of Euclidean distance from the point $(x_0, y_0) \in \mathbb{R}^{p+q}_+$ to the boundary of $\mathcal{P}$ in a direction parallel to the output axes and orthogonal to the input axes. In the next assumption, and frequently in the discussion that follows, we will replace the subscripts "0" in (2) with subscript $i$ to signify that the distance measure is evaluated for a particular, specific observation in $\mathcal{S}_n$.
For reasons that will become clear later, it is convenient to represent $y$ in terms of its polar coordinates, while expressing the modulus in terms of distance from the boundary of $\mathcal{P}$. Again for an arbitrary point $(x_0, y_0) \in \mathbb{R}^{p+q}_+$, with $y_0 = [y_{01} \ \ldots \ y_{0q}]$, we can write the angles as
$$\eta_{0j} = \begin{cases} \arctan(y_{0,j+1}/y_{01}) & \text{for } y_{01} > 0, \\ \pi/2 & \text{if } y_{01} = 0, \end{cases} \quad (3)$$
for $j = 1, \ldots, q-1$. The corresponding modulus is given by $\omega(y_0) = \sqrt{y_0' y_0}$, which is related to the Farrell efficiency measure by
$$\delta(x_0, y_0 \mid \mathcal{P}) = \frac{\omega(\delta(x_0, y_0 \mid \mathcal{P})\, y_0)}{\omega(y_0)}. \quad (4)$$
Thus, since $\mathcal{P}$ is fixed, we can characterize $y_0$ by $(\gamma_0, \delta_0)$, where $\gamma_0 = [\eta_{01} \ \ldots \ \eta_{0,q-1}]$ and $\delta_0 = \delta(x_0, y_0 \mid \mathcal{P})$.
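The angle-modulus representation in (3) can be checked numerically. The following is a minimal sketch (ours, not the authors'), assuming strictly positive first output $y_{01} > 0$ so that only the first branch of (3) applies:

```python
import numpy as np

def to_polar(y):
    """Angles eta_j = arctan(y_{j+1}/y_1) per (3), modulus omega = sqrt(y'y)."""
    y = np.asarray(y, dtype=float)
    gamma = np.arctan(y[1:] / y[0])   # assumes y[0] > 0 (first branch of (3))
    omega = np.sqrt(y @ y)
    return gamma, omega

def from_polar(gamma, omega):
    """Invert the map: y is proportional to [1, tan(eta_1), ..., tan(eta_{q-1})]."""
    t = np.concatenate(([1.0], np.tan(gamma)))
    return omega * t / np.sqrt(t @ t)

y0 = np.array([3.0, 4.0])
gamma0, omega0 = to_polar(y0)
assert np.isclose(omega0, 5.0)                  # sqrt(3^2 + 4^2)
assert np.allclose(from_polar(gamma0, omega0), y0)   # round trip recovers y0
```

Because scaling $y_0$ by $\delta$ leaves the angles unchanged and scales the modulus by $\delta$, the pair $(\gamma_0, \delta_0)$ pins down $y_0$ exactly as (4) suggests.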
The joint density $f(x, y, z)$ can now be described by a series of conditional densities in terms of cylindrical coordinates:
$$f(x_i, \gamma_i, \delta_i, z_i) = f(x_i, \gamma_i \mid \delta_i, z_i)\, f(\delta_i \mid z_i)\, f(z_i). \quad (5)$$
The order of the conditioning on the right-hand side of (5) reflects the sequential nature of the DGP. Firm $i$ is faced with environmental variables $z_i$ drawn from $f(z)$. Given this $z_i$, an efficiency level $\delta_i$ is drawn from $f(\delta_i \mid z_i)$, and then $x_i$ and $\gamma_i$ are drawn from $f(x, \gamma \mid \delta, z)$, resulting in a realization $(x_i, y_i, z_i)$ from the joint density $f(x, y, z)$ after transforming the polar coordinates $(\gamma_i, \delta_i)$ to Cartesian coordinates $y_i$.
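The sequential structure of (5) is easy to mimic in simulation. The sketch below is purely illustrative and not from the paper: we pick a hypothetical single-input, single-output frontier $y = \sqrt{x}$, a linear form for the conditional mean of inefficiency, and made-up parameter values, then draw $z_i$ first, $\delta_i \mid z_i$ second, and finally place the firm below its frontier point:

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta0, beta1, sigma = 200, 0.5, 0.5, 0.3   # illustrative values only

z = rng.uniform(0.0, 2.0, n)                  # step 1: environment z_i ~ f(z)
psi = beta0 + beta1 * z                       # hypothetical linear psi(z_i, beta)

# step 2: delta_i | z_i = psi_i + eps_i, eps_i ~ N(0, sigma^2) left-truncated
# at 1 - psi_i; drawn here by simple rejection so that delta_i >= 1 always
delta = np.empty(n)
for i in range(n):
    d = psi[i] + sigma * rng.standard_normal()
    while d < 1.0:
        d = psi[i] + sigma * rng.standard_normal()
    delta[i] = d

# step 3: draw inputs, then recover outputs; with the Farrell output measure,
# a firm with efficiency delta produces 1/delta of its frontier output
x = rng.uniform(1.0, 10.0, n)
y = np.sqrt(x) / delta                        # frontier y = sqrt(x) is assumed

assert delta.min() >= 1.0                     # truncation respected
assert np.all(y <= np.sqrt(x) + 1e-12)        # all firms weakly below frontier
```

The ordering matters: $z_i$ is drawn before $\delta_i$, and $\delta_i$ before the input-output mix, exactly as the factorization in (5) requires.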
Assumption A2. The conditioning in $f(\delta_i \mid z_i)$ in (5) operates through the following mechanism:
$$\delta_i = \psi(z_i, \beta) + \varepsilon_i \geq 1, \quad (6)$$
where $\psi$ is a smooth, continuous function, $\beta$ is a vector of (possibly infinitely many) parameters, and $\varepsilon_i$ is a continuous iid random variable, independent of $z_i$.
Note that Assumptions A1 and A2 amount to a separability condition; the production set $\mathcal{P}$ is assumed to be a subset of the entire sample space, while the effect of the covariates $z$ operates through the dependence between $y$ and $z$ induced by (6).
These assumptions provide a rationale for second-stage regressions. In an applied setting, one might reasonably wonder whether the implied separability condition is supported by the data; if it is not, one might prefer to use one of the alternative formulations discussed by Coelli et al. (1998) (see footnote 4), where the covariates $z$ are included in the first-stage estimation of efficiency. In the alternative formulations, the covariates $z$ take the role of inputs or outputs, either discretionary or non-discretionary. Fortunately, the testing methods described by Simar and Wilson (2001a) can be used to test the separability assumption here.
Assumption A3. $\varepsilon_i$ in (6) is distributed $N(0, \sigma_\varepsilon^2)$ with left-truncation at $1 - \psi(z_i, \beta)$ for each $i$.
Assumption A3 could be changed to impose some other distribution on the $\varepsilon_i$, but normality seems a natural choice. Alternatively, one could leave the distribution of $\varepsilon$ unspecified and employ the semi-parametric method for truncated regression proposed by Honoré and Powell (1994); one could in addition leave the function $\psi(\cdot)$ unspecified and employ the non-parametric method for truncated regression proposed by Lewbel and Linton (2002). Here, however, we impose a distributional form for $\varepsilon$ and later will assume a form for $\psi(\cdot)$, since that is what has been done in every two-stage applied study that we are aware of.
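Draws of $\varepsilon_i$ satisfying Assumption A3 are straightforward to generate. One way, sketched below with illustrative values for $\psi(z_i, \beta)$ and $\sigma_\varepsilon$ (the values are ours, not from the paper), uses `scipy.stats.truncnorm`, which parameterizes the truncation bounds in standardized units:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)
psi, sigma = 1.2, 0.4                  # illustrative psi(z_i, beta) and sigma_eps

# eps ~ N(0, sigma^2) left-truncated at 1 - psi; truncnorm takes standardized
# bounds a = (lower - loc)/scale, b = (upper - loc)/scale
a = (1.0 - psi) / sigma
eps = truncnorm.rvs(a, np.inf, loc=0.0, scale=sigma, size=1000, random_state=rng)

assert eps.min() >= 1.0 - psi          # truncation bound respected
assert (psi + eps).min() >= 1.0        # hence delta = psi + eps >= 1, as in (6)
```

The same construction appears in steps [3.1] of the bootstrap algorithms in Section 4.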
The production set $\mathcal{P}$ is sometimes described in terms of its sections
$$\mathcal{Y}(x) \equiv \{y \mid (x, y) \in \mathcal{P}\} \quad (7)$$
and
$$\mathcal{X}(y) \equiv \{x \mid (x, y) \in \mathcal{P}\}, \quad (8)$$
which form the output feasibility and input requirement sets, respectively. Knowledge of either $\mathcal{Y}(x)$ for all $x$ or $\mathcal{X}(y)$ for all $y$ is equivalent to knowledge of $\mathcal{P}$; $\mathcal{P}$ implies (and is implied by) both $\mathcal{Y}(x)$ and $\mathcal{X}(y)$. Thus, both $\mathcal{Y}(x)$ and $\mathcal{X}(y)$ inherit the properties of $\mathcal{P}$.
Various assumptions regarding $\mathcal{P}$ are possible; we adopt those of Shephard (1970) and Färe (1988):
Assumption A4. $\mathcal{P}$ is closed and convex; $\mathcal{Y}(x)$ is closed, convex, and bounded for all $x \in \mathbb{R}^p_+$; and $\mathcal{X}(y)$ is closed and convex for all $y \in \mathbb{R}^q_+$.
Assumption A5. $(x, y) \notin \mathcal{P}$ if $x = 0$, $y \geq 0$, $y \neq 0$; i.e., all production requires use of some inputs.
Assumption A6. For $\tilde{x} \geq x$, $\tilde{y} \leq y$, if $(x, y) \in \mathcal{P}$ then $(\tilde{x}, y) \in \mathcal{P}$ and $(x, \tilde{y}) \in \mathcal{P}$; i.e., both inputs and outputs are strongly disposable.
Here and throughout, inequalities involving vectors are defined on an element-by-element basis; e.g., for $\tilde{x}, x \in \mathbb{R}^p_+$, $\tilde{x} \geq x$ means that some number $\ell \in \{0, 1, \ldots, p\}$ of the corresponding elements of $\tilde{x}$ and $x$ are equal, while $(p - \ell)$ of the elements of $\tilde{x}$ are greater than the corresponding elements of $x$. Assumption A5 merely says that there are no free lunches. Assumption A6 is sometimes called free disposability and is equivalent to an assumption of monotonicity of the technology.
In order for our estimators of $\mathcal{P}$ and $\delta(x_0, y_0 \mid \mathcal{P})$ to be consistent, some additional assumptions are needed. In particular, the probability of observing firms in a neighborhood of the boundary of $\mathcal{P}$ must approach unity as the sample size increases:
Assumption A7. For all $(x, y) \in \mathcal{P}$ such that $(\theta^{-1} x, y) \notin \mathcal{P}$ and $(x, \theta y) \notin \mathcal{P}$ for $\theta > 1$, $f(x, y \mid z)$ is strictly positive, and $f(x, y \mid z)$ is continuous in any direction toward the interior of $\mathcal{P}$ for all $z$.
Also, an assumption about the smoothness of the frontier is needed:
Assumption A8. For all $(x, y)$ in the interior of $\mathcal{P}$, $\delta(x, y \mid \mathcal{P})$ is differentiable in both its arguments.
Our characterization of the smoothness condition here is stronger than required; Kneip et al. (1998) require only Lipschitz continuity for the distance functions, which is implied by the simpler, but stronger, requirement presented here.
Assumptions A4–A6 are standard in the microeconomic theory of the firm; Assumptions A7 and A8 are based on those of Kneip et al. (1998), with some extensions to accommodate the environmental variables in $z$. These assumptions are sufficient to ensure statistical consistency of the DEA estimators in the first-stage problem that appears below. Assumptions A1–A3 have been introduced specifically to accommodate the environmental variables. Together, Assumptions A1–A8 define a semi-parametric DGP $\mathcal{F}$ which yields the data in $\mathcal{S}_n$. The problem is to estimate $\{\delta_i\}_{i=1}^n$ and $\beta$, and then to make inferences about these unknown quantities. For the case of the $\delta_i$, we have provided two approaches to inference, in Simar and Wilson (1998, 2000a). Our focus here is on estimation and inference about $\beta$, which describes the marginal effects of $z$ on inefficiency.
3. Typical two-stage approaches
The convex hull of the free disposal hull of the observed pairs $(x_i, y_i)$ contained in $\mathcal{S}_n$ has frequently been used to estimate the production set $\mathcal{P}$. This estimator is described by
$$\hat{\mathcal{P}} = \{(x, y) \mid y \leq Yq,\ x \geq Xq,\ i'q = 1,\ q \in \mathbb{R}^n_+\}, \quad (9)$$
where $Y = [y_1 \ \ldots \ y_n]$, $X = [x_1 \ \ldots \ x_n]$, $i$ denotes an $(n \times 1)$ vector of ones, and $q$ is an $(n \times 1)$ vector of intensity variables. Korostelev et al. (1995) proved that $\hat{\mathcal{P}}$ is a consistent estimator of $\mathcal{P}$ under conditions met by Assumptions A1–A8 above. Estimators of the Farrell efficiency measure can be constructed by replacing $\mathcal{P}$ on the right-hand side of (4) with $\hat{\mathcal{P}}$.
The estimator of $\delta_0 = \delta(x_0, y_0 \mid \mathcal{P})$ defined in (2) at a particular point $(x_0, y_0) \in \mathbb{R}^{p+q}_+$ can be written in terms of the linear program
$$\hat{\delta}_0 = \delta(x_0, y_0 \mid \hat{\mathcal{P}}) = \max\{\theta > 0 \mid \theta y_0 \leq Yq,\ x_0 \geq Xq,\ i'q = 1,\ q \in \mathbb{R}^n_+\}, \quad (10)$$
where the maximization provides a solution for $\theta$ as well as $q$. This is merely the empirical analog of the measure defined in (2).
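Program (10) is an ordinary linear program and can be handed to any LP solver. The following is a minimal sketch (ours, not the authors' code) using `scipy.optimize.linprog` on a tiny made-up data set of two firms with one input and one output; all names are ours:

```python
import numpy as np
from scipy.optimize import linprog

def dea_output(x0, y0, X, Y):
    """Solve (10): max theta s.t. theta*y0 <= Y'q, X'q <= x0, 1'q = 1, q >= 0.
    X is (n, p) inputs, Y is (n, q) outputs; decision vector is [theta, q]."""
    n = X.shape[0]
    c = np.concatenate(([-1.0], np.zeros(n)))      # linprog minimizes: max theta
    A_out = np.hstack((y0.reshape(-1, 1), -Y.T))   # theta*y0_j - (Yq)_j <= 0
    A_in = np.hstack((np.zeros((X.shape[1], 1)), X.T))   # (Xq)_k <= x0_k
    A_ub = np.vstack((A_out, A_in))
    b_ub = np.concatenate((np.zeros(len(y0)), x0))
    A_eq = np.concatenate(([0.0], np.ones(n))).reshape(1, -1)  # convexity: 1'q = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (n + 1))
    return res.x[0]                                # estimated delta

X = np.array([[1.0], [2.0]])                       # two firms, one input
Y = np.array([[1.0], [3.0]])                       # one output
delta = dea_output(np.array([2.0]), np.array([2.0]), X, Y)  # firm at (x, y) = (2, 2)
```

For this interior point the program returns $\hat{\delta} = 1.5$: output can be scaled up by 50% while remaining feasible under the estimated frontier (the optimum puts all intensity weight on the second firm).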
It is straightforward to prove that $\hat{\delta}_0$ is a consistent estimator of $\delta_0$ under Assumptions A1–A8 by altering the notation in Kneip et al. (1998). In particular,
$$\hat{\delta}_0 = \delta_0 + O_p(n^{-2/(p+q+1)}). \quad (11)$$
The rate of convergence is low, as is typical in non-parametric estimation, and the rate slows as $p + q$ is increased; this is the well-known curse of dimensionality. Moreover, by construction, $\hat{\delta}_0$ is biased downward.
Few results exist on the sampling distributions of non-parametric efficiency estimators such as the one in (10). Gijbels et al. (1999) derived the asymptotic distribution of the Shephard (1970) output distance function in the special case of one input and one output ($p = q = 1$), along with an analytic expression for its large-sample bias and variance, and it is similarly straightforward to extend these results to the input-oriented case by appropriate changes in notation. Unfortunately, in the more general multivariate setting where $p + q > 2$, the radial nature of the distance functions and the complexity of the estimated frontier complicate the derivations. So far, the bootstrap appears to offer the only way to approximate the asymptotic distribution of the distance function estimators in multivariate settings. For the second-stage regression problem considered here, the bootstrap also appears to be useful.
In principle, one could assume $\beta$ in (6) has finite dimensions, fully specify the density $f(x, y, z)$, and then estimate $\beta$ by maximum likelihood. More often, however, researchers have employed some variant of the approach outlined below. The two-stage studies that have appeared in the literature typically specify $\psi(z_i, \beta) = z_i \beta$ so that (6) can be written as
$$\delta_i = z_i \beta + \varepsilon_i \geq 1, \quad (12)$$
where $\hat{\delta}_i = \delta(x_i, y_i \mid \hat{\mathcal{P}})$ in the notation of (10). These studies then (i) use the observed pairs $(x_i, y_i)$ in $\mathcal{S}_n$ to estimate $\delta_i$ for all $i = 1, \ldots, n$, yielding a set of estimates $\{\hat{\delta}_i\}_{i=1}^n$; (ii) replace the unobserved $\delta_i$ on the left-hand side of (12) with the estimates $\hat{\delta}_i$ obtained from step (i); and then (iii) estimate
$$\hat{\delta}_i = z_i \beta + \xi_i \geq 1 \quad (13)$$
using censored (tobit) regression or, in a few cases, OLS.5
Regardless of the form chosen for $\psi$, the two-stage approach outlined above presents problems for inference. First, the dependent variable in (6) and (12) is unobserved, and must be replaced by an estimate in the actual regression that is
5 Some (e.g., Puig-Junoy et al., 1998) have transformed the dependent variable, while introducing an arbitrary constant to avoid taking the log of zero. Those who have used a tobit specification in the second stage have justified their approach by the fact that typically several, perhaps many, efficiency estimates equal unity in a given application. As far as we are aware, all who have specified a censored regression model in the second stage have constrained the processes that determine the probability of censoring and that govern the uncensored observations to be the same. See the Appendix for a discussion of censored and truncated regression models, their differences, and a discussion of the last point.
estimated. The $\hat{\delta}_i$'s that are used in the second-step estimation of (13) are serially correlated, and in a complicated, unknown way. To understand this, note that $\hat{\delta}_i = \delta(x_i, y_i \mid \hat{\mathcal{P}})$ depends on all the observations $(x_i, y_i)$ in $\mathcal{S}_n$ through $\hat{\mathcal{P}}$, and consequently so must the error term $\xi_i$ in (13). Moreover, while the observations in $\mathcal{S}_n$ are assumed independently drawn in Assumption A1, $x_i$ and $y_i$ are correlated with $z_i$ due to Assumption A2; otherwise, there would be no motivation for the second-stage regression. This in turn means the error term $\xi_i$ in (13) is correlated with $z_i$.
Both the correlation among the $\xi_i$'s as well as the correlation between $\xi_i$ and $z_i$ disappear asymptotically, but only at the same slow rate given in (11) with which $\hat{\delta}_i$ converges. This means that maximum likelihood estimates of $\beta$ in the second-stage regression will be consistent, but will not have the usual, parametric convergence rate of $n^{-1/2}$. More troubling, however, for $p + q > 3$, the correlation among the $\xi_i$'s does not disappear quickly enough for standard approaches to inference (based on the inverse of the negative Hessian of the log-likelihood) to be valid.
Further consideration reveals an additional problem. Note that we can always write
$$\hat{\delta}_i = E(\hat{\delta}_i) + u_i, \quad (14)$$
where $E(u_i) = 0$. In addition, the bias of the estimator $\hat{\delta}_i$ is defined by
$$\mathrm{BIAS}(\hat{\delta}_i) \equiv E(\hat{\delta}_i) - \delta_i. \quad (15)$$
Substituting for $E(\hat{\delta}_i)$ from (14) in (15) and re-arranging terms yields
$$\delta_i = \hat{\delta}_i - \mathrm{BIAS}(\hat{\delta}_i) - u_i. \quad (16)$$
Substituting for $\delta_i$ in (12) gives
$$\hat{\delta}_i - \mathrm{BIAS}(\hat{\delta}_i) - u_i = z_i \beta + \varepsilon_i \geq 1. \quad (17)$$
Since $\hat{\delta}_i$ is a consistent estimator, the $u_i$ become negligible asymptotically, as does $\mathrm{BIAS}(\hat{\delta}_i)$. These facts provide justification for writing (13), the equation that is typically estimated in two-stage applications.
Although the $u_i$ in (14) and (16), (17) have zero mean, the term $\mathrm{BIAS}(\hat{\delta}_i)$ does not. Rather, the bias of $\hat{\delta}_i$ is always strictly negative in finite samples. The $u_i$ are unknown and cannot be estimated, but the bias term can be estimated by bootstrap methods; see Efron and Tibshirani (1993) for discussion, and Simar and Wilson (2000a) for an example similar to the present context. The bootstrap bias estimate equals the true bias plus a residual:
$$\widehat{\mathrm{BIAS}}(\hat{\delta}_i) = \mathrm{BIAS}(\hat{\delta}_i) + v_i. \quad (18)$$
The variance of the residual $v_i$ diminishes as $n \to \infty$, and hence $v_i$ is typically of smaller magnitude than $\mathrm{BIAS}(\hat{\delta}_i)$ for reasonable sample sizes $n$. The bootstrap estimator of bias can in turn be used to construct a bias-corrected estimator of $\delta$:
$$\hat{\hat{\delta}}_i = \hat{\delta}_i - \widehat{\mathrm{BIAS}}(\hat{\delta}_i). \quad (19)$$
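Given bootstrap replicates $\{\hat{\delta}^*_{ib}\}_{b=1}^L$ of an efficiency estimate, the bias estimate in (18) and the correction in (19) are one line of arithmetic each. A sketch for a single firm, with made-up numbers standing in for the replicates a real bootstrap would produce:

```python
import numpy as np

delta_hat = 1.30                                   # original estimate (made up)
boot = np.array([1.22, 1.25, 1.24, 1.27, 1.23])    # bootstrap replicates (made up)

bias_hat = boot.mean() - delta_hat    # (18): bootstrap estimate of BIAS(delta_hat)
delta_bc = delta_hat - bias_hat       # (19): bias-corrected estimator
# equivalently: delta_bc = 2 * delta_hat - boot.mean()
```

Since $\hat{\delta}_i$ is biased downward, the estimated bias is negative here and the correction pushes the estimate up, away from the estimated frontier.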
Substituting for $\widehat{\mathrm{BIAS}}(\hat{\delta}_i)$ in (19) from (18), re-arranging terms, and then substituting for $\mathrm{BIAS}(\hat{\delta}_i)$ in (17) yields
$$\hat{\hat{\delta}}_i + v_i - u_i = z_i \beta + \varepsilon_i \geq 1. \quad (20)$$
As noted, both the terms $v_i$ and $u_i$ become negligible asymptotically; hence maximum likelihood estimation on
$$\hat{\hat{\delta}}_i \approx z_i \beta + \varepsilon_i \geq 1 \quad (21)$$
will yield consistent estimates.
Comparing (13) and (17), it is clear that estimation of (13) ignores both $\mathrm{BIAS}(\hat{\delta}_i)$ and $u_i$. Estimation of (21) ignores $v_i$ and $u_i$. This aspect of the estimation problem resembles the story of measurement error in the dependent variable that is told in every undergraduate econometrics textbook. If $v_i$ is indeed of smaller magnitude than $\mathrm{BIAS}(\hat{\delta}_i)$, we would expect estimates of $\beta$ from (21) to be more statistically efficient than those from (13). We examine this question in the Monte Carlo experiments that follow.
In every two-stage application that we have found, the bias term in (17) has been ignored. In the special case where $p = q = 1$, Gijbels et al. (1999) demonstrate that $\mathrm{BIAS}(\hat{\delta}_i)$ is affected by the curvature of the boundary of $\mathcal{P}$. Although results do not exist for more general cases, one can speculate that a similar phenomenon exists in higher-dimensional spaces. In addition, it seems clear that $\mathrm{BIAS}(\hat{\delta}_i)$ will be larger in regions of $\mathcal{P}$ where data are sparse relative to other regions of $\mathcal{P}$ where the data are more dense. In regions where the data are sparse, there is less information about where the boundary of $\mathcal{P}$ lies, and hence the bias in estimated efficiency is likely to be larger.6 All of this suggests that $\mathrm{BIAS}(\hat{\delta}_i)$, which is incorporated in the error term of (13) when it is ignored, is correlated with $x_i$ and $y_i$, and hence with $z_i$. While this correlation, as well as the bias itself, disappears asymptotically, it is reasonable to suppose that including an estimate of bias in the second-stage regression might improve the efficiency of estimation of $\beta$ in finite samples, and perhaps also improve the coverage of estimated confidence intervals for $\beta$.
As a final remark, we note that almost all researchers have estimated (13) by assuming a censored normal (tobit) specification for $\xi_i$. The tobit specification is sometimes motivated by the observation that several values in $\{\hat{\delta}_i\}_{i=1}^n$ are equal to unity, suggesting a probability mass at 1. However, it is important to recall that the underlying true model in (12) (or (6)) does not have this property. The process that determines whether $\hat{\delta}_i = 1$ is primarily an artifact of finite samples, and has nothing to do with the process described by (6) in Assumption A2.
6 It is common for data to be rather unevenly distributed over $\mathcal{P}$ in applications; examples include data for banks (Wheelock and Wilson, 2000, 2001) and hospitals (Wilson and Carey, 2004).
4. Toward better estimation and inference
The problems associated with estimating (13) and making inference about $\beta$ arise from the serial correlation and bias of $\hat{\delta}_i$, and the correlation between $\xi_i$ and $z_i$. The structures of these phenomena are unknown, and difficult to guess. In addition, (13) differs from the true model in (12), where the dependent variable is unobserved. We propose two bootstrap procedures to overcome these difficulties; we describe the procedures in this section, and present Monte Carlo evidence in the next section.
In order to implement a bootstrap procedure, we must draw iid bootstrap samples (i.e., pseudo-data) $(x_i^*, y_i^*, z_i^*)$ from a density $\hat{f}(x, y, z)$. By now it is well known that the ordinary, naive bootstrap based on resampling from the empirical distribution of the data is inconsistent in the present context due to the bounded nature of the DGP; see Simar and Wilson (1999a, b, 2000b) and Kneip et al. (2003) for discussion. Kneip et al. (2003) derive the asymptotic distribution of the DEA efficiency estimator, and prove the consistency of two different bootstrap procedures for making inferences about efficiencies of individual firms; one of these procedures relies on sub-sampling, while the other requires smoothing both the density of inputs and outputs as well as the DEA estimate of the frontier. Simar and Wilson (1998, 2000a) described smoothed bootstrap procedures that approximate the second approach of Kneip et al. (2003), but neither Simar and Wilson (1998, 2000a) nor Kneip et al. (2003) considered environmental variables.
Note that Assumption A2 provides an extra piece of information that was not available in the models considered by Simar and Wilson (1998, 2000a) and Kneip et al. (2003). The bootstrap procedures proposed by Simar and Wilson (2000a) and Kneip et al. (2003) allow for heterogeneity in the distribution of $\delta$, but incorporate no assumptions on the form of the heterogeneity; here, however, the form is made explicit by Assumption A2. Fortunately, the information provided by Assumption A2 allows considerable simplification in the bootstrap procedures we propose below.
We propose two bootstrap procedures for the two-stage efficiency estimation problem. The first procedure is designed to improve inference, but without taking account of the bias term in (17):
Algorithm #1.
[1] Using the original data in $\mathcal{S}_n$, compute $\hat{\delta}_i = \delta(x_i, y_i \mid \hat{\mathcal{P}})$ for all $i = 1, \ldots, n$ using (10).
[2] Use the method of maximum likelihood to obtain an estimate $\hat{\beta}$ of $\beta$ as well as an estimate $\hat{\sigma}_\varepsilon$ of $\sigma_\varepsilon$ in the truncated regression of $\hat{\delta}_i$ on $z_i$ in (13), using the $m < n$ observations where $\hat{\delta}_i > 1$.
[3] Loop over the next three steps ([3.1]–[3.3]) $L$ times to obtain a set of bootstrap estimates $\mathcal{A} = \{(\hat{\beta}^*, \hat{\sigma}_\varepsilon^*)_b\}_{b=1}^L$:
[3.1] For each $i = 1, \ldots, m$, draw $\varepsilon_i$ from the $N(0, \hat{\sigma}_\varepsilon^2)$ distribution with left-truncation at $(1 - z_i \hat{\beta})$.7
7See the Appendix for details on how to draw from a left-truncated normal distribution.
[3.2] Again for each $i = 1, \ldots, m$, compute $\delta_i^* = z_i \hat{\beta} + \varepsilon_i$.
[3.3] Use the maximum likelihood method to estimate the truncated regression of $\delta_i^*$ on $z_i$, yielding estimates $(\hat{\beta}^*, \hat{\sigma}_\varepsilon^*)$.
[4] Use the bootstrap values in $\mathcal{A}$ and the original estimates $\hat{\beta}, \hat{\sigma}_\varepsilon$ to construct estimated confidence intervals for each element of $\beta$ and for $\sigma_\varepsilon$ as described below.
As discussed earlier, results from Kneip et al. (1998) establish consistency of the estimation in step [1] under our assumptions. Given the discussion in Section 3 and the fact that $\hat{\delta}_i$ is a consistent estimator of $\delta_i$, it follows that maximum likelihood estimation in step [2] will yield consistent estimates of $\beta$, though without the customary $\sqrt{n}$-convergence rate. Step [3] is simply a parametric bootstrap of a (nonlinear) regression model; properties of the bootstrap in the context of regression models have been examined by Bickel and Freedman (1981), Wu (1986), and others.
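To make the mechanics of steps [2] and [3] concrete, the truncated-regression MLE and the parametric bootstrap can be sketched in Python. This is a minimal illustration, not the authors' code: the function names (`trunc_reg_mle`, `algorithm1`) are ours, SciPy's `truncnorm` sampler stands in for the appendix's sampling details, and `d_hat` is assumed to already hold the first-stage DEA estimates from step [1].

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, truncnorm

rng = np.random.default_rng(0)

def trunc_reg_mle(d, Z):
    """ML estimation of a normal regression left-truncated at 1:
    d_i = z_i'beta + eps_i, with d_i > 1 (illustrative sketch)."""
    def negloglik(theta):
        beta, sigma = theta[:-1], np.exp(theta[-1])  # log-sigma keeps sigma > 0
        mu = Z @ beta
        # log f(d_i) = log phi((d_i - mu)/sigma) - log sigma - log Phi((mu - 1)/sigma)
        return -np.sum(norm.logpdf((d - mu) / sigma) - np.log(sigma)
                       - norm.logcdf((mu - 1.0) / sigma))
    beta0 = np.linalg.lstsq(Z, d, rcond=None)[0]       # crude starting values
    res = minimize(negloglik, np.r_[beta0, 0.0], method="BFGS")
    return res.x[:-1], np.exp(res.x[-1])

def algorithm1(d_hat, Z, L=2000):
    """Steps [2]-[3] of Algorithm #1; d_hat holds first-stage DEA estimates."""
    keep = d_hat > 1                       # the m < n observations with d_hat > 1
    beta, sigma = trunc_reg_mle(d_hat[keep], Z[keep])  # step [2]
    mu = Z[keep] @ beta
    boot = []
    for _ in range(L):                     # step [3]
        # [3.1]: eps_i ~ N(0, sigma^2) left-truncated at (1 - z_i beta)
        a = (1.0 - mu) / sigma             # standardized truncation point
        eps = truncnorm.rvs(a, np.inf, scale=sigma, random_state=rng)
        d_star = mu + eps                  # [3.2]: d*_i > 1 by construction
        boot.append(trunc_reg_mle(d_star, Z[keep]))    # [3.3]
    return beta, sigma, boot
```

Step [4], the construction of confidence intervals from the bootstrap set $\mathcal{A}$, follows the percentile construction described later in the text.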
Alternatively, $\hat{\hat{\delta}}_i$ can be regressed on $z_i$ in (21), and the following bootstrap procedure can be used to provide inference about $\beta$.
Algorithm #2.

[1] Using the original data in $\mathcal{S}_n$, compute $\hat{\delta}_i = \hat{\delta}(x_i, y_i \mid \hat{\mathcal{P}})$ for all $i = 1, \ldots, n$ using (10).
[2] Use the method of maximum likelihood to obtain an estimate $\hat{\beta}$ of $\beta$ as well as an estimate $\hat{\sigma}_\varepsilon$ of $\sigma_\varepsilon$ in the truncated regression of $\hat{\delta}_i$ on $z_i$ in (13), using the $m < n$ observations where $\hat{\delta}_i > 1$.
[3] Loop over the next four steps ([3.1]–[3.4]) $L_1$ times to obtain $n$ sets of bootstrap estimates $\mathcal{B}_i = \{\hat{\delta}^*_{ib}\}_{b=1}^{L_1}$:
    [3.1] For each $i = 1, \ldots, n$, draw $\varepsilon_i$ from the $N(0, \hat{\sigma}_\varepsilon^2)$ distribution with left-truncation at $(1 - z_i \hat{\beta})$.
    [3.2] Again for each $i = 1, \ldots, n$, compute $\delta_i^* = z_i \hat{\beta} + \varepsilon_i$.
    [3.3] Set $x_i^* = x_i$, $y_i^* = y_i \hat{\delta}_i / \delta_i^*$ for all $i = 1, \ldots, n$.
    [3.4] Compute $\hat{\delta}_i^* = \hat{\delta}(x_i, y_i \mid \hat{\mathcal{P}}^*)$ for all $i = 1, \ldots, n$, where $\hat{\mathcal{P}}^*$ is obtained by replacing $Y, X$ in (9) with $Y^* = [y_1^* \cdots y_n^*]$, $X^* = [x_1^* \cdots x_n^*]$.
[4] For each $i = 1, \ldots, n$, compute the bias-corrected estimator $\hat{\hat{\delta}}_i$ defined by (19) using the bootstrap estimates in $\mathcal{B}_i$ obtained in step [3.4] and the original estimate $\hat{\delta}_i$.
[5] Use the method of maximum likelihood to estimate the truncated regression of $\hat{\hat{\delta}}_i$ on $z_i$, yielding estimates $(\hat{\hat{\beta}}, \hat{\hat{\sigma}}_\varepsilon)$.
[6] Loop over the next three steps ([6.1]–[6.3]) $L_2$ times to obtain a set of bootstrap estimates $\mathcal{C} = \{(\hat{\hat{\beta}}^*, \hat{\hat{\sigma}}_\varepsilon^*)_b\}_{b=1}^{L_2}$:
    [6.1] For each $i = 1, \ldots, n$, draw $\varepsilon_i$ from the $N(0, \hat{\hat{\sigma}}_\varepsilon^2)$ distribution with left-truncation at $(1 - z_i \hat{\hat{\beta}})$.
    [6.2] Again for each $i = 1, \ldots, n$, compute $\delta_i^{**} = z_i \hat{\hat{\beta}} + \varepsilon_i$.
    [6.3] Use the maximum likelihood method to estimate the truncated regression of $\delta_i^{**}$ on $z_i$, yielding estimates $(\hat{\hat{\beta}}^*, \hat{\hat{\sigma}}_\varepsilon^*)$.
[7] Use the bootstrap values in $\mathcal{C}$ and the original estimates $\hat{\hat{\beta}}, \hat{\hat{\sigma}}_\varepsilon$ to construct estimated confidence intervals for each element of $\beta$ and for $\sigma_\varepsilon$ as described below.
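Step [4] depends on the bias correction defined in (19), which is not reproduced in this excerpt; the sketch below assumes it is the usual bootstrap bias correction, $\hat{\hat{\delta}}_i = \hat{\delta}_i - \widehat{\mathrm{bias}}_i = 2\hat{\delta}_i - L_1^{-1}\sum_{b} \hat{\delta}^*_{ib}$ (function name ours):

```python
import numpy as np

def bias_corrected(d_hat, d_boot):
    """Step [4]: bias-corrected estimator, assuming (19) is the usual
    bootstrap bias correction 2*d_hat_i - mean_b(d*_ib)."""
    d_boot = np.asarray(d_boot)            # shape (L1, n): bootstrap DEA estimates
    bias = d_boot.mean(axis=0) - d_hat     # estimated bias of d_hat_i
    return d_hat - bias                    # equals 2*d_hat - bootstrap mean
```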
Note that steps [1] and [2] in Algorithm #2 are the same as in Algorithm #1. Steps [3] and [4] in Algorithm #2 employ a parametric bootstrap in the first-stage problem in order to produce bias-corrected estimates $\hat{\hat{\delta}}_i$. The parametric structure provided by Assumption A2, when we assume in addition that $\psi(z_i, \beta) = z_i \beta$, greatly simplifies the smoothing that was employed in Simar and Wilson (2000a) and Kneip et al. (2003); otherwise, the bootstrap used to obtain $\hat{\hat{\delta}}_i$ is similar to the one described in Simar and Wilson (2000a), and approximates the double-smooth procedure used in Kneip et al. (2003). Steps [5] and [6] are essentially the same as in Algorithm #1, except that the bias-corrected estimates $\hat{\hat{\delta}}_i$ replace $\delta_i$ in (12) instead of $\hat{\delta}_i$ as in Algorithm #1.

In either case, once the set of bootstrap values $\mathcal{A}$ or $\mathcal{C}$ has been obtained either in
step [3] of Algorithm #1 or step [6] of Algorithm #2, percentile bootstrap confidence intervals can be constructed. To illustrate, suppose that interest lies in $\beta_j$, the $j$th element of $\beta$, which has been estimated by $\hat{\hat{\beta}}_j$, the $j$th element of $\hat{\hat{\beta}}$. If the distribution of $(\hat{\hat{\beta}}_j - \beta_j)$ were known, it would be trivial to find values $a_\alpha, b_\alpha$ such that

$$\Pr[-b_\alpha \le (\hat{\hat{\beta}}_j - \beta_j) \le -a_\alpha] = 1 - \alpha \qquad (22)$$

for some small value of $\alpha$, $0 < \alpha < 1$, say $\alpha = 0.05$. Since the distribution of $(\hat{\hat{\beta}}_j - \beta_j)$ is unknown, we can use the $j$th element of each bootstrap value $\hat{\hat{\beta}}^*$ to find values $a_\alpha^*, b_\alpha^*$ such that

$$\Pr[-b_\alpha^* \le (\hat{\hat{\beta}}_j^* - \hat{\hat{\beta}}_j) \le -a_\alpha^*] \approx 1 - \alpha, \qquad (23)$$

with improving approximation as $L_2 \to \infty$. Substituting $a_\alpha^*, b_\alpha^*$ for $a_\alpha, b_\alpha$ in (22) leads to an estimated confidence interval $[\hat{\hat{\beta}}_j + a_\alpha^*,\; \hat{\hat{\beta}}_j + b_\alpha^*]$.
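The interval construction in (22)–(23) amounts to taking quantiles of the centered bootstrap values; a minimal sketch (function name ours):

```python
import numpy as np

def basic_bootstrap_ci(beta_hat, beta_boot, alpha=0.05):
    """Interval from (22)-(23): find a*, b* with
    Pr[-b* <= beta*_j - beta_hat_j <= -a*] ~= 1 - alpha,
    then report [beta_hat_j + a*, beta_hat_j + b*]."""
    diffs = np.asarray(beta_boot) - beta_hat        # beta*_j - beta_hat_j
    lo_q, hi_q = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    a_star, b_star = -hi_q, -lo_q                   # -b* is the alpha/2 quantile
    return beta_hat + a_star, beta_hat + b_star
```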
Practical implementation of either Algorithm #1 or Algorithm #2 is possible, and easy, with existing software. There are now a number of packages available that can be used to compute the DEA efficiency estimator, and packages such as LIMDEP, STATA, and others can be used to estimate truncated regression models. Interpreters such as MATLAB, R, S-Plus, etc. can be used to organize the results from one package so that they may be fed to another package.
The only remaining issue concerns the choice of the number of replications, $L$, in Algorithm #1, or $L_1$ and $L_2$ in Algorithm #2. The choice of $L_1$ in Algorithm #2 determines the number of bootstrap replications used to compute the bias-corrected estimates $\hat{\hat{\delta}}_i$. We and others have found that 100 replications are typically sufficient for this purpose, since constructing $\hat{\hat{\delta}}_i$ requires only computation of a mean and then a difference. The choices of $L$ and $L_2$, however, determine the number of bootstrap replications used to construct estimates of confidence intervals in the two algorithms. Confidence-interval estimation is tantamount to estimating the tails of distributions, which necessarily requires more information. Hall (1986) suggests 1000 replications for estimating confidence intervals. We use 2000 replications (the values of $L$ and $L_2$) in our simulations and empirical examples that follow. More accurate estimates can be achieved with larger numbers of replications, and, in the case of confidence-interval estimation, diminishing returns arise slowly. One must balance this concern, however, with the waiting time incurred when the number of replications is increased.
In Simar and Wilson (2001b, 2004), the bootstrap principle was iterated to assess the accuracy of bootstrap confidence interval estimates, along the lines described by Hall (1992). Although a similar procedure could be employed to assess the accuracy of the confidence intervals estimated with Algorithms #1 and #2, the double bootstrap in Algorithm #2 uses sequential, rather than iterated, bootstraps. Moreover, the first loop in step [3] of Algorithm #2 is used only to construct bias-corrected distance function estimates, and so it is reasonable to use fewer replications than when estimating confidence intervals in the second loop. Consequently, the computational burden incurred by Algorithm #2 is far less than what would typically be incurred with an iterated bootstrap as in Simar and Wilson (2001b, 2004).
5. Link with fully parametric models
Before turning to Monte Carlo evidence on the performance of the procedures proposed in this section, note that some studies (e.g., Pitt and Lee, 1981; Kalirajan and Shand, 1985) have estimated fully parametric models with composite errors along the lines of Aigner et al. (1977) and Meeusen and van den Broeck (1977), with a second-stage regression of estimated inefficiency on some environmental variables. Kumbhakar et al. (1991), Reifschneider and Stevenson (1991), Huang and Liu (1994), Battese and Coelli (1995), and others have instead estimated (in a single stage) fully parametric models where the environmental variables are used to parameterize the mean of the one-sided inefficiency component of the composite error process.[8]
[8] Regressing efficiency estimates obtained from maximum likelihood estimation of a parametric model along the lines of Aigner et al. (1977) and Meeusen and van den Broeck (1977) is almost certain to result in problems for statistical consistency. The covariates in the second-stage regression are correlated with the one-sided error terms from the first stage in any interesting case; otherwise, there would be no need for the second-stage regression. In any application, the covariates in the second stage are likely to be correlated, perhaps highly so, with the covariates in the first stage, and hence the errors in the first stage cannot be independent of the covariates in the first stage. Consequently, the likelihood that is maximized is not the
With a small modification of the assumptions in Section 2, these fully parametric models can be seen to be related to the semi-parametric approach in this paper. First, replace Assumptions A2 and A3 with the following:
Assumption A2a. The conditioning in $f(\delta_i \mid z_i)$ in (5) operates through the following mechanism:

$$\delta_i = \exp(z_i \beta + \varepsilon_i), \qquad (24)$$

where $\beta$ is a vector of (finitely many) parameters, and $\varepsilon_i$ is a continuous iid random variable, independent of $z_i$.

Assumption A3a. $\varepsilon_i$ in (24) is distributed $N(0, \sigma_\varepsilon^2)$ with left truncation at $-z_i \beta$ for each $i$.
Then, for the case of one output ($q = 1$), we can write

$$\log y_i = \log g(x_i \mid \alpha) - \xi_i, \qquad (25)$$

with $\xi_i$ distributed $N(z_i \beta, \sigma_\varepsilon^2)$ with left truncation at 0, and $g(\cdot \mid \cdot)$ a parametric function known up to a finite-length parameter vector $\alpha$. This is a special case of

$$\log y_i = \log g(x_i \mid \alpha) + v_i - \xi_i, \qquad (26)$$

with $v_i \sim N(0, \sigma_v^2)$, which is the model estimated by Kumbhakar et al. (1991), Reifschneider and Stevenson (1991), Huang and Liu (1994), Battese and Coelli (1995), and others. In (25), $\sigma_v^2 = 0$. The model in (26) can be extended to accommodate multiple outputs; see Adams et al. (1999), Coelli (2000), Coelli and Perelman (2001), Atkinson and Primont (2002), and Sickles et al. (2002) for examples and discussion.
Note that the parametric structure afforded by $g(\cdot \mid \cdot)$ is necessary for estimation to proceed in a single stage. Typically, $g(\cdot \mid \cdot)$ is assumed to be a translog function, but this is likely a mis-specification in many cases, particularly when firms are of widely varying size (see Wheelock and Wilson (2000) and Wilson and Carey (2004) for discussion and empirical examples, and Guilkey et al. (1983) and Chalfant and Gallant (1985) for Monte Carlo evidence).
6. Monte Carlo experiments
To examine the performance of the various approaches to inference in the second-stage regression, we conducted several Monte Carlo experiments. In each case, we generated data from a known process and applied our bootstrap algorithms on each of $M$ Monte Carlo trials.
Data for the $i$th observation in each Monte Carlo trial were generated by setting $z_{i1} = 1$ and drawing $z_{ij} \sim N(\mu_z, \sigma_z^2)$ for $j = 2, \ldots, r$. Then $\varepsilon_i$ is drawn from a $N(0, \sigma_\varepsilon^2)$ distribution left-truncated at $1 - z_i \beta$. We then set $\delta_i = z_i \beta + \varepsilon_i$. Next, for each $j = 1, \ldots, p$, we draw $x_{ij} \sim \mathrm{uniform}(6, 16)$. If $q = 1$, we then set $y_i = \delta_i^{-1} \sum_{j=1}^{p} x_{ij}^{3/4}$. Otherwise, we set aggregate output $\tilde{z}_i = \delta_i^{-1} \sum_{j=1}^{p} x_{ij}^{3/4}$, and draw $a_1 \sim \mathrm{uniform}(0, 1)$. If $q > 2$, then for each $\ell = 2, \ldots, q - 1$ we also draw $a_\ell \sim \mathrm{uniform}(0,\, 1 - \sum_{j=1}^{\ell - 1} a_j)$. Finally, for $j = 1, \ldots, q - 1$, we set $y_{ij} = a_j \tilde{z}_i$, and then set $y_{iq} = (1 - \sum_{\ell=1}^{q-1} a_\ell)\, \tilde{z}_i$. For the case $q > 1$, aggregate output $\tilde{z}_i$ is computed and then disaggregated among the $q$ individual outputs. Although not required by Assumptions A1–A8, this results in independence between the mix of outputs, characterized by the angles $\gamma$, and $(x, \delta, z)$.

(footnote 8 continued) correct one, unless one takes account of the correlation structure. In our semi-parametric model given by Assumptions A1–A8, this problem is avoided since the first-stage estimation does not require independence between the inefficiencies and the inputs and outputs.
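Under the reading above, one Monte Carlo sample can be generated as follows. This is an illustrative sketch, not the authors' code: the function name is ours, and the argument defaults are taken from the first set of experiments described next.

```python
import numpy as np
from scipy.stats import truncnorm

def simulate(n, p, q, r=2, beta=(0.5, 0.5), mu_z=2.0, sigma_z=2.0,
             sigma_e=1.0, rng=None):
    """One Monte Carlo sample from the Section 6 DGP (sketch)."""
    rng = np.random.default_rng(rng)
    beta = np.asarray(beta)
    # z_i1 = 1; z_ij ~ N(mu_z, sigma_z^2) for j = 2, ..., r
    Z = np.column_stack([np.ones(n), rng.normal(mu_z, sigma_z, (n, r - 1))])
    mu = Z @ beta
    a = (1.0 - mu) / sigma_e               # eps_i left-truncated at 1 - z_i beta
    eps = truncnorm.rvs(a, np.inf, scale=sigma_e, random_state=rng)
    delta = mu + eps                       # true inefficiency, delta_i > 1
    X = rng.uniform(6, 16, (n, p))         # x_ij ~ uniform(6, 16)
    agg = (X ** 0.75).sum(axis=1) / delta  # aggregate output
    if q == 1:
        Y = agg[:, None]
    else:
        # draw shares a_1, ..., a_{q-1} sequentially on the remaining mass,
        # then disaggregate the aggregate output among the q outputs
        shares = np.zeros((n, q))
        rem = np.ones(n)
        for l in range(q - 1):
            shares[:, l] = rng.uniform(0, rem)
            rem -= shares[:, l]
        shares[:, q - 1] = rem
        Y = shares * agg[:, None]
    return X, Y, Z, delta
```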
In our first set of experiments, we set $r = 2$, $\mu_z = 2$, $\sigma_z = 2$, $\sigma_\varepsilon = 1$. In addition, values of each element of $\beta$ must be set. In order to simplify scaling in estimation of the second-stage truncated regression, we set $\beta_1 = \beta_2 = 0.5$.
In each experiment, we ran 1000 Monte Carlo trials. For Algorithm #1, we use $L = 2000$ bootstrap replications. For Algorithm #2, we use $L_1 = 100$ replications for the first loop used to compute the bias-corrected efficiency estimates, and $L_2 = 2000$ replications for the second loop where the truncated regression model is bootstrapped. Also in each experiment, we compute the proportion among the 1000 Monte Carlo experiments where the estimated confidence interval covers the true value of $\beta_1$, $\beta_2$, and $\sigma_\varepsilon$ at nominal significance levels of 0.80, 0.90, 0.95, and 0.99.
Table 1 reports results for three sets of cases: truncated regression of $\hat{\delta}_i$ on $z_i$, with inference by Algorithm #1 or by conventional methods (i.e., relying on asymptotic normality), and censored (tobit) regression of $\hat{\delta}_i$ on $z_i$ with conventional inference. In the case of inference based on Algorithm #1 (columns 4–7 in Table 1), for $p = q = 1$ and $n = 100$, coverages for the slope parameter ($\beta_2$) are too small, but not by a large amount. Coverages improve when the sample size is increased to 400, as expected. Table 1 also reveals that as $p$ and $q$ are increased, the coverages obtained with Algorithm #1 become slightly worse for a given sample size, but not too much so. Some worsening is to be expected due to the curse of dimensionality in the first-stage estimation; as $(p + q)$ increases, the dependent variable in the second-stage regressions becomes noisier, making it more difficult to obtain precise information about the parameters of the regression.
Results for conventional inference are shown in columns 8–11 of Table 1. Coverages are roughly similar to those obtained with Algorithm #1. The last four columns of Table 1 give results for tobit regression with conventional inference. This approach involves a specification error, since the censored model does not match the DGP. Not surprisingly, the results in Table 1 indicate that this approach results in catastrophically poor coverages.
Additional insight is gained by examining the distributions of the estimators over the 1000 Monte Carlo trials in our experiments. Fig. 1 shows kernel density estimates for these distributions where $n = 400$.[9] Fig. 1 contains six plots of density estimates; the rows (from top to bottom) correspond to $\hat{\beta}_1$, $\hat{\beta}_2$, and $\hat{\sigma}_\varepsilon$; the first column corresponds to the tobit estimates described in Table 1, while the second column corresponds to the truncated regression estimates obtained with Algorithm #1 and
[9] The kernel density estimates were obtained using an Epanechnikov kernel and with bandwidths chosen by the two-stage plug-in procedure proposed by Sheather and Jones (1991).
Table 1
Estimated coverages of confidence intervals from regression of $\hat{\delta}_i$ on $z_i$. Columns 4–7: truncated regression with Algorithm #1 inference; columns 8–11: truncated regression with conventional inference; columns 12–15: tobit regression with conventional inference. Column headings give nominal significance levels.

p=q   n    Param.   Trunc. regression,           Trunc. regression,           Tobit regression,
                    Algorithm #1                 conventional inference       conventional inference
                    0.80   0.90   0.95   0.99    0.80   0.90   0.95   0.99    0.80   0.90   0.95   0.99
1     100  β1       0.811  0.868  0.891  0.925   0.808  0.895  0.925  0.965   1.000  1.000  1.000  1.000
           β2       0.743  0.819  0.858  0.915   0.752  0.841  0.882  0.938   0.030  0.043  0.059  0.097
           σε       0.747  0.809  0.848  0.905   0.736  0.814  0.867  0.933   0.000  0.000  0.000  0.000
1     400  β1       0.811  0.912  0.948  0.976   0.805  0.911  0.955  0.988   1.000  1.000  1.000  1.000
           β2       0.784  0.878  0.923  0.967   0.782  0.884  0.929  0.974   0.000  0.000  0.000  0.000
           σε       0.776  0.875  0.909  0.956   0.774  0.874  0.916  0.966   0.000  0.000  0.000  0.000
2     100  β1       0.886  0.928  0.948  0.970   0.875  0.953  0.973  0.990   1.000  1.000  1.000  1.000
           β2       0.661  0.727  0.779  0.846   0.676  0.763  0.815  0.893   0.043  0.060  0.087  0.147
           σε       0.664  0.742  0.789  0.850   0.661  0.754  0.811  0.893   0.000  0.000  0.000  0.000
2     400  β1       0.772  0.931  0.979  0.992   0.746  0.885  0.958  0.996   1.000  1.000  1.000  1.000
           β2       0.733  0.822  0.884  0.939   0.732  0.835  0.897  0.960   0.001  0.001  0.001  0.001
           σε       0.735  0.812  0.873  0.926   0.722  0.805  0.883  0.936   0.000  0.000  0.000  0.000
3     100  β1       0.900  0.943  0.958  0.970   0.916  0.960  0.978  0.991   1.000  1.000  1.000  1.000
           β2       0.624  0.699  0.748  0.807   0.662  0.745  0.793  0.868   0.078  0.112  0.147  0.219
           σε       0.655  0.716  0.758  0.816   0.663  0.741  0.795  0.863   0.000  0.000  0.000  0.000
3     400  β1       0.627  0.893  0.990  0.999   0.563  0.755  0.896  0.997   1.000  1.000  1.000  1.000
           β2       0.649  0.761  0.821  0.904   0.647  0.777  0.841  0.936   0.002  0.005  0.005  0.008
           σε       0.712  0.792  0.846  0.919   0.698  0.791  0.857  0.936   0.000  0.000  0.000  0.000
also described in Table 1. In each of the six panels in Fig. 1, the dotted curve shows the density estimate corresponding to $p = q = 1$, while the solid curve shows the density estimate corresponding to $p = q = 2$. True values of the parameters are indicated by the vertical dashed lines. Fig. 1 makes clear why the coverages of
[Fig. 1. Estimates of sampling densities, Algorithm #1 ($n = 400$; dotted curve for $p = q = 1$, solid curve for $p = q = 2$). Left column: tobit regression of $\hat{\delta}$ on $z$; right column: truncated regression of $\hat{\delta}$ on $z$. Rows show estimates of $\beta_1$, $\beta_2$, and $\sigma$; each density estimate uses $N = 1000$ trials.]
estimated confidence intervals from the tobit model are so poor: none of the estimated values lie near the true values. The situation is much better for the truncated regression; although the distributions for $\hat{\beta}_1$ and $\hat{\beta}_2$ show small amounts of skewness, they are rather well-centered over the true values. Comparing the dotted and the solid curves reveals the effects of increasing dimensionality in the first-stage problem; when dimensionality is increased from 2 to 4, the estimates of the sampling densities in the second column of Fig. 1 become more dispersed, and shift slightly away from the true parameter values. As discussed earlier, these effects are to be expected since, with increasing dimensionality in the first stage, the dependent variable in the second-stage regression contains more noise. It is interesting to note, however, that the doubling of dimensionality from 2 to 4 appears to have only a small effect on the densities in Fig. 1.
A similar set of Monte Carlo experiments was performed to obtain the results in Table 2. In the case of columns 4–7 of Table 2, we employed the double bootstrap in Algorithm #2. Comparison of these results with those for Algorithm #1 shown in columns 4–7 of Table 1 reveals improved coverages in a number of cases. Given that Algorithm #2 involves only a small increase in computational burden over Algorithm #1, the improved performance of Algorithm #2 seems to justify its use. Note that in Table 2, as in Table 1 with the single bootstrap, coverages worsen as $(p + q)$ increases for a given sample size, due to the curse of dimensionality in the first-stage estimation of the dependent variable. The worsening here, however, is modest and less severe than in the case of Table 1, due to the bias correction employed in Algorithm #2.
The last four columns of Table 2 give coverage results for conventional inference applied after the regression of $\hat{\hat{\delta}}_i$ on $z_i$ in step [5] of Algorithm #2. As with Algorithm #1 in Table 1, the coverages obtained with conventional inference here are broadly similar to those obtained with Algorithm #2, but the coverages provided by Algorithm #2 are much better than those provided by Algorithm #1 in Table 1. However, with $p = q = 3$ and $n = 400$, coverage for the intercept term ($\beta_1$) is worse with the conventional approach.
Kernel estimates of the densities of the estimators from the second set of experiments are shown in Fig. 2, again for the case where $n = 400$, with the dotted curves corresponding to $p = q = 1$ and the solid curves corresponding to $p = q = 2$. As in Fig. 1, the rows in Fig. 2 (from top to bottom) correspond to estimates of the intercept, slope, and $\sigma$ terms in the second stage. The first column in Fig. 2 shows results based on estimates from Algorithm #1; the densities are the same as those in the second column of Fig. 1, but have been reproduced with different scalings on the vertical axes to facilitate comparison with the second column of Fig. 2. The second column of Fig. 2 corresponds to estimates from the regression of $\hat{\hat{\delta}}_i$ on $z_i$ using Algorithm #2. As in Fig. 1, true parameter values are indicated by vertical dashed lines.

Two phenomena become evident in Fig. 2. First, as noted several times already,
increasing dimensionality in the first-stage problem reduces the precision of estimates in the second stage. Again, however, the results in Fig. 2 indicate that when dimensionality in the first stage is doubled from 2 to 4 in our experiments, the effect
on the second-stage estimates is not great. Second, the densities in the first column of Fig. 2 have slightly greater skewness than corresponding densities in the second column; on the whole, the densities for the regression of $\hat{\hat{\delta}}_i$ are better-centered on the true values than the densities for the regression of $\hat{\delta}_i$.

The second set of experiments conducted to produce the results in Table 2 also
allow comparison of the root-mean-square error (RMSE) of each estimator in regressions of $\hat{\delta}_i$ on $z_i$ versus regressions of $\hat{\hat{\delta}}_i$ on $z_i$. Table 3 shows the RMSE for each estimator (computed over 1000 Monte Carlo trials) with varying model dimensions and sample sizes. These results reveal two interesting phenomena. First, for $p = q = 1$, 2, or 3 and a sample size of 100, RMSE for each estimator is smaller when $\hat{\delta}_i$ is regressed on $z_i$ than when $\hat{\hat{\delta}}_i$ is regressed on $z_i$. However, when the sample size is increased to 400, with $p = q = 1$ or 2, use of $\hat{\hat{\delta}}_i$ yields lower RMSE for the intercept and slope estimators than $\hat{\delta}_i$. For $p = q = 3$, use of $\hat{\hat{\delta}}_i$ yields lower RMSE for the intercept and slope estimators than $\hat{\delta}_i$ when the sample size is increased to 800. Hence, at smaller sample sizes, performing the bias correction in
Table 2
Estimated coverages of confidence intervals from regression of $\hat{\hat{\delta}}_i$ on $z_i$

p=q   n    Param.   Trunc. regression, Algorithm #2,   Trunc. regression, conventional inference,
                    nominal significance               nominal significance
                    0.80   0.90   0.95   0.99          0.80   0.90   0.95   0.99
1     100  β1       0.798  0.867  0.891  0.927         0.799  0.888  0.923  0.957
           β2       0.781  0.876  0.911  0.946         0.790  0.882  0.919  0.963
           σε       0.805  0.884  0.906  0.956         0.816  0.879  0.922  0.964
1     400  β1       0.802  0.898  0.937  0.974         0.801  0.907  0.949  0.982
           β2       0.804  0.900  0.944  0.980         0.804  0.899  0.950  0.989
           σε       0.818  0.898  0.946  0.980         0.818  0.898  0.947  0.986
2     100  β1       0.850  0.967  0.976  0.986         0.808  0.959  0.982  0.995
           β2       0.803  0.902  0.933  0.963         0.792  0.907  0.945  0.976
           σε       0.812  0.923  0.946  0.969         0.817  0.919  0.954  0.983
2     400  β1       0.758  0.912  0.975  0.991         0.744  0.885  0.960  0.993
           β2       0.795  0.907  0.960  0.986         0.794  0.904  0.953  0.990
           σε       0.781  0.912  0.964  0.993         0.789  0.904  0.957  0.995
3     100  β1       0.741  0.971  0.993  1.000         0.658  0.941  0.996  1.000
           β2       0.793  0.907  0.934  0.968         0.821  0.923  0.960  0.980
           σε       0.820  0.925  0.951  0.972         0.859  0.946  0.964  0.984
3     400  β1       0.378  0.645  0.871  0.999         0.335  0.546  0.716  0.958
           β2       0.739  0.878  0.961  0.995         0.730  0.846  0.944  0.993
           σε       0.708  0.881  0.961  0.999         0.712  0.863  0.944  0.996
Algorithm #2 worsens RMSE relative to the simpler, single bootstrap. However, for a given dimensionality, as sample size is increased, the bias correction eventually becomes advantageous in terms of RMSE. Recall that comparison of Tables 1 and 2 revealed that, in terms of coverages of estimated confidence intervals, the double
[Fig. 2. Estimates of sampling distributions, Algorithm #2 ($n = 400$; dotted curve for $p = q = 1$, solid curve for $p = q = 2$). Left column: truncated regression of $\hat{\delta}$ on $z$; right column: truncated regression of $\hat{\hat{\delta}}$ on $z$. Rows show estimates of $\beta_1$, $\beta_2$, and $\sigma$; each density estimate uses $N = 1000$ trials.]
bootstrap was found to be superior in almost every case. Although the bias correction in the double bootstrap may increase estimation noise at small sample sizes, inference-making abilities are enhanced at all sample sizes.
Second, the results in Table 3 also reveal that for a given sample size (e.g., $n = 100$ or $n = 400$), RMSE increases as $(p + q)$ increases. As noted earlier in the discussions of Tables 1 and 2, this is due to the curse of dimensionality in the first-stage estimation of the dependent variable used in the second-stage regressions. At a given sample size, with increased dimensionality, the dependent variable in the second-stage regression contains less information about the underlying, latent variable in (6).
To further examine the link between two-stage non-parametric models in the literature and parametric models as discussed in Section 5, we conducted an additional set of simulations using (26) as the true model, with $g(x_i \mid \alpha) = \prod_{j=1}^{p} x_{ij}^{\alpha_j}$, $\alpha_j = 0.8 p^{-1}$, $\sigma_\varepsilon = 0.9$, $\sigma_v = 0$, $r = 2$, $\sigma_z = 1$, $\mu_z = -1$, and $\beta_1 = \beta_2 = 0.5$. We considered four cases with $p \in \{1, 3\}$ and $n \in \{100, 400\}$. For each case, we simulated
Table 3
Root-mean-square error of parameter estimators in truncated regression of $\hat{\delta}_i$ or $\hat{\hat{\delta}}_i$ on $z_i$

p=q   n    Param.   RMSE, $\hat{\delta}_i$   RMSE, $\hat{\hat{\delta}}_i$
1     100  β1       0.4985                   0.5261
           β2       0.1107                   0.1118
           σε       0.1481                   0.1446
1     400  β1       0.2259                   0.2255
           β2       0.0504                   0.0497
           σε       0.0687                   0.0656
2     100  β1       0.7812                   0.9515
           β2       0.1418                   0.1498
           σε       0.1925                   0.1886
2     400  β1       0.3121                   0.3002
           β2       0.0585                   0.0563
           σε       0.0811                   0.0775
3     100  β1       1.1762                   1.8447
           β2       0.1773                   0.2288
           σε       0.2292                   0.2536
3     400  β1       0.5071                   0.6203
           β2       0.0743                   0.0795
           σε       0.0960                   0.1061
3     800  β1       0.3558                   0.3542
           β2       0.0527                   0.0498
           σε       0.0646                   0.0797
draws from the model in each of 1000 Monte Carlo trials; on each trial, we obtained maximum likelihood estimates of the parameters and performed conventional inference with the inverse negative Hessian of the log-likelihood function, using the computer code described in Coelli (1996). In addition, we applied our Algorithms #1 and #2 described previously in Section 3.[10]
Results for estimated coverages obtained with the three estimation methods, at nominal significance levels of 0.90, 0.95, and 0.99, are shown in Table 4. In obtaining maximum likelihood estimates, we employed two standard transformations, setting $\sigma^2 = \sigma_\varepsilon^2 + \sigma_v^2$ and $\gamma = \sigma_\varepsilon^2 / (\sigma_\varepsilon^2 + \sigma_v^2)$; results for estimates of $\sigma$ and $\gamma$ are reported for the maximum likelihood estimates. By design, Algorithms #1 and #2 restrict $\sigma_v = 0$, but this restriction is not imposed in the maximum likelihood estimation, to mirror the way practitioners are likely to use these methods.
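The two transformations are simply a reparameterization of $(\sigma_\varepsilon, \sigma_v)$, and can be inverted; for concreteness (function names ours):

```python
import math

def to_sigma_gamma(sigma_e, sigma_v):
    """Reparameterization used for the MLE results in Table 4:
    sigma^2 = sigma_e^2 + sigma_v^2, gamma = sigma_e^2 / sigma^2."""
    s2 = sigma_e ** 2 + sigma_v ** 2
    return s2, sigma_e ** 2 / s2

def from_sigma_gamma(s2, gamma):
    """Inverse mapping back to (sigma_e, sigma_v)."""
    return math.sqrt(gamma * s2), math.sqrt((1.0 - gamma) * s2)
```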
The results in Table 4 reveal that in every case, coverages for $\beta_1$, $\beta_2$, and $\sigma$ obtained with Algorithms #1 and #2 are better (in a few cases only slightly so, but in many cases substantially so) than those obtained with maximum likelihood. Since $\sigma_v$ is not restricted to zero in the maximum likelihood estimation, some of the inefficiency in the data-generating process is confused with statistical noise. Note that the coverages for the production function parameters (the $\alpha$'s) are poor; therefore, it appears that the method has difficulty in estimating these parameters accurately, and this in turn may cause distortions in the estimates of $\beta_1$ and $\beta_2$ that are obtained with the maximum likelihood method. Of course, one might expect that the maximum likelihood coverages would improve if $\sigma_v$ were restricted to zero, but this is not done in practical applications where models such as (26) are estimated.[11]
Table 4 also reveals that coverages obtained with Algorithm #1 are better than those obtained with Algorithm #2 in these experiments. This may be due to the different scaling that is used here as opposed to the simulations that led to Tables 1–3. The bias correction in Algorithm #2 likely adds some noise unless the bias being corrected is large. Recall that this was confirmed in Table 3 in the case of the original experiments, and so it should be no surprise that one can find a scaling, as we have here, where Algorithm #1 is able to dominate Algorithm #2 in terms of coverages of estimated confidence intervals.
7. Empirical examples
As a final exercise, we provide an empirical example based on the paper by Aly et al. (1990). Aly et al. examined efficiency among a subsample of 322 commercial banks operating in the U.S. during the fourth quarter of 1986. Here, we use the definitions of Berger and Mester (2003) to define three inputs (purchased
[10] Note that steps [3.1] and [3.2] in Algorithm #1, and steps [3.1], [3.2], [6.1], and [6.2] in Algorithm #2, must be modified slightly due to the exponential specification introduced by Assumptions A2a and A3a in Section 5.

[11] One might also reasonably argue that if, in the true model, $\sigma_v > 0$, then the comparison across methods in Table 4 might be different. But in this case, the first stages of the non-parametric approaches would no longer be statistically consistent. Here, we have generated the data in such a way that each of the estimation methods remains consistent.
funds, core deposits, and labor) and four outputs (consumer loans, business loans, real estate loans, and securities held) for banks; these are used to estimate technical efficiency in the first stage. In the second stage, we attempt to explain the first-stage estimates in terms of several variables used by Aly et al. in their second-stage regression. Where Aly et al. employed OLS in their second stage, we use truncated regression.
The covariates we use in the second-stage regression are similar to those used byAly et al., except that our SIZE variable is defined by the log of total assets ratherthan total deposits as in the original study, and we also include the square of SIZEand DIVERSE as well as an interaction term. In addition, we included bothindependent banks and banks that are members of multi-bank holding companies,
Table 4
Coverage of estimated confidence intervals in parametric model
p n Parameter MLE—nom. signif. Alg. #1—nom. signif. Alg. #2—nom. signif.
0.90 0.95 0.99 0.90 0.95 0.99 0.90 0.95 0.99
1 100 a1 0.630 0.663 0.711 — — — — — —
a2 0.633 0.669 0.711 — — — — — —
b1 0.781 0.838 0.900 0.867 0.900 0.942 0.815 0.847 0.903
b2 0.808 0.864 0.930 0.873 0.915 0.961 0.858 0.899 0.955
s2 0.770 0.816 0.890 0.838 0.873 0.917 0.825 0.863 0.902
g 0.998 0.998 0.999 — — — — — —
1 400 a1 0.589 0.643 0.697 — — — — — —
a2 0.582 0.631 0.693 — — — — — —
b1 0.703 0.767 0.849 0.915 0.933 0.963 0.886 0.907 0.951
b2 0.724 0.803 0.886 0.902 0.932 0.972 0.895 0.928 0.970
s2 0.653 0.722 0.818 0.897 0.931 0.962 0.888 0.921 0.956
g 0.999 1.000 1.000 — — — — — —
3 100 a1 0.636 0.688 0.739 — — — — — —
a2 0.661 0.698 0.755 — — — — — —
a3 0.667 0.718 0.772 — — — — — —
a4 0.661 0.710 0.769 — — — — — —
b1 0.776 0.829 0.906 0.909 0.929 0.963 0.884 0.916 0.953
b2 0.783 0.834 0.916 0.883 0.919 0.965 0.839 0.881 0.936
s2 0.718 0.775 0.867 0.840 0.877 0.906 0.804 0.831 0.878
g 1.000 1.000 1.000 — — — — — —
3 400 a1 0.594 0.635 0.691 — — — — — —
a2 0.596 0.650 0.708 — — — — — —
a3 0.609 0.665 0.717 — — — — — —
a4 0.621 0.671 0.731 — — — — — —
b1 0.723 0.785 0.869 0.951 0.966 0.987 0.849 0.893 0.945
b2 0.720 0.792 0.881 0.892 0.930 0.981 0.817 0.872 0.946
s2 0.633 0.712 0.817 0.895 0.925 0.963 0.811 0.858 0.923
g 0.998 1.000 1.000 — — — — — —
and added a dummy variable (HOLD) equal to 1 if a bank is a member of a multi-bank holding company. See Aly et al. (1990) for definitions of the remaining variables used in the second-stage regression.
We used more recent data, from the fourth-quarter FDIC Reports of Income and Condition (Call Reports) for 2002. We first took a random sample of 322 banks, as did Aly et al. in their study. We also present results for the full sample (after deleting observations with missing or implausible values) consisting of 6955 observations.
We first regressed the DEA efficiency estimates on our covariates to obtain the parameter estimates shown in the second column of Table 5. We then estimated 95% confidence intervals using the asymptotic normal approximation (see columns 3 and 4) and our Algorithm #1 (see columns 5 and 6). Using Algorithm #2, we regressed the bias-corrected efficiency estimates on the covariates to obtain the parameter estimates shown in column 7 of Table 5 and the confidence interval estimates in columns 8 and 9, also obtained with Algorithm #2.
Several things become apparent from examining the results in Table 5. First, both the parameter estimates and confidence-interval estimates obtained by regressing the bias-corrected efficiencies on covariates in Algorithm #2 are somewhat different from those obtained by regressing the uncorrected estimates as in Algorithm #1. Given the Monte Carlo simulation results discussed earlier, this is not surprising, and the simulation results suggest that we should prefer the results from Algorithm #2 over those from Algorithm #1. The confidence interval estimates from either Algorithm #1 or #2 are rather different from interval estimates obtained using conventional methods, and this too is not surprising given the simulation results discussed earlier.
The conventional confidence interval estimates in columns 3-4 of Table 5 are centered on the parameter estimates in column 2 by construction. It is interesting to note that the intervals estimated with both Algorithms #1 and #2 sometimes do not cover the corresponding parameter estimate. In particular, with n = 322, Algorithm #2 produces estimated confidence intervals in the last two columns of Table 5 that do not cover the corresponding parameter estimates in the seventh column in four cases (i.e., the coefficients on SIZE, SIZE_SQ, DIVERSE_SQ, and the estimate σ̂ε). When the sample size is increased to 6955, the estimated confidence intervals from Algorithm #2 also do not cover the corresponding parameter estimate in four cases (HOLD, MSA, DIVERSE_SQ, and σ̂ε).12
Unlike the conventional confidence intervals based on the normal approximation, the bootstrap confidence intervals incorporate an implicit bias correction. Recall that the second-stage regression inherits the convergence rate of the DEA estimator in the first stage, i.e., $n^{-2/(p+q+1)}$, or $n^{-2/9}$ in this application. Noting that $6955^{-2/9} \approx 51^{-1/2}$,
12 The confidence intervals estimated by Algorithms #1 and #2 in Table 5 are typically narrower than the corresponding interval estimates obtained by the usual normal approximation. In our Monte Carlo experiments reported in Tables 1 and 2, the widths of estimated intervals were similar across the various methods (with the exception of cases where tobit regression was used). The fact that the bootstrap intervals in Table 5 are narrower than the conventional intervals may reflect some unknown feature in the data different from our simulation scenario.
Table 5
Efficiency of U.S. commercial banks (truncated regression, 95% confidence intervals, p = 3, q = 5)

              Estimate    Normal CI              Alg. #1 CI             Alg. #2 est.  Alg. #2 CI
                          lo         hi          lo         hi                        lo         hi

n = 322:
CONSTANT       0.6759     0.5234     0.8284      0.2841     0.4517     -0.2354       -1.354     -0.8535
SIZE           0.0707     0.04904    0.09237     0.1047     0.1285      0.2453        0.3414     0.4117
DIVERSE        0.4293     0.323      0.5356      0.4044     0.5277      1.121         1.101      1.472
HOLD           0.01601    0.01194    0.02008     0.01468    0.01961     0.03738       0.03482    0.04967
MSA            0.006988   0.003636   0.01034     0.005876   0.009998    0.01897       0.01607    0.02883
SIZE_SQ       -0.003864  -0.004704  -0.003024   -0.006503  -0.005592   -0.01263      -0.02031   -0.01754
DIVERSE_SQ    -0.1181    -0.147     -0.08915    -0.158     -0.1239     -0.3536       -0.4928    -0.3906
SIZE×DIVERSE  -0.01079   -0.01828   -0.003307   -0.01399   -0.00533    -0.02117      -0.0308    -0.00527
σ̂ε             0.06071    0.0596     0.06181     0.08192    0.08342     0.1673        0.2167     0.221

n = 6955:
CONSTANT      -0.08591   -1.34       1.168      -1.228     -0.1798     -0.6978       -3.682     -0.003114
SIZE           0.1477    -0.03172    0.3272      0.1493     0.3012      0.2291        0.1318     0.6233
DIVERSE        0.7302    -0.01542    1.476       0.7284     1.366       1.435         1.053      3.16
HOLD           0.02273    0.002241   0.04322     0.02294    0.04091     0.07018       0.07331    0.1333
MSA            0.01961    0.005769   0.03345     0.01934    0.03348     0.06714       0.07296    0.1168
SIZE_SQ       -0.006839  -0.01365   -2.346E-5   -0.0135    -0.007633   -0.01308      -0.03042   -0.01264
DIVERSE_SQ    -0.3357    -0.5252    -0.1463     -0.569     -0.4102     -0.8587       -1.571     -1.045
SIZE×DIVERSE  -0.00636   -0.05713    0.04441    -0.02932    0.01443     0.02325      -0.02937    0.117
σ̂ε             0.03802    0.03225    0.0438      0.04996    0.05559     0.1106        0.1436     0.1599
our inference-making ability in the second-stage regression with 6955 observations is equivalent, in a rough sense, to making inference in an ordinary parametric, truncated regression (where the classical parametric convergence rate $n^{-1/2}$ obtains) with about 51 observations. It is well known that maximum likelihood often produces biased estimates in finite samples. Although we expect the procedure here to be unbiased asymptotically, even with 6955 observations we are far from the asymptotic result. And, of course, in any application such as this, the true DGP is unknown, and possibly different from the one that is assumed.
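The back-of-the-envelope calculation above can be reproduced directly: equating the nonparametric rate $n^{-2/(p+q+1)}$ to the parametric rate $m^{-1/2}$ gives an effective parametric sample size $m = n^{4/(p+q+1)}$. A minimal sketch (the function name is ours, not from the paper):

```python
def effective_sample_size(n, p, q):
    """Parametric sample size m implied by equating the DEA-driven
    rate n^(-2/(p+q+1)) to the parametric rate m^(-1/2)."""
    return n ** (4.0 / (p + q + 1))

# With p = 3, q = 5 as in Table 5, p + q + 1 = 9, so 6955 observations
# carry roughly the information of about 51 parametric observations.
m = effective_sample_size(6955, 3, 5)
```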
Finally, we note that there are some differences between the parameter estimates obtained with the sub-sample and those obtained with the full sample. On the whole, however, the estimates appear rather stable. Of course, more data are always preferred to less.
8. Summary and conclusions
The Monte Carlo results presented in the previous sections illustrate a number of problems with existing two-stage studies that employ non-parametric distance function estimators similar to (10) in the first stage and then employ tobit regression in the second stage, with conventional inference based on the inverse negative Hessian of the log-likelihood function. As noted in Section 1, none of the published studies that we are aware of have defined a DGP that might be estimated, and it is difficult to imagine one where tobit regression would be sensible in this context. In terms of coverage of estimated confidence intervals, tobit regression is catastrophic in our Monte Carlo experiments.
Truncated regression estimates the correct model in our experiments. In terms of coverage of estimated confidence intervals, our single bootstrap is shown to perform well, but our double bootstrap performs even better, with little increase in computational burden over the single bootstrap. The double bootstrap has the additional advantage that the RMSE of the intercept and slope estimators in the second-stage regression declines more rapidly with increasing sample size than when the single bootstrap is used, resulting in lower RMSE at moderate sample sizes, depending on model dimensionality in the first stage. The double bootstrap is thus our preferred choice, but one could use both methods as a robustness check.
Acknowledgements
We are grateful to Irene Gijbels and participants at the XXXIVèmes Journées de Statistique, Société Française de Statistique, Bruxelles; the North American Productivity Workshop, Albany, New York; the Workshop on Quantitative Methods for the Measurement of Organizational Efficiency, Institute for Fiscal Studies, University College, London; and the Econometric Society European Meetings, Madrid, for comments on earlier versions. Research support from "Projet d'Actions de Recherche Concertées" (No. 98/03-217), from the "Inter-university
Attraction Pole", Phase V (No. P5/24) from the Belgian Government, and from the Texas Advanced Computation Center is gratefully acknowledged. Any remaining errors are solely our responsibility.
Appendix A. Technical details
A.1. Censored versus truncated regression
Consider the regression model
$$W_i = z_i \beta + \varepsilon_i, \qquad (27)$$
where $\varepsilon_i \sim N(0, \sigma_\varepsilon^2)$ is identically, independently distributed for all $i = 1, \ldots, n$. The left-hand side variable $W$ is said to be censored when, instead of observing $W_i$ for all observations, we observe
$$y_i = \begin{cases} z_i\beta + \varepsilon_i & \text{if } z_i\beta + \varepsilon_i > c_i, \\ c_i & \text{otherwise.} \end{cases} \qquad (28)$$
In this case, $W$ is left-censored at $c_i$, which may vary over observations. Alternatively, $W_i$ is said to be truncated if we observe $y_i = W_i$ for all $W_i \geq c_i$, but observe nothing otherwise.
Relative to the classical linear regression model with unbounded error terms, both censoring and truncation involve a loss of information about the dependent variable. In the case of censoring, some information is lost about the dependent variable for censored observations, but the right-hand side variables are observed for such observations. In the case of truncation, neither the left-hand nor the right-hand side variables are observed for some observations. The information loss is thus more severe in the case of truncation than in the case of censoring.
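The distinction can be made concrete with a tiny simulation (illustrative only; the threshold and sample here are arbitrary, not from the paper):

```python
import random

random.seed(42)
c = 0.0                                              # censoring/truncation point
latent = [random.gauss(0.5, 1.0) for _ in range(10)] # the unobserved W_i

# Censoring: every observation is kept, but values at or below c are
# recorded as c itself; the covariates z_i would still be observed.
censored = [w if w > c else c for w in latent]

# Truncation: observations with W_i below c are dropped entirely,
# losing both the dependent variable and the covariates.
truncated = [w for w in latent if w >= c]

assert len(censored) == len(latent)    # no observations lost under censoring
assert len(truncated) <= len(latent)   # observations below c vanish entirely
```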
In the case of truncation, if the $W_i$ are assumed normal with left-truncation at $c_i$, $\beta$ in (27) can be estimated by maximizing the likelihood function
$$\mathcal{L}_1 = \prod_{i=1}^{n} \frac{1}{\sigma_\varepsilon}\,\phi\!\left(\frac{y_i - z_i\beta}{\sigma_\varepsilon}\right) \left[1 - \Phi\!\left(\frac{c_i - z_i\beta}{\sigma_\varepsilon}\right)\right]^{-1}, \qquad (29)$$
where $\phi(\cdot)$ and $\Phi(\cdot)$ represent the standard normal density and distribution functions, respectively. This resembles the likelihood for regression models with normal errors that are neither censored nor truncated, except for the term in square brackets. Division by this term is necessary to re-scale the normal density $\phi(\cdot)$ so that it integrates to unity after truncation.
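As a sketch of how (29) translates into code, for the single-regressor case and using only the Python standard library (the function name is ours, not the paper's):

```python
import math
from statistics import NormalDist

def truncated_loglik(beta, sigma, y, z, c):
    """Log-likelihood corresponding to (29): each y_i = z_i*beta + eps_i
    is observed only when it exceeds the truncation point c_i."""
    std = NormalDist()                  # standard normal: pdf and cdf
    ll = 0.0
    for yi, zi, ci in zip(y, z, c):
        mu = zi * beta
        ll += math.log(std.pdf((yi - mu) / sigma) / sigma)
        # re-scale so the truncated density integrates to one:
        ll -= math.log(1.0 - std.cdf((ci - mu) / sigma))
    return ll
```

With truncation at the mean ($c_i = z_i\beta$), the bracketed term in (29) equals 1/2, so each observation's log-contribution exceeds the untruncated log-density by exactly $\log 2$.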
In the case of left-censoring at $c_i$, where one assumes that the uncensored observations are normally distributed, $\beta$ in (27) can be estimated by specifying a functional form for $\operatorname{Prob}(W_i > c_i \mid a)$, where $a$ is a vector of parameters, and writing the likelihood function as
$$\mathcal{L}_2 = \prod_{i \mid y_i > c_i} \frac{1}{\sigma_\varepsilon}\,\phi\!\left(\frac{y_i - z_i\beta}{\sigma_\varepsilon}\right) \left[1 - \Phi\!\left(\frac{c_i - z_i\beta}{\sigma_\varepsilon}\right)\right]^{-1} \operatorname{Prob}(W_i > c_i \mid a) \;\times \prod_{i \mid y_i = c_i} \bigl[1 - \operatorname{Prob}(W_i > c_i \mid a)\bigr]. \qquad (30)$$
Here, the probability that $W_i > c_i$ is allowed to result from a process different from the one that determines $W$ when $W_i > c_i$. The probability $\operatorname{Prob}(W_i > c_i \mid a)$ would typically be specified as a probit or logit, but many other specifications are possible.
The tobit model is a special case of (30), in which the probability that $W_i > c_i$ is assumed to be controlled by the same process that determines $W$ when $W_i > c_i$. In other words, the tobit model incorporates the additional assumption that
$$\operatorname{Prob}(W_i > c_i) = 1 - \Phi\!\left(\frac{c_i - z_i\beta}{\sigma_\varepsilon}\right). \qquad (31)$$
Substitution of the expression in (31) for $\operatorname{Prob}(W_i > c_i \mid a)$ in (30) yields the tobit likelihood function,
$$\mathcal{L}_3 = \prod_{i \mid y_i > c_i} \frac{1}{\sigma_\varepsilon}\,\phi\!\left(\frac{y_i - z_i\beta}{\sigma_\varepsilon}\right) \prod_{i \mid y_i = c_i} \Phi\!\left(\frac{c_i - z_i\beta}{\sigma_\varepsilon}\right). \qquad (32)$$
The terms in the second product expression reflect the fact that all that is known about the censored observations is that $W_i$ lies below $c_i$.
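For comparison, (32) can be sketched in log form the same way (single regressor, standard library only; the function name is ours):

```python
import math
from statistics import NormalDist

def tobit_loglik(beta, sigma, y, z, c):
    """Log-likelihood corresponding to (32): uncensored observations
    contribute the normal density; censored ones (y_i = c_i) contribute
    Phi((c_i - z_i*beta)/sigma), i.e. Prob(W_i <= c_i)."""
    std = NormalDist()
    ll = 0.0
    for yi, zi, ci in zip(y, z, c):
        mu = zi * beta
        if yi > ci:
            ll += math.log(std.pdf((yi - mu) / sigma) / sigma)
        else:
            ll += math.log(std.cdf((ci - mu) / sigma))
    return ll
```

Note how, unlike $\mathcal{L}_2$, the censored branch here depends on the same $\beta$ and $\sigma_\varepsilon$ as the uncensored branch, which is precisely the tobit restriction (31).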
Even if the censored model were the correct specification, use of the tobit version of this model would be curious in the context of the two-stage models discussed here. Note that the likelihood $\mathcal{L}_2$ in (30) is multiplicatively separable in $\beta$ and $a$. Both the censored and the uncensored observations identify $a$, while only the uncensored observations identify $\beta$. In fact, the separability means that $\mathcal{L}_2$ can be maximized with respect to each parameter vector independently; maximization with respect to $\beta$ is equivalent to maximization of the likelihood $\mathcal{L}_1$ in (29) for the truncated model. By contrast, when the tobit specification is used, both the censored and the uncensored observations determine $\hat\beta$ when the likelihood $\mathcal{L}_3$ in (32) is maximized. A better approach would be to specify $\operatorname{Prob}(W_i > c_i \mid a)$ as a probit probability, maximize both $\mathcal{L}_2$ and $\mathcal{L}_3$, and perform a likelihood-ratio test of the null hypothesis $\beta = a$. None of the two-stage studies that have used the tobit specification in the second stage that we are aware of have done this. Of course, the censored model is not the correct specification, so perhaps the point is moot.
Both the truncated and the tobit regression models can easily be estimated by maximum likelihood. In either case, the task is made easier when Olsen's (1978) re-parameterization is used. With this re-parameterization, maximization by Newton's method typically converges very quickly, requiring only a few iterations. For the practitioner, commands that perform truncated regression are available in a number of popular software packages.
A.2. Obtaining draws from a left-truncated normal distribution
Both bootstrap algorithms (Algorithms #1 and #2), as well as the simulation algorithm in Section 5 (Algorithm #3), require iid draws from a $N(0, \sigma^2)$ distribution with left truncation at a constant $c$. This can be accomplished quickly and easily using a modified transformation method. Let $\Phi(\cdot)$ and $\Phi^{-1}(\cdot)$ denote the standard normal distribution function and the standard normal quantile function, respectively, so that $u = \Phi^{-1}(\Phi(u))$. Generate $v$ uniform on $(0,1)$, let $c' = c/\sigma$, and set $v' = \Phi(c') + [1 - \Phi(c')]\,v$. Then compute $u = \sigma\,\Phi^{-1}(v')$ to obtain the desired left-truncated normal deviate.
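The recipe translates nearly line-for-line into code; a sketch using only the Python standard library (the function name is ours):

```python
import random
from statistics import NormalDist

def draw_left_truncated_normal(sigma, c, rng=random):
    """One draw from N(0, sigma^2) left-truncated at c, by the modified
    inverse-CDF (transformation) method of Appendix A.2."""
    std = NormalDist()               # Phi via .cdf, Phi^{-1} via .inv_cdf
    c0 = c / sigma                   # truncation point in standard units
    v = rng.random()                 # v ~ Uniform(0, 1)
    v0 = std.cdf(c0) + (1.0 - std.cdf(c0)) * v   # maps (0,1) onto (Phi(c0), 1)
    return sigma * std.inv_cdf(v0)
```

Every draw is at least $c$ by construction: $v' \geq \Phi(c/\sigma)$ implies $\Phi^{-1}(v') \geq c/\sigma$.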
References
Adams, R.M., Berger, A.N., Sickles, R.C., 1999. Semiparametric approaches to stochastic panel frontiers
with applications in the banking industry. Journal of Business and Economic Statistics 17, 349–358.
Aigner, D.J., Lovell, C.A.K., Schmidt, P., 1977. Formulation and estimation of stochastic frontier
production function models. Journal of Econometrics 6, 21–37.
Aly, H.Y., Grabowski, R., Pasurka, C., Rangan, N., 1990. Technical, scale, and allocative efficiencies in
U.S. banking: an empirical investigation. Review of Economics and Statistics 72, 211–218.
Andersen, P., Petersen, N.C., 1993. A procedure for ranking efficient units in Data Envelopment Analysis.
Management Science 39, 1261–1264.
Arnold, V.L., Bardhan, I.R., Cooper, W.W., Kumbhakar, S.C., 1996. New uses of DEA and statistical
regressions for efficiency and estimation: Texas schools. Annals of Operations Research 66, 255–277.
Atkinson, S.E., Primont, D., 2002. Stochastic estimation of firm technology, inefficiency, and productivity
growth using shadow cost and distance functions. Journal of Econometrics 108, 203–225.
Banker, R.D., Johnston, H.H., 1994. Evaluating the impacts of operating strategies on efficiency in the US
airline industry. In: Charnes, A., Cooper, W.W., Levin, A.Y., Seiford, L.M. (Eds.), Data Envelopment
Analysis: Theory Methodology and Application. Kluwer Academic Publishers, Inc., Boston,
pp. 97–128.
Barros, C.P., 2004. Measuring performance in defence-sector companies in a small NATO member-
country. Journal of Economic Studies 31, 112–128.
Battese, G.E., Coelli, T.J., 1995. A model for technical inefficiency effects in a stochastic production
function for panel data. Empirical Economics 20, 325–332.
Berger, A.N., Mester, L.J., 2003. Explaining the dramatic changes in performance of US banks:
technological change deregulation and dynamic changes in competition. Journal of Financial
Intermediation 12, 57–95.
Bickel, P.J., Freedman, D.A., 1981. Some asymptotic theory for the bootstrap. Annals of Statistics 9,
1196–1217.
Binam, J.N., Sylla, K., Diarra, I., Nyambi, G., 2003. Factors affecting technical efficiency among coffee
farmers in Cote d’Ivoire: evidence from the centre west region. R&D Management 15, 66–76.
Burgess, J.F., Wilson, P.W., 1998. Variation in inefficiency among US hospitals. Canadian Journal of
Operational Research and Information Processing (INFOR) 36, 84–102.
Byrnes, P., Fare, R., Grosskopf, S., Lovell, C.A.K., 1988. The effect of unions on productivity: U.S.
surface mining of coal. Management Science 34, 1037–1053.
Carrington, R., Puthucheary, N., Rose, D., Yaisawarng, S., 1997. Performance measurement in
government service provision: the case of police services in New South Wales. Journal of Productivity
Analysis 8, 415–430.
Chakraborty, K., Biswas, B., Lewis, W.C., 2001. Measurement of technical efficiency in public education:
a stochastic and nonstochastic production approach. Southern Economic Journal 67, 889–905.
Chalfant, J.A., Gallant, A.R., 1985. Estimating substitution elasticities with the Fourier cost function.
Journal of Econometrics 28, 205–222.
Charnes, A., Cooper, W.W., Rhodes, E., 1978. Measuring the efficiency of decision making units.
European Journal of Operational Research 2, 429–444.
Charnes, A., Cooper, W.W., Rhodes, E., 1979. Measuring the efficiency of decision making units.
European Journal of Operational Research 3, 339.
Cheng, T.W., Wang, K.L., Weng, C.C., 2000. A study of technical efficiencies of CPA firms in Taiwan.
Review of Pacific Basin Financial Markets and Policies 3, 27–44.
Chilingerian, J.A., 1995. Evaluating physician efficiency in hospitals: a multivariate analysis of best
practices. European Journal of Operational Research 80, 548–574.
Chilingerian, J.A., Sherman, H.D., 2004. Health care applications: from hospitals to physicians from
productive efficiency to quality frontiers. In: Cooper, W.W., Seiford, L.M., Zhu, J. (Eds.), Handbook
on Data Envelopment Analysis. Kluwer Academic Publishers, Boston, pp. 265–298 (Chapter 10).
Chirkos, T.N., Sears, A.M., 1994. Technical efficiency and the competitive behavior of hospitals. Socio-
Economic Planning Science 28, 219–227.
Chu, H.L., Liu, S.Z., Romeis, J.C., Yaung, C.L., 2003. The initial effects of physician compensation
programs in Taiwan hospitals: implications for staff model HMOs. Health Care Management Science
6, 17–26.
Coelli, T. 1996. A guide to FRONTIER version 4.1: a computer program for stochastic frontier
production and cost function estimation. Unpublished Working Paper, Department of Econometrics,
University of New England, Armidale, Australia.
Coelli, T., 2000. On the econometric estimation of the distance function representation of a production
technology. Discussion Paper No. 00/42, Center for Operations Research and Econometrics (CORE),
Universite Catholique de Louvain, Louvain-la-Neuve, Belgium.
Coelli, T., Perelman, S., 2001. Medición de la Eficiencia Técnica en Contextos Multiproducto. In: Álvarez
Pinilla, A. (Ed.), La Medición de la Eficiencia y la Productividad. Editorial Pirámide, Madrid.
Coelli, T., Rao, D.S.P., Battese, G.E., 1998. An Introduction to Efficiency and Productivity Analysis.
Kluwer Academic Publishers, Inc., Boston.
De Borger, B., Kerstens, K., 1996. Cost efficiency of Belgian local governments: a comparative analysis of
FDH DEA and econometric approaches. Regional Science and Urban Economics 26, 145–170.
Dietsch, M., Weill, L., 1999. Les performances des banques de dépôts françaises: une évaluation par la
méthode DEA. In: Badillo, P.Y., Paradi, J.C. (Eds.), La Méthode DEA. Hermès Science Publications,
Paris.
Dusansky, R., Wilson, P.W., 1994. Technical efficiency in the decentralized care of the developmentally
disabled. Review of Economics and Statistics 76, 340–345.
Efron, B., Tibshirani, R.J., 1993. An Introduction to the Bootstrap. Chapman and Hall, London.
Fare, R., 1988. Fundamentals of Production Theory. Springer, Berlin.
Fare, R., Grosskopf, S., Lovell, C.A.K., 1985. The Measurement of Efficiency of Production. Kluwer-
Nijhoff Publishing, Boston.
Farrell, M.J., 1957. The measurement of productive efficiency. Journal of the Royal Statistical Society,
Series A 120, 253–281.
Fried, H.O., Lovell, C.A.K., Vanden Eeckaut, P., 1993. Evaluating the performance of U.S. Credit
Unions. Journal of Banking and Finance 17, 251–265.
Fried, H.O., Lovell, C.A.K., Yaisawarng, S., 1999a. The impact of mergers on credit union service
provision. Journal of Banking and Finance 23, 367–386.
Fried, H.O., Schmidt, S.S., Yaisawarng, S., 1999b. Incorporating the operating environment into a
nonparametric measure of technical efficiency. Journal of Productivity Analysis 12, 249–267.
Fried, H.O., Lovell, C.A.K., Schmidt, S.S., Yaisawarng, S., 2002. Accounting for environmental effects
and statistical noise in data envelopment analysis. Journal of Productivity Analysis 17, 157–174.
Garden, K.A., Ralston, D.E., 1999. The x-efficiency and allocative efficiency effects of credit union
mergers. Journal of International Financial Markets, Institutions and Money 9, 285–301.
Gattoufi, S., Oral, M., Reisman, A., 2004. Data envelopment analysis literature: a bibliography update
(1951–2001). Socio-Economic Planning Sciences 38, 159–229.
Gijbels, I., Mammen, E., Park, B.U., Simar, L., 1999. On estimation of monotone and concave frontier
functions. Journal of the American Statistical Association 94, 220–228.
Gillen, D., Lall, A., 1997. Developing measures of airport productivity and performance: an application of
data envelopment analysis. Transportation Research Part E 33, 261–272.
Gonzalez, B., Barber, P., 1996. Changes in the efficiency of Spanish public hospitals after the introduction
of program-contracts. Investigaciones Economicas 20, 377–402.
Guilkey, D.K., Lovell, C.A.K., Sickles, R.C., 1983. A comparison of the performance of three flexible
functional forms. International Economic Review 24, 591–616.
Hall, P., 1986. On the number of bootstrap simulations required to construct a confidence interval. The
Annals of Statistics 14, 1453–1462.
Hall, P., 1992. The Bootstrap and Edgeworth Expansion. Springer, New York.
Hirschberg, J.G., Lloyd, P.J., 2002. Does the technology of foreign-invested enterprises spill over to other
enterprises in China? An application of post-DEA bootstrap regression analysis. In: Lloyd, P.J., Zang,
X.G. (Eds.), Modelling the Chinese Economy. Edward Elgar Press, London.
Honore, B.E., Powell, J.L., 1994. Pairwise difference estimators of censored and truncated regression
models. Journal of Econometrics 64, 241–278.
Huang, C.J., Liu, J.T., 1994. Estimation of a non-neutral stochastic frontier production function. Journal
of Productivity Analysis 5, 171–180.
Isik, I., Hassan, M.K., 2002. Technical, scale, and allocative efficiencies of Turkish banking industry.
Journal of Banking and Finance 26, 719–766.
Kalirajan, K.P., Shand, R.T., 1985. Types of education and agricultural productivity. Journal of
Development Studies 21, 222–245.
Kirjavainen, T., Loikkanen, H.A., 1998. Efficiency differences of Finnish senior secondary schools: an
application of DEA and tobit analysis. Economics of Education Review 17, 377–394.
Kneip, A., Park, B.U., Simar, L., 1998. A note on the convergence of nonparametric DEA estimators for
production efficiency scores. Econometric Theory 14, 783–793.
Kneip, A., Simar, L., Wilson, P.W., 2003. Asymptotics for DEA estimators in non-parametric frontier
models, Discussion Paper No. 0317, Institut de Statistique, Universite Catholique de Louvain,
Louvain-la-Neuve, Belgium.
Kooreman, P., 1994. Nursing home care in The Netherlands: a nonparametric efficiency analysis. Journal
of Health Economics 13, 301–316.
Korostelev, A., Simar, L., Tsybakov, A.B., 1995. On estimation of monotone and convex boundaries.
Publications des Instituts de Statistique des Universites de Paris 39, 3–18.
Kumbhakar, S.C., Ghosh, S., McGuckin, J.T., 1991. A generalized production frontier approach for
estimating determinants of inefficiency in U.S. dairy farms. Journal of Business and Economic
Statistics 9, 279–286.
Lewbel, A., Linton, O., 2002. Nonparametric censored and truncated regression. Econometrica 70, 765–779.
Lovell, C.A.K., Walters, L.C., Wood, L.L., 1994. Stratified models of education production using
modified DEA and regression analysis. In: Charnes, A., Cooper, W.W., Lewin, A.Y., Seiford, L.M.
(Eds.), Data Envelopment Analysis: Theory, Methodology, and Applications. Kluwer Academic
Publishers, Boston.
Luoma, K., Jarvio, M.-L., Suoniemi, I., Hjerppe, R.T., 1996. Financial incentives and productive
efficiency in Finnish health services. Health Economics 5, 435–445.
Maddala, G.S., 1988. Introduction to Econometrics. Macmillan Publishing Co., Inc, New York.
McCarty, T.A., Yaisawarng, S., 1993. Technical efficiency in New Jersey school districts. In: Fried, H.O.,
Lovell, C.A.K., Schmidt, S.S. (Eds.), The Measurement of Productive Efficiency. Oxford University
Press, New York.
McMillan, M.L., Datta, D., 1998. The relative efficiencies of Canadian universities: a DEA perspective.
Canadian Public Policy—Analyse de Politiques 24, 485–511.
Meeusen, W., van den Broeck, J., 1977. Efficiency estimation from Cobb–Douglas production functions
with composed error. International Economic Review 18, 435–444.
Mukherjee, K., Ray, S.C., Miller, S.M., 2001. Productivity growth in large US commercial banks: the
initial post-deregulation experience. Journal of Banking and Finance 25, 913–939.
Nyman, J.A., Bricker, D.L., 1989. Profit incentives and technical efficiency in the production of nursing
home care. Review of Economics and Statistics 71, 586–594.
O’Donnell, C.J., van der Westhuizen, G., 2002. Regional comparisons of banking performance in South
Africa. South African Journal of Economics 70, 485–518.
Okeahalam, C.C., 2004. Foreign ownership, performance and efficiency in the banking sector in Uganda
and Botswana. Journal for Studies in Economics and Econometrics 28, 89–118.
Olsen, R., 1978. A note on the uniqueness of the maximum likelihood estimator in the tobit model.
Econometrica 46, 1211–1215.
Otsuki, T., Hardle, I.W., Reis, E.J., 2002. The implications of property rights for joint agriculture-timber
productivity in the Brazilian Amazon. Environment and Development Economics 7, 299–323.
Pitt, M.M., Lee, L.F., 1981. The measurement and sources of technical inefficiency in the Indonesian
weaving industry. Journal of Development Economics 9, 43–64.
Puig-Junoy, J., 1998. Technical efficiency in the clinical management of critically ill patients. Health
Economics 7, 263–277.
Raczka, J., 2001. Explaining the performance of heat plants in Poland. Energy Economics 23,
355–370.
Ralston, D., Wright, A., Garden, K., 2001. Can mergers ensure the survival of credit unions in the third
millennium? Journal of Banking and Finance 25, 2277–2304.
Ray, S.C., 1988. Data envelopment analysis nondiscretionary inputs and efficiency: an alternative
interpretation. Socio-Economic Planning Science 22, 167–176.
Ray, S.C., 1991. Resource-use efficiency in public schools: a study of Connecticut data. Management
Science 37, 1620–1628.
Ray, S.C., 2004. Data Envelopment Analysis: Theory and Techniques for Economics and Operations
Research. Cambridge University Press, Cambridge.
Reifschneider, D., Stevenson, R., 1991. Systematic departures from the frontier: a framework for the
analysis of firm inefficiency. International Economic Review 32, 715–723.
Resende, M., 2000. Regulatory regimes and efficiency in US local telephony. Oxford Economic Papers 52,
447–470.
Rhodes, E.L., Southwick Jr., L., 1993. Variations in public and private university efficiency. In: Rhodes,
E.L. (Ed.), Applications of Management Science, vol. 7. JAI Press, Inc., Greenwich, CT.
Rosko, M.D., Chilingerian, J.A., Zinn, J.S., Aaronson, W.E., 1995. The effects of ownership operating
environment, and strategic choices on nursing home efficiency. Medical Care 33, 1001–1021.
Ruggiero, J., 2004. Performance evaluation in education: modeling educational production. In: Cooper,
W.W., Seiford, L.M., Zhu, J. (Eds.), Handbook on Data Envelopment Analysis. Kluwer Academic
Publishers, Boston, pp. 265–298 (Chapter 10).
Sexton, T.R., Sleeper, S., Taggart Jr., R.E., 1994. Improving pupil transportation in North Carolina.
Interfaces 24, 87–103.
Sheather, S.J., Jones, M.C., 1991. A reliable data-based bandwidth selection method for kernel density
estimation. Journal of the Royal Statistical Society B 53, 684–690.
Shephard, R.W., 1970. Theory of Cost and Production Functions. Princeton University Press, Princeton.
Sickles, R.C., Good, D.H., Getachew, L., 2002. Specification of distance functions using semi- and
nonparametric methods with an application to the dynamic performance of eastern and western
European air carriers. Journal of Productivity Analysis 17, 133–155.
Simar, L., Wilson, P.W., 1998. Sensitivity analysis of efficiency scores: how to bootstrap in nonparametric
frontier models. Management Science 44 (11), 49–61.
Simar, L., Wilson, P.W., 1999a. Some problems with the Ferrier/Hirschberg bootstrap idea. Journal of
Productivity Analysis 11, 67–80.
Simar, L., Wilson, P.W., 1999b. Of course we can bootstrap DEA scores! But does it mean anything?
Logic trumps wishful thinking. Journal of Productivity Analysis 11, 93–97.
Simar, L., Wilson, P.W., 2000a. A general methodology for bootstrapping in nonparametric frontier
models. Journal of Applied Statistics 27, 779–802.
Simar, L., Wilson, P.W., 2000b. Statistical inference in nonparametric frontier models: the state of the art.
Journal of Productivity Analysis 13, 49–78.
Simar, L., Wilson, P.W., 2001a. Testing restrictions in nonparametric efficiency models. Communications
in Statistics 30, 159–184.
Simar, L., Wilson, P.W., 2001b. Aplicación del bootstrap para estimadores D.E.A. In: Álvarez Pinilla, A.
(Ed.), La Medición de la Eficiencia y la Productividad. Pirámide, Madrid.
Simar, L., Wilson, P.W., 2004. Performance of the bootstrap for DEA estimators and iterating the
principle. In: Cooper, W.W., Seiford, L.M., Zhu, J. (Eds.), Handbook on Data Envelopment Analysis.
Kluwer Academic Publishers, Boston, pp. 265–298 (Chapter 10).
Stanton, K.R., 2002. Trends in relationship lending and factors affecting relationship lending efficiency.
Journal of Banking and Finance 26, 127–152.
Turner, H., Windle, R., Dresner, M., 2004. North American containerport productivity: 1984–1997.
Transportation Research Part E 40, 339–356.
Wang, K.L., Tseng, Y.T., Weng, C.C., 2003. A study of production efficiencies of integrated securities
firms in Taiwan. Applied Financial Economics 13, 159–167.
Wheelock, D.C., Wilson, P.W., 2000. Why do banks disappear? The determinants of US bank failures and
acquisitions. Review of Economics and Statistics 82, 127–138.
Wheelock, D.C., Wilson, P.W., 2001. New evidence on returns to scale and product mix among US
commercial banks. Journal of Monetary Economics 47, 653–674.
Wilson, P.W., 2003. Testing independence in models of productive efficiency. Journal of Productivity
Analysis 20, 361–390.
Wilson, P.W., Carey, K., 2004. Nonparametric analysis of returns to scale and product mix among U.S.
hospitals. Journal of Applied Econometrics 19, 505–524.
Worthington, A.C., Dollery, B.E., 2000. Productive efficiency and the Australian local government grants
process. Australian Journal of Regional Studies 6, 95–121.
Wu, C.F.J., 1986. Jackknife, bootstrap and other resampling methods in regression analysis. Annals of
Statistics 14, 1261–1295.
Xue, M., Harker, P.T., 1999. Overcoming the inherent dependency of DEA efficiency scores: a bootstrap
approach. Unpublished Working Paper, Wharton Financial Institutions Center, University of
Pennsylvania.