Journal of Econometrics 136 (2007) 31–64
Estimation and inference in two-stage, semi-parametric models of production processes
Léopold Simar (a), Paul W. Wilson (b, *)
(a) Institut de Statistique, Université Catholique de Louvain, Voie du Roman Pays 20, Louvain-la-Neuve, Belgium
(b) Department of Economics, University of Texas, Austin, TX 78712, USA
Available online 9 September 2005
Abstract
Many papers have regressed non-parametric estimates of productive efficiency on
environmental variables in two-stage procedures to account for exogenous factors that might
affect firms’ performance. None of these have described a coherent data-generating process
(DGP). Moreover, conventional approaches to inference employed in these papers are invalid
due to complicated, unknown serial correlation among the estimated efficiencies. We first
describe a sensible DGP for such models. We propose single and double bootstrap procedures;
both permit valid inference, and the double bootstrap procedure improves statistical efficiency
in the second-stage regression. We examine the statistical performance of our estimators using
Monte Carlo experiments.
© 2005 Elsevier B.V. All rights reserved.
JEL classification: C1; C44; C61
Keywords: Data envelopment analysis; DEA; Bootstrap; Technical efficiency; Nonparametric; Two-stage estimation
0304-4076/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.jeconom.2005.07.009
*Corresponding author. Present address: The John E. Walker Department of Economics, 222 Sirrine Hall, Clemson University, Clemson, SC 29634-1309, USA.
E-mail addresses: [email protected] (L. Simar), [email protected] (P.W. Wilson).
1. Introduction
Linear-programming-based measures of efficiency along the lines of Charnes et al. (1978, 1979) and Färe et al. (1985) are widely used in the analysis of efficiency of production. These methods are based on definitions of technical and allocative efficiency in production provided by Farrell (1957). Among this literature, those approaches which incorporate convexity assumptions are known as data envelopment analysis (DEA).
DEA measures efficiency relative to a non-parametric, maximum likelihood estimate of an unobserved true frontier, conditional on observed data resulting from an underlying data-generating process (DGP). These methods have been widely applied to examine technical and allocative efficiency in a variety of industries; see Gattoufi et al. (2004) for a comprehensive bibliography. Many of these studies have used a two-stage approach, where efficiency is estimated in the first stage, and then the estimated efficiencies (or, in a few cases, ratios of estimated efficiencies, Malmquist indices, etc.) are regressed on covariates (typically different from those used in the first stage) that are viewed as representing environmental variables. This approach is advocated by Chilingerian and Sherman (2004), Ray (2004), and Ruggiero (2004); published examples include Byrnes et al. (1988), Ray (1988, 1991), Nyman and Bricker (1989), Aly et al. (1990), McCarty and Yaisawarng (1993), Rhodes and Southwick (1993), Banker and Johnston (1994), Chirkos and Sears (1994), Dusansky and Wilson (1994), Kooreman (1994), Lovell et al. (1994), Sexton et al. (1994), Chilingerian (1995), Rosko et al. (1995), Arnold et al. (1996), De Borger and Kerstens (1996), Gonzalez and Barber (1996), Luoma et al. (1996), Carrington et al. (1997), Gillen and Lall (1997), Burgess and Wilson (1998), Kirjavainen and Loikkanen (1998), McMillan and Datta (1998), Puig-Junoy (1998), Dietsch and Weill (1999), Fried et al. (1999a), Garden and Ralston (1999), Cheng et al. (2000), Resende (2000), Worthington and Dollery (2000), Chakraborty et al. (2001), Mukherjee et al. (2001), Raczka (2001), Ralston et al. (2001), Isik and Hassan (2002), O'Donnell and van der Westhuizen (2002), Otsuki et al. (2002), Stanton (2002), Binam et al. (2003), Chu et al. (2003), Wang et al. (2003), Barros (2004), Okeahalam (2004), and Turner et al. (2004).
In a slight variation on this approach, Fried et al. (1993, 1999b, 2002) regressed radial and non-radial slacks on environmental variables. Since their dependent variables are functions of estimated efficiencies, the problems discussed below apply here as well. In addition, Internet search engines such as google.com reveal hundreds of unpublished working papers that use the two-stage approach.1
As far as we have been able to determine, none of the studies that employ this two-stage approach have described the underlying DGP. Since the DGP has not been described, there is some doubt about what is being estimated in the two-stage approaches. Among the studies that regress DEA estimates of efficiency on some
1 Internet searches on October 12, 2004 on google.com returned about 801 hits for the phrases "data envelopment analysis" and "two-stage"; about 531 hits were obtained using "data envelopment analysis" and "tobit". The vast majority of these appear to be working papers.
covariates in the second stage, most have specified a censored (tobit) model for the second stage, but several (e.g., Aly et al., 1990; Chirkos and Sears, 1994; Dietsch and Weill, 1999; Ray, 1991; Sexton et al., 1994; Stanton, 2002) have estimated a linear model by ordinary least squares (OLS). Authors have argued that DEA efficiency estimates are somehow censored since there are typically numerous estimates equal to one, but no coherent account of how the censoring arises has been offered. Others have used OLS in the second-stage regression after transforming the DEA estimates of efficiency using log, logistic, or log-normal transformations, and in some cases adding or subtracting an arbitrary constant to avoid division by zero or taking the log of zero (e.g., Byrnes et al., 1988; Nyman and Bricker, 1989; Ray, 1991; Puig-Junoy, 1998). Lovell et al. (1994) and Burgess and Wilson (1998) avoided boundary problems in their second-stage regressions by using in their first-stage estimation a leave-one-out estimator of efficiency originally suggested by Andersen and Petersen (1993). Unfortunately, however, it is difficult to give a statistical interpretation to this estimator, even if the second-stage regressions are ignored. Still others have regressed ratios of efficiency estimates, Malmquist indices, or differences in efficiency estimates in the second stage; these have avoided boundary problems, but have still not provided a coherent description of a DGP that would make such regressions sensible.
A more serious problem in all of the two-stage studies that we have found arises from the fact that DEA efficiency estimates are serially correlated.2 Consequently, standard approaches to inference (used in all but two of the studies we have seen that employ the two-stage approach) are invalid. The two exceptions are Xue and Harker (1999) and Hirschberg and Lloyd (2002); both recognize that DEA efficiency estimates are serially correlated, but both use a naive bootstrap method based on resampling from an empirical distribution in their attempts to correct for the serial correlation. Unfortunately, the naive bootstrap is inconsistent in the context of non-parametric efficiency estimation, as demonstrated by Simar and Wilson (1999a, b), and so these approaches make little sense. Moreover, neither of these studies describes a DGP for which their second-stage regressions would be appropriate, and so again it is unclear what is being estimated in these studies.
This paper describes a DGP that is logically consistent with regression of non-parametric DEA efficiency estimates on covariates in a second stage. In addition, we demonstrate that while conventional inference methods are inconsistent in the second-stage regression, consistent inference is both possible and feasible.
2 The correlation arises in finite samples from the fact that perturbations of observations lying on the estimated frontier will in many, and perhaps all, cases cause changes in efficiencies estimated for other observations. A similar problem arises in OLS regression, where estimated residuals are serially correlated in finite samples even when the underlying true residuals are not (see Maddala, 1988, for discussion). However, in the regression case, the correlation disappears more quickly than in the DEA context, where convergence rates are much slower in higher dimensions.
2. A statistical model
Let $x \in \mathbb{R}^p_+$ denote a $(1 \times p)$ vector of inputs, $y \in \mathbb{R}^q_+$ denote a $(1 \times q)$ vector of outputs, and $z \in \mathbb{R}^r$ denote a $(1 \times r)$ vector of environmental variables. Elements of $z$ may be either continuous or discrete; we will elaborate further on this later in the paper. How one might decide whether a particular variable is an environmental variable rather than an input or output of the production process is not entirely clear; our aim is not to provide answers to such questions, but rather to rationalize the two-stage analysis that is performed once this decision has been made.3
Studies that have used the two-stage approach have assumed, either explicitly or implicitly, that firms face certain environmental variables $z$, and that these constrain their choices of inputs $x$ and outputs $y$. In the real world, the analyst is confronted with a set of observations $\mathcal{S}_n = \{(x_i, y_i, z_i)\}_{i=1}^n$.
Assumption A1. The sample observations $(x_i, y_i, z_i)$ in $\mathcal{S}_n$ are realizations of identically, independently distributed random variables with probability density function $f(x, y, z)$ which has support over $\mathcal{P} \times \mathbb{R}^r$, where $\mathcal{P} \subset \mathbb{R}^{p+q}_+$ is a production set defined by
$$\mathcal{P} = \{(x, y) \mid x \text{ can produce } y\}. \quad (1)$$
In any interesting case, $z$ is not independent with respect to $(x, y)$, i.e., $f(x, y \mid z) \neq f(x, y)$; independence between $z$ and $(x, y)$ can be tested using the methods surveyed by Wilson (2003). Otherwise, there would be no motivation for the second-stage regression. Assumption A1 means that the constraints on firms' choices of inputs $x$ and outputs $y$ due to the environmental variables that firms face operate through the dependence of $(x, y)$ on $z$ in $f(x, y, z)$.4
The boundary of $\mathcal{P}$ is sometimes referred to as the technology or the production frontier, and is given by the intersection of $\mathcal{P}$ and the closure of its complement. Firms which are technically inefficient operate at points in the interior of $\mathcal{P}$, while those that are technically efficient operate somewhere along the technology defined by the boundary of $\mathcal{P}$.
3 Note that one can test whether a particular variable is an input, or an output, using the methods described in Simar and Wilson (2001a).
4 Coelli et al. (1998, pp. 166–171) discuss several alternative formulations where the production set is made to depend on the environmental variables $z$ in various ways, in contrast to the formulation in (1) and (2). In each of these alternative formulations, the definition of the production set would involve conditioning on $z$, and an additional set of constraints involving $z$ would be added to the linear program (10) below that is used to estimate $\delta_0$ in (2). While these approaches might be sensible for some situations, they were not used in the studies cited earlier in Section 1; moreover, these alternative approaches leave no role for a second-stage regression, and hence are not the focus of this paper. In our formulation, the environmental variables $z$ influence the mean and variance of the inefficiency process, but not the boundary of its support; this is consistent with the formulations in the studies we have cited in Section 1, where in each case the environmental variables appear only in the second-stage regressions. This is also similar to the idea behind parametric, stochastic frontier models where the mean of the inefficiency term is parameterized in terms of some covariates, as we discuss later in Section 5.
Various measures of technical efficiency are possible. Define a measure $\delta$ for some point $(x_0, y_0) \in \mathbb{R}^{p+q}_+$ such that
$$\delta_0 = \delta(x_0, y_0 \mid \mathcal{P}) \equiv \sup\{\delta \mid (x_0, \delta y_0) \in \mathcal{P},\ \delta > 0\}. \quad (2)$$
This is simply the Farrell (1957) measure of output technical efficiency, which is the reciprocal of the Shephard (1970) output distance function. For $(x_0, y_0) \in \mathcal{P}$, $\delta(x_0, y_0 \mid \mathcal{P}) \geq 1$. Note that $\delta$ provides a measure of Euclidean distance from the point $(x_0, y_0) \in \mathbb{R}^{p+q}_+$ to the boundary of $\mathcal{P}$ in a direction parallel to the output axes and orthogonal to the input axes. In the next assumption, and frequently in the discussion that follows, we will replace the subscripts "0" in (2) with subscript $i$ to signify that the distance measure is evaluated for a particular, specific observation in $\mathcal{S}_n$.
For reasons that will become clear later, it is convenient to represent $y$ in terms of its polar coordinates, while expressing the modulus in terms of distance from the boundary of $\mathcal{P}$. Again for an arbitrary point $(x_0, y_0) \in \mathbb{R}^{p+q}_+$, with $y_0 = [y_{01} \ \ldots \ y_{0q}]$, we can write the angles as
$$\eta_{0j} = \begin{cases} \arctan(y_{0,j+1}/y_{01}) & \text{for } y_{01} > 0, \\ \pi/2 & \text{if } y_{01} = 0, \end{cases} \quad (3)$$
for $j = 1, \ldots, q-1$. The corresponding modulus is given by $\omega(y_0) = \sqrt{y_0' y_0}$, which is related to the Farrell efficiency measure by
$$\delta(x_0, y_0 \mid \mathcal{P}) = \frac{\omega(\delta(x_0, y_0 \mid \mathcal{P})\, y_0)}{\omega(y_0)}. \quad (4)$$
Thus, since $\mathcal{P}$ is fixed, we can characterize $y_0$ by $(\gamma_0, \delta_0)$, where $\gamma_0 = [\eta_{01} \ \ldots \ \eta_{0,q-1}]$ and $\delta_0 = \delta(x_0, y_0 \mid \mathcal{P})$.
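The angle-modulus representation in (3) can be checked numerically. The following is a minimal sketch (ours, not the authors'), assuming strictly positive first output $y_{01} > 0$ so that only the first branch of (3) applies:

```python
import numpy as np

def to_polar(y):
    """Angles eta_j = arctan(y_{j+1}/y_1) per (3), modulus omega = sqrt(y'y)."""
    y = np.asarray(y, dtype=float)
    gamma = np.arctan(y[1:] / y[0])   # assumes y[0] > 0 (first branch of (3))
    omega = np.sqrt(y @ y)
    return gamma, omega

def from_polar(gamma, omega):
    """Invert the map: y is proportional to [1, tan(eta_1), ..., tan(eta_{q-1})]."""
    t = np.concatenate(([1.0], np.tan(gamma)))
    return omega * t / np.sqrt(t @ t)

y0 = np.array([3.0, 4.0])
gamma0, omega0 = to_polar(y0)
assert np.isclose(omega0, 5.0)                  # sqrt(3^2 + 4^2)
assert np.allclose(from_polar(gamma0, omega0), y0)   # round trip recovers y0
```

Because scaling $y_0$ by $\delta$ leaves the angles unchanged and scales the modulus by $\delta$, the pair $(\gamma_0, \delta_0)$ pins down $y_0$ exactly as (4) suggests.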
The joint density $f(x, y, z)$ can now be described by a series of conditional densities in terms of cylindrical coordinates:
$$f(x_i, \gamma_i, \delta_i, z_i) = f(x_i, \gamma_i \mid \delta_i, z_i)\, f(\delta_i \mid z_i)\, f(z_i). \quad (5)$$
The order of the conditioning on the right-hand side of (5) reflects the sequential nature of the DGP. Firm $i$ is faced with environmental variables $z_i$ drawn from $f(z)$. Given this $z_i$, an efficiency level $\delta_i$ is drawn from $f(\delta_i \mid z_i)$, and then $x_i$ and $\gamma_i$ are drawn from $f(x, \gamma \mid \delta, z)$, resulting in a realization $(x_i, y_i, z_i)$ from the joint density $f(x, y, z)$ after transforming the polar coordinates $(\gamma_i, \delta_i)$ to Cartesian coordinates $y_i$.
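The sequential structure of (5) is easy to mimic in simulation. The sketch below is purely illustrative and not from the paper: we pick a hypothetical single-input, single-output frontier $y = \sqrt{x}$, a linear form for the conditional mean of inefficiency, and made-up parameter values, then draw $z_i$ first, $\delta_i \mid z_i$ second, and finally place the firm below its frontier point:

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta0, beta1, sigma = 200, 0.5, 0.5, 0.3   # illustrative values only

z = rng.uniform(0.0, 2.0, n)                  # step 1: environment z_i ~ f(z)
psi = beta0 + beta1 * z                       # hypothetical linear psi(z_i, beta)

# step 2: delta_i | z_i = psi_i + eps_i, eps_i ~ N(0, sigma^2) left-truncated
# at 1 - psi_i; drawn here by simple rejection so that delta_i >= 1 always
delta = np.empty(n)
for i in range(n):
    d = psi[i] + sigma * rng.standard_normal()
    while d < 1.0:
        d = psi[i] + sigma * rng.standard_normal()
    delta[i] = d

# step 3: draw inputs, then recover outputs; with the Farrell output measure,
# a firm with efficiency delta produces 1/delta of its frontier output
x = rng.uniform(1.0, 10.0, n)
y = np.sqrt(x) / delta                        # frontier y = sqrt(x) is assumed

assert delta.min() >= 1.0                     # truncation respected
assert np.all(y <= np.sqrt(x) + 1e-12)        # all firms weakly below frontier
```

The ordering matters: $z_i$ is drawn before $\delta_i$, and $\delta_i$ before the input-output mix, exactly as the factorization in (5) requires.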
Assumption A2. The conditioning in $f(\delta_i \mid z_i)$ in (5) operates through the following mechanism:
$$\delta_i = \psi(z_i, \beta) + \varepsilon_i \geq 1, \quad (6)$$
where $\psi$ is a smooth, continuous function, $\beta$ is a vector of (possibly infinitely many) parameters, and $\varepsilon_i$ is a continuous iid random variable, independent of $z_i$.
Note that Assumptions A1 and A2 amount to a separability condition; the production set $\mathcal{P}$ is assumed to be a subset of the entire sample space, while the effect of the covariates $z$ operates through the dependence between $y$ and $z$ induced by (6).
These assumptions provide a rationale for second-stage regressions. In an applied setting, one might reasonably wonder whether the implied separability condition is supported by the data; if it is not, one might prefer to use one of the alternative formulations discussed by Coelli et al. (1998) (see footnote 4), where the covariates $z$ are included in the first-stage estimation of efficiency. In the alternative formulations, the covariates $z$ take the role of inputs or outputs, either discretionary or non-discretionary. Fortunately, the testing methods described by Simar and Wilson (2001a) can be used to test the separability assumption here.
Assumption A3. $\varepsilon_i$ in (6) is distributed $N(0, \sigma_\varepsilon^2)$ with left-truncation at $1 - \psi(z_i, \beta)$ for each $i$.
Assumption A3 could be changed to impose some other distribution on the $\varepsilon_i$, but normality seems a natural choice. Alternatively, one could leave the distribution of $\varepsilon$ unspecified and employ the semi-parametric method for truncated regression proposed by Honoré and Powell (1994); one could in addition leave the function $\psi(\cdot)$ unspecified and employ the non-parametric method for truncated regression proposed by Lewbel and Linton (2002). Here, however, we impose a distributional form for $\varepsilon$ and later will assume a form for $\psi(\cdot)$, since that is what has been done in every two-stage applied study that we are aware of.
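Draws of $\varepsilon_i$ satisfying Assumption A3 are straightforward to generate. One way, sketched below with illustrative values for $\psi(z_i, \beta)$ and $\sigma_\varepsilon$ (the values are ours, not from the paper), uses `scipy.stats.truncnorm`, which parameterizes the truncation bounds in standardized units:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)
psi, sigma = 1.2, 0.4                  # illustrative psi(z_i, beta) and sigma_eps

# eps ~ N(0, sigma^2) left-truncated at 1 - psi; truncnorm takes standardized
# bounds a = (lower - loc)/scale, b = (upper - loc)/scale
a = (1.0 - psi) / sigma
eps = truncnorm.rvs(a, np.inf, loc=0.0, scale=sigma, size=1000, random_state=rng)

assert eps.min() >= 1.0 - psi          # truncation bound respected
assert (psi + eps).min() >= 1.0        # hence delta = psi + eps >= 1, as in (6)
```

The same construction appears in steps [3.1] of the bootstrap algorithms in Section 4.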
The production set $\mathcal{P}$ is sometimes described in terms of its sections
$$\mathcal{Y}(x) \equiv \{y \mid (x, y) \in \mathcal{P}\} \quad (7)$$
and
$$\mathcal{X}(y) \equiv \{x \mid (x, y) \in \mathcal{P}\}, \quad (8)$$
which form the output feasibility and input requirement sets, respectively. Knowledge of either $\mathcal{Y}(x)$ for all $x$ or $\mathcal{X}(y)$ for all $y$ is equivalent to knowledge of $\mathcal{P}$; $\mathcal{P}$ implies (and is implied by) both $\mathcal{Y}(x)$ and $\mathcal{X}(y)$. Thus, both $\mathcal{Y}(x)$ and $\mathcal{X}(y)$ inherit the properties of $\mathcal{P}$.
Various assumptions regarding $\mathcal{P}$ are possible; we adopt those of Shephard (1970) and Färe (1988):
Assumption A4. $\mathcal{P}$ is closed and convex; $\mathcal{Y}(x)$ is closed, convex, and bounded for all $x \in \mathbb{R}^p_+$; and $\mathcal{X}(y)$ is closed and convex for all $y \in \mathbb{R}^q_+$.
Assumption A5. $(x, y) \notin \mathcal{P}$ if $x = 0$, $y \geq 0$, $y \neq 0$; i.e., all production requires use of some inputs.
Assumption A6. For $\tilde{x} \geq x$, $\tilde{y} \leq y$, if $(x, y) \in \mathcal{P}$ then $(\tilde{x}, y) \in \mathcal{P}$ and $(x, \tilde{y}) \in \mathcal{P}$; i.e., both inputs and outputs are strongly disposable.
Here and throughout, inequalities involving vectors are defined on an element-by-element basis; e.g., for $\tilde{x}, x \in \mathbb{R}^p_+$, $\tilde{x} \geq x$ means that some number $\ell \in \{0, 1, \ldots, p\}$ of the corresponding elements of $\tilde{x}$ and $x$ are equal, while $(p - \ell)$ of the elements of $\tilde{x}$ are greater than the corresponding elements of $x$. Assumption A5 merely says that there are no free lunches. Assumption A6 is sometimes called free disposability and is equivalent to an assumption of monotonicity of the technology.
In order for our estimators of $\mathcal{P}$ and $\delta(x_0, y_0 \mid \mathcal{P})$ to be consistent, some additional assumptions are needed. In particular, the probability of observing firms in a neighborhood of the boundary of $\mathcal{P}$ must approach unity as the sample size increases:
Assumption A7. For all $(x, y) \in \mathcal{P}$ such that $(\theta^{-1} x, y) \notin \mathcal{P}$ and $(x, \theta y) \notin \mathcal{P}$ for $\theta > 1$, $f(x, y \mid z)$ is strictly positive, and $f(x, y \mid z)$ is continuous in any direction toward the interior of $\mathcal{P}$ for all $z$.
Also, an assumption about the smoothness of the frontier is needed:
Assumption A8. For all $(x, y)$ in the interior of $\mathcal{P}$, $\delta(x, y \mid \mathcal{P})$ is differentiable in both its arguments.
Our characterization of the smoothness condition here is stronger than required; Kneip et al. (1998) require only Lipschitz continuity for the distance functions, which is implied by the simpler, but stronger, requirement presented here.
Assumptions A4–A6 are standard in the microeconomic theory of the firm; Assumptions A7 and A8 are based on those of Kneip et al. (1998), with some extensions to accommodate the environmental variables in $z$. These assumptions are sufficient to ensure statistical consistency of the DEA estimators in the first-stage problem that appears below. Assumptions A1–A3 have been introduced specifically to accommodate the environmental variables. Together, Assumptions A1–A8 define a semi-parametric DGP $\mathcal{F}$ which yields the data in $\mathcal{S}_n$. The problem is to estimate $\{\delta_i\}_{i=1}^n$ and $\beta$, and then to make inferences about these unknown quantities. For the case of the $\delta_i$, we have provided two approaches to inference, in Simar and Wilson (1998, 2000a). Our focus here is on estimation and inference about $\beta$, which describes the marginal effects of $z$ on inefficiency.
3. Typical two-stage approaches
The convex hull of the free disposal hull of the observed pairs $(x_i, y_i)$ contained in $\mathcal{S}_n$ has frequently been used to estimate the production set $\mathcal{P}$. This estimator is described by
$$\hat{\mathcal{P}} = \{(x, y) \mid y \leq Yq,\ x \geq Xq,\ i'q = 1,\ q \in \mathbb{R}^n_+\}, \quad (9)$$
where $Y = [y_1 \ \ldots \ y_n]$, $X = [x_1 \ \ldots \ x_n]$, $i$ denotes an $(n \times 1)$ vector of ones, and $q$ is an $(n \times 1)$ vector of intensity variables. Korostelev et al. (1995) proved that $\hat{\mathcal{P}}$ is a consistent estimator of $\mathcal{P}$ under conditions met by Assumptions A1–A8 above. Estimators of the Farrell efficiency measure can be constructed by replacing $\mathcal{P}$ on the right-hand side of (4) with $\hat{\mathcal{P}}$.
The estimator of $\delta_0 = \delta(x_0, y_0 \mid \mathcal{P})$ defined in (2) at a particular point $(x_0, y_0) \in \mathbb{R}^{p+q}_+$ can be written in terms of the linear program
$$\hat{\delta}_0 = \delta(x_0, y_0 \mid \hat{\mathcal{P}}) = \max\{\theta > 0 \mid \theta y_0 \leq Yq,\ x_0 \geq Xq,\ i'q = 1,\ q \in \mathbb{R}^n_+\}, \quad (10)$$
where the maximization provides a solution for $\theta$ as well as $q$. This is merely the empirical analog of the measure defined in (2).
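Program (10) is an ordinary linear program and can be handed to any LP solver. The following is a minimal sketch (ours, not the authors' code) using `scipy.optimize.linprog` on a tiny made-up data set of two firms with one input and one output; all names are ours:

```python
import numpy as np
from scipy.optimize import linprog

def dea_output(x0, y0, X, Y):
    """Solve (10): max theta s.t. theta*y0 <= Y'q, X'q <= x0, 1'q = 1, q >= 0.
    X is (n, p) inputs, Y is (n, q) outputs; decision vector is [theta, q]."""
    n = X.shape[0]
    c = np.concatenate(([-1.0], np.zeros(n)))      # linprog minimizes: max theta
    A_out = np.hstack((y0.reshape(-1, 1), -Y.T))   # theta*y0_j - (Yq)_j <= 0
    A_in = np.hstack((np.zeros((X.shape[1], 1)), X.T))   # (Xq)_k <= x0_k
    A_ub = np.vstack((A_out, A_in))
    b_ub = np.concatenate((np.zeros(len(y0)), x0))
    A_eq = np.concatenate(([0.0], np.ones(n))).reshape(1, -1)  # convexity: 1'q = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (n + 1))
    return res.x[0]                                # estimated delta

X = np.array([[1.0], [2.0]])                       # two firms, one input
Y = np.array([[1.0], [3.0]])                       # one output
delta = dea_output(np.array([2.0]), np.array([2.0]), X, Y)  # firm at (x, y) = (2, 2)
```

For this interior point the program returns $\hat{\delta} = 1.5$: output can be scaled up by 50% while remaining feasible under the estimated frontier (the optimum puts all intensity weight on the second firm).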
It is straightforward to prove that $\hat{\delta}_0$ is a consistent estimator of $\delta_0$ under Assumptions A1–A8 by altering the notation in Kneip et al. (1998). In particular,
$$\hat{\delta}_0 = \delta_0 + O_p(n^{-2/(p+q+1)}). \quad (11)$$
The rate of convergence is low, as is typical in non-parametric estimation, and the rate slows as $p + q$ is increased; this is the well-known curse of dimensionality. Moreover, by construction, $\hat{\delta}_0$ is biased downward.
Few results exist on the sampling distributions of non-parametric efficiency estimators such as the one in (10). Gijbels et al. (1999) derived the asymptotic distribution of the Shephard (1970) output distance function in the special case of one input and one output ($p = q = 1$), along with an analytic expression for its large-sample bias and variance, and it is similarly straightforward to extend these results to the input-oriented case by appropriate changes in notation. Unfortunately, in the more general multivariate setting where $p + q > 2$, the radial nature of the distance functions and the complexity of the estimated frontier complicate the derivations. So far, the bootstrap appears to offer the only way to approximate the asymptotic distribution of the distance function estimators in multivariate settings. For the second-stage regression problem considered here, the bootstrap also appears to be useful.
In principle, one could assume $\beta$ in (6) has finite dimensions, fully specify the density $f(x, y, z)$, and then estimate $\beta$ by maximum likelihood. More often, however, researchers have employed some variant of the approach outlined below. The two-stage studies that have appeared in the literature typically specify $\psi(z_i, \beta) = z_i \beta$ so that (6) can be written as
$$\delta_i = z_i \beta + \varepsilon_i \geq 1, \quad (12)$$
where $\hat{\delta}_i = \delta(x_i, y_i \mid \hat{\mathcal{P}})$ in the notation of (10). These studies then (i) use the observed pairs $(x_i, y_i)$ in $\mathcal{S}_n$ to estimate $\delta_i$ for all $i = 1, \ldots, n$, yielding a set of estimates $\{\hat{\delta}_i\}_{i=1}^n$; (ii) replace the unobserved $\delta_i$ on the left-hand side of (12) with the estimates $\hat{\delta}_i$ obtained from step (i); and then (iii) estimate
$$\hat{\delta}_i = z_i \beta + \xi_i \geq 1 \quad (13)$$
using censored (tobit) regression or, in a few cases, OLS.5
Regardless of the form chosen for $\psi$, the two-stage approach outlined above presents problems for inference. First, the dependent variable in (6) and (12) is unobserved, and must be replaced by an estimate in the actual regression that is
5 Some (e.g., Puig-Junoy et al., 1998) have transformed the dependent variable, while introducing an arbitrary constant to avoid taking the log of zero. Those who have used a tobit specification in the second stage have justified their approach by the fact that typically several, perhaps many, efficiency estimates equal unity in a given application. As far as we are aware, all who have specified a censored regression model in the second stage have constrained the processes that determine the probability of censoring and that govern the uncensored observations to be the same. See the Appendix for a discussion of censored and truncated regression models, their differences, and a discussion of the last point.
estimated. The $\hat{\delta}_i$'s that are used in the second-step estimation of (13) are serially correlated, and in a complicated, unknown way. To understand this, note that $\hat{\delta}_i = \delta(x_i, y_i \mid \hat{\mathcal{P}})$ depends on all the observations $(x_i, y_i)$ in $\mathcal{S}_n$ through $\hat{\mathcal{P}}$, and consequently so must the error term $\xi_i$ in (13). Moreover, while the observations in $\mathcal{S}_n$ are assumed independently drawn in Assumption A1, $x_i$ and $y_i$ are correlated with $z_i$ due to Assumption A2; otherwise, there would be no motivation for the second-stage regression. This in turn means the error term $\xi_i$ in (13) is correlated with $z_i$.
Both the correlation among the $\xi_i$'s as well as the correlation between $\xi_i$ and $z_i$ disappear asymptotically, but only at the same slow rate given in (11) with which $\hat{\delta}_i$ converges. This means that maximum likelihood estimates of $\beta$ in the second-stage regression will be consistent, but will not have the usual, parametric convergence rate of $n^{-1/2}$. More troubling, however, for $p + q > 3$, the correlation among the $\xi_i$'s does not disappear quickly enough for standard approaches to inference (based on the inverse of the negative Hessian of the log-likelihood) to be valid.
Further consideration reveals an additional problem. Note that we can always write
$$\hat{\delta}_i = E(\hat{\delta}_i) + u_i, \quad (14)$$
where $E(u_i) = 0$. In addition, the bias of the estimator $\hat{\delta}_i$ is defined by
$$\mathrm{BIAS}(\hat{\delta}_i) \equiv E(\hat{\delta}_i) - \delta_i. \quad (15)$$
Substituting for $E(\hat{\delta}_i)$ from (14) in (15) and re-arranging terms yields
$$\delta_i = \hat{\delta}_i - \mathrm{BIAS}(\hat{\delta}_i) - u_i. \quad (16)$$
Substituting for $\delta_i$ in (12) gives
$$\hat{\delta}_i - \mathrm{BIAS}(\hat{\delta}_i) - u_i = z_i \beta + \varepsilon_i \geq 1. \quad (17)$$
Since $\hat{\delta}_i$ is a consistent estimator, the $u_i$ become negligible asymptotically, as does $\mathrm{BIAS}(\hat{\delta}_i)$. These facts provide justification for writing (13), the equation that is typically estimated in two-stage applications.
Although the $u_i$ in (14) and (16), (17) have zero mean, the term $\mathrm{BIAS}(\hat{\delta}_i)$ does not. Rather, the bias of $\hat{\delta}_i$ is always strictly negative in finite samples. The $u_i$ are unknown and cannot be estimated, but the bias term can be estimated by bootstrap methods; see Efron and Tibshirani (1993) for discussion, and Simar and Wilson (2000a) for an example similar to the present context. The bootstrap bias estimate equals the true bias plus a residual:
$$\widehat{\mathrm{BIAS}}(\hat{\delta}_i) = \mathrm{BIAS}(\hat{\delta}_i) + v_i. \quad (18)$$
The variance of the residual $v_i$ diminishes as $n \to \infty$, and hence $v_i$ is typically of smaller magnitude than $\mathrm{BIAS}(\hat{\delta}_i)$ for reasonable sample sizes $n$. The bootstrap estimator of bias can in turn be used to construct a bias-corrected estimator of $\delta$:
$$\hat{\hat{\delta}}_i = \hat{\delta}_i - \widehat{\mathrm{BIAS}}(\hat{\delta}_i). \quad (19)$$
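Given bootstrap replicates $\{\hat{\delta}^*_{ib}\}_{b=1}^L$ of an efficiency estimate, the bias estimate in (18) and the correction in (19) are one line of arithmetic each. A sketch for a single firm, with made-up numbers standing in for the replicates a real bootstrap would produce:

```python
import numpy as np

delta_hat = 1.30                                   # original estimate (made up)
boot = np.array([1.22, 1.25, 1.24, 1.27, 1.23])    # bootstrap replicates (made up)

bias_hat = boot.mean() - delta_hat    # (18): bootstrap estimate of BIAS(delta_hat)
delta_bc = delta_hat - bias_hat       # (19): bias-corrected estimator
# equivalently: delta_bc = 2 * delta_hat - boot.mean()
```

Since $\hat{\delta}_i$ is biased downward, the estimated bias is negative here and the correction pushes the estimate up, away from the estimated frontier.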
Substituting for $\widehat{\mathrm{BIAS}}(\hat{\delta}_i)$ in (19) from (18), re-arranging terms, and then substituting for $\mathrm{BIAS}(\hat{\delta}_i)$ in (17) yields
$$\hat{\hat{\delta}}_i + v_i - u_i = z_i \beta + \varepsilon_i \geq 1. \quad (20)$$
As noted, both the terms $v_i$ and $u_i$ become negligible asymptotically; hence maximum likelihood estimation on
$$\hat{\hat{\delta}}_i \approx z_i \beta + \varepsilon_i \geq 1 \quad (21)$$
will yield consistent estimates.
Comparing (13) and (17), it is clear that estimation of (13) ignores both $\mathrm{BIAS}(\hat{\delta}_i)$ and $u_i$. Estimation of (21) ignores $v_i$ and $u_i$. This aspect of the estimation problem resembles the story of measurement error in the dependent variable that is told in every undergraduate econometrics textbook. If $v_i$ is indeed of smaller magnitude than $\mathrm{BIAS}(\hat{\delta}_i)$, we would expect estimates of $\beta$ from (21) to be more statistically efficient than those from (13). We examine this question in the Monte Carlo experiments that follow.
In every two-stage application that we have found, the bias term in (17) has been ignored. In the special case where $p = q = 1$, Gijbels et al. (1999) demonstrate that $\mathrm{BIAS}(\hat{\delta}_i)$ is affected by the curvature of the boundary of $\mathcal{P}$. Although results do not exist for more general cases, one can speculate that a similar phenomenon exists in higher-dimensional spaces. In addition, it seems clear that $\mathrm{BIAS}(\hat{\delta}_i)$ will be larger in regions of $\mathcal{P}$ where data are sparse relative to other regions of $\mathcal{P}$ where the data are more dense. In regions where the data are sparse, there is less information about where the boundary of $\mathcal{P}$ lies, and hence the bias in estimated efficiency is likely to be larger.6 All of this suggests that $\mathrm{BIAS}(\hat{\delta}_i)$, which is incorporated in the error term of (13) when it is ignored, is correlated with $x_i$ and $y_i$, and hence with $z_i$. While this correlation, as well as the bias itself, disappears asymptotically, it is reasonable to suppose that including an estimate of bias in the second-stage regression might improve the efficiency of estimation of $\beta$ in finite samples, and perhaps also improve the coverage of estimated confidence intervals for $\beta$.
As a final remark, we note that almost all researchers have estimated (13) by assuming a censored normal (tobit) specification for $\xi_i$. The tobit specification is sometimes motivated by the observation that several values in $\{\hat{\delta}_i\}_{i=1}^n$ are equal to unity, suggesting a probability mass at 1. However, it is important to recall that the underlying true model in (12) (or (6)) does not have this property. The process that determines whether $\hat{\delta}_i = 1$ is primarily an artifact of finite samples, and has nothing to do with the process described by (6) in Assumption A2.
6 It is common for data to be rather unevenly distributed over $\mathcal{P}$ in applications; examples include data for banks (Wheelock and Wilson, 2000, 2001) and hospitals (Wilson and Carey, 2004).
4. Toward better estimation and inference
The problems associated with estimating (13) and making inference about $\beta$ arise from the serial correlation and bias of $\hat{\delta}_i$, and the correlation between $\xi_i$ and $z_i$. The structures of these phenomena are unknown, and difficult to guess. In addition, (13) differs from the true model in (12), where the dependent variable is unobserved. We propose two bootstrap procedures to overcome these difficulties; we describe the procedures in this section, and present Monte Carlo evidence in the next section.
In order to implement a bootstrap procedure, we must draw iid bootstrap samples (i.e., pseudo-data) $(x_i^*, y_i^*, z_i^*)$ from a density $\hat{f}(x, y, z)$. By now it is well known that the ordinary, naive bootstrap based on resampling from the empirical distribution of the data is inconsistent in the present context due to the bounded nature of the DGP; see Simar and Wilson (1999a, b, 2000b) and Kneip et al. (2003) for discussion. Kneip et al. (2003) derive the asymptotic distribution of the DEA efficiency estimator, and prove the consistency of two different bootstrap procedures for making inferences about efficiencies of individual firms; one of these procedures relies on sub-sampling, while the other requires smoothing both the density of inputs and outputs as well as the DEA estimate of the frontier. Simar and Wilson (1998, 2000a) described smoothed bootstrap procedures that approximate the second approach of Kneip et al. (2003), but neither Simar and Wilson (1998, 2000a) nor Kneip et al. (2003) considered environmental variables.
Note that Assumption A2 provides an extra piece of information that was not available in the models considered by Simar and Wilson (1998, 2000a) and Kneip et al. (2003). The bootstrap procedures proposed by Simar and Wilson (2000a) and Kneip et al. (2003) allow for heterogeneity in the distribution of $\delta$, but incorporate no assumptions on the form of the heterogeneity; here, however, the form is made explicit by Assumption A2. Fortunately, the information provided by Assumption A2 allows considerable simplification in the bootstrap procedures we propose below.
We propose two bootstrap procedures for the two-stage efficiency estimation problem. The first procedure is designed to improve inference, but without taking account of the bias term in (17):
Algorithm #1.
[1] Using the original data in $\mathcal{S}_n$, compute $\hat{\delta}_i = \delta(x_i, y_i \mid \hat{\mathcal{P}})$ for all $i = 1, \ldots, n$ using (10).
[2] Use the method of maximum likelihood to obtain an estimate $\hat{\beta}$ of $\beta$ as well as an estimate $\hat{\sigma}_\varepsilon$ of $\sigma_\varepsilon$ in the truncated regression of $\hat{\delta}_i$ on $z_i$ in (13), using the $m < n$ observations where $\hat{\delta}_i > 1$.
[3] Loop over the next three steps ([3.1]–[3.3]) $L$ times to obtain a set of bootstrap estimates $\mathcal{A} = \{(\hat{\beta}^*, \hat{\sigma}_\varepsilon^*)_b\}_{b=1}^L$:
[3.1] For each $i = 1, \ldots, m$, draw $\varepsilon_i$ from the $N(0, \hat{\sigma}_\varepsilon^2)$ distribution with left-truncation at $(1 - z_i \hat{\beta})$.7
7See the Appendix for details on how to draw from a left-truncated normal distribution.
[3.2] Again for each $i = 1, \ldots, m$, compute $\delta_i^* = z_i \hat{\beta} + \varepsilon_i$.
[3.3] Use the maximum likelihood method to estimate the truncated regression of $\delta_i^*$ on $z_i$, yielding estimates $(\hat{\beta}^*, \hat{\sigma}_\varepsilon^*)$.
[4] Use the bootstrap values in $\mathcal{A}$ and the original estimates $\hat{\beta}, \hat{\sigma}_\varepsilon$ to construct estimated confidence intervals for each element of $\beta$ and for $\sigma_\varepsilon$ as described below.
As discussed earlier, results from Kneip et al. (1998) establish consistency of the estimation in step [1] under our assumptions. Given the discussion in Section 3 and the fact that $\hat{\delta}_i$ is a consistent estimator of $\delta_i$, it follows that maximum likelihood estimation in step [2] will yield consistent estimates of $\beta$, though without the customary $\sqrt{n}$-convergence rate. Step [3] is simply a parametric bootstrap of a (nonlinear) regression model; properties of the bootstrap in the context of regression models have been examined by Bickel and Freedman (1981), Wu (1986), and others.
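To make the mechanics of steps [2] and [3] concrete, the truncated-regression MLE and the parametric bootstrap can be sketched in Python. This is a minimal illustration, not the authors' code: the function names (`trunc_reg_mle`, `algorithm1`) are ours, SciPy's `truncnorm` sampler stands in for the appendix's sampling details, and `d_hat` is assumed to already hold the first-stage DEA estimates from step [1].

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, truncnorm

rng = np.random.default_rng(0)

def trunc_reg_mle(d, Z):
    """ML estimation of a normal regression left-truncated at 1:
    d_i = z_i'beta + eps_i, with d_i > 1 (illustrative sketch)."""
    def negloglik(theta):
        beta, sigma = theta[:-1], np.exp(theta[-1])  # log-sigma keeps sigma > 0
        mu = Z @ beta
        # log f(d_i) = log phi((d_i - mu)/sigma) - log sigma - log Phi((mu - 1)/sigma)
        return -np.sum(norm.logpdf((d - mu) / sigma) - np.log(sigma)
                       - norm.logcdf((mu - 1.0) / sigma))
    beta0 = np.linalg.lstsq(Z, d, rcond=None)[0]       # crude starting values
    res = minimize(negloglik, np.r_[beta0, 0.0], method="BFGS")
    return res.x[:-1], np.exp(res.x[-1])

def algorithm1(d_hat, Z, L=2000):
    """Steps [2]-[3] of Algorithm #1; d_hat holds first-stage DEA estimates."""
    keep = d_hat > 1                       # the m < n observations with d_hat > 1
    beta, sigma = trunc_reg_mle(d_hat[keep], Z[keep])  # step [2]
    mu = Z[keep] @ beta
    boot = []
    for _ in range(L):                     # step [3]
        # [3.1]: eps_i ~ N(0, sigma^2) left-truncated at (1 - z_i beta)
        a = (1.0 - mu) / sigma             # standardized truncation point
        eps = truncnorm.rvs(a, np.inf, scale=sigma, random_state=rng)
        d_star = mu + eps                  # [3.2]: d*_i > 1 by construction
        boot.append(trunc_reg_mle(d_star, Z[keep]))    # [3.3]
    return beta, sigma, boot
```

Step [4], the construction of confidence intervals from the bootstrap set $\mathcal{A}$, follows the percentile construction described later in the text.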
Alternatively, $\hat{\hat{\delta}}_i$ can be regressed on $z_i$ in (21), and the following bootstrap procedure can be used to provide inference about $\beta$.
Algorithm #2.

[1] Using the original data in $\mathcal{S}_n$, compute $\hat{\delta}_i = \hat{\delta}(x_i, y_i \mid \hat{\mathcal{P}})$ for all $i = 1, \ldots, n$ using (10).
[2] Use the method of maximum likelihood to obtain an estimate $\hat{\beta}$ of $\beta$ as well as an estimate $\hat{\sigma}_\varepsilon$ of $\sigma_\varepsilon$ in the truncated regression of $\hat{\delta}_i$ on $z_i$ in (13), using the $m < n$ observations where $\hat{\delta}_i > 1$.
[3] Loop over the next four steps ([3.1]–[3.4]) $L_1$ times to obtain $n$ sets of bootstrap estimates $\mathcal{B}_i = \{\hat{\delta}^*_{ib}\}_{b=1}^{L_1}$:
    [3.1] For each $i = 1, \ldots, n$, draw $\varepsilon_i$ from the $N(0, \hat{\sigma}_\varepsilon^2)$ distribution with left-truncation at $(1 - z_i \hat{\beta})$.
    [3.2] Again for each $i = 1, \ldots, n$, compute $\delta_i^* = z_i \hat{\beta} + \varepsilon_i$.
    [3.3] Set $x_i^* = x_i$, $y_i^* = y_i \hat{\delta}_i / \delta_i^*$ for all $i = 1, \ldots, n$.
    [3.4] Compute $\hat{\delta}_i^* = \hat{\delta}(x_i, y_i \mid \hat{\mathcal{P}}^*)$ for all $i = 1, \ldots, n$, where $\hat{\mathcal{P}}^*$ is obtained by replacing $Y, X$ in (9) with $Y^* = [y_1^* \cdots y_n^*]$, $X^* = [x_1^* \cdots x_n^*]$.
[4] For each $i = 1, \ldots, n$, compute the bias-corrected estimator $\hat{\hat{\delta}}_i$ defined by (19) using the bootstrap estimates in $\mathcal{B}_i$ obtained in step [3.4] and the original estimate $\hat{\delta}_i$.
[5] Use the method of maximum likelihood to estimate the truncated regression of $\hat{\hat{\delta}}_i$ on $z_i$, yielding estimates $(\hat{\hat{\beta}}, \hat{\hat{\sigma}}_\varepsilon)$.
[6] Loop over the next three steps ([6.1]–[6.3]) $L_2$ times to obtain a set of bootstrap estimates $\mathcal{C} = \{(\hat{\hat{\beta}}^*, \hat{\hat{\sigma}}_\varepsilon^*)_b\}_{b=1}^{L_2}$:
    [6.1] For each $i = 1, \ldots, n$, draw $\varepsilon_i$ from the $N(0, \hat{\hat{\sigma}}_\varepsilon^2)$ distribution with left-truncation at $(1 - z_i \hat{\hat{\beta}})$.
    [6.2] Again for each $i = 1, \ldots, n$, compute $\delta_i^{**} = z_i \hat{\hat{\beta}} + \varepsilon_i$.
    [6.3] Use the maximum likelihood method to estimate the truncated regression of $\delta_i^{**}$ on $z_i$, yielding estimates $(\hat{\hat{\beta}}^*, \hat{\hat{\sigma}}_\varepsilon^*)$.
[7] Use the bootstrap values in $\mathcal{C}$ and the original estimates $\hat{\hat{\beta}}, \hat{\hat{\sigma}}_\varepsilon$ to construct estimated confidence intervals for each element of $\beta$ and for $\sigma_\varepsilon$ as described below.
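Step [4] depends on the bias correction defined in (19), which is not reproduced in this excerpt; the sketch below assumes it is the usual bootstrap bias correction, $\hat{\hat{\delta}}_i = \hat{\delta}_i - \widehat{\mathrm{bias}}_i = 2\hat{\delta}_i - L_1^{-1}\sum_{b} \hat{\delta}^*_{ib}$ (function name ours):

```python
import numpy as np

def bias_corrected(d_hat, d_boot):
    """Step [4]: bias-corrected estimator, assuming (19) is the usual
    bootstrap bias correction 2*d_hat_i - mean_b(d*_ib)."""
    d_boot = np.asarray(d_boot)            # shape (L1, n): bootstrap DEA estimates
    bias = d_boot.mean(axis=0) - d_hat     # estimated bias of d_hat_i
    return d_hat - bias                    # equals 2*d_hat - bootstrap mean
```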
Note that steps [1] and [2] in Algorithm #2 are the same as in Algorithm #1. Steps [3] and [4] in Algorithm #2 employ a parametric bootstrap in the first-stage problem in order to produce bias-corrected estimates $\hat{\hat{\delta}}_i$. The parametric structure provided by Assumption A2, when we assume in addition that $\psi(z_i, \beta) = z_i \beta$, greatly simplifies the smoothing that was employed in Simar and Wilson (2000a) and Kneip et al. (2003); otherwise, the bootstrap used to obtain $\hat{\hat{\delta}}_i$ is similar to the one described in Simar and Wilson (2000a), and approximates the double-smooth procedure used in Kneip et al. (2003). Steps [5] and [6] are essentially the same as in Algorithm #1, except that the bias-corrected estimates $\hat{\hat{\delta}}_i$ replace $\delta_i$ in (12) instead of $\hat{\delta}_i$ as in Algorithm #1.

In either case, once the set of bootstrap values $\mathcal{A}$ or $\mathcal{C}$ has been obtained either in
step [3] of Algorithm #1 or step [6] of Algorithm #2, percentile bootstrap confidence intervals can be constructed. To illustrate, suppose that interest lies in $\beta_j$, the $j$th element of $\beta$, which has been estimated by $\hat{\hat{\beta}}_j$, the $j$th element of $\hat{\hat{\beta}}$. If the distribution of $(\hat{\hat{\beta}}_j - \beta_j)$ were known, it would be trivial to find values $a_\alpha, b_\alpha$ such that

$$\Pr[-b_\alpha \le (\hat{\hat{\beta}}_j - \beta_j) \le -a_\alpha] = 1 - \alpha \qquad (22)$$

for some small value of $\alpha$, $0 < \alpha < 1$, say $\alpha = 0.05$. Since the distribution of $(\hat{\hat{\beta}}_j - \beta_j)$ is unknown, we can use the $j$th element of each bootstrap value $\hat{\hat{\beta}}^*$ to find values $a_\alpha^*, b_\alpha^*$ such that

$$\Pr[-b_\alpha^* \le (\hat{\hat{\beta}}_j^* - \hat{\hat{\beta}}_j) \le -a_\alpha^*] \approx 1 - \alpha, \qquad (23)$$

with improving approximation as $L_2 \to \infty$. Substituting $a_\alpha^*, b_\alpha^*$ for $a_\alpha, b_\alpha$ in (22) leads to an estimated confidence interval $[\hat{\hat{\beta}}_j + a_\alpha^*,\; \hat{\hat{\beta}}_j + b_\alpha^*]$.
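The interval construction in (22)–(23) amounts to taking quantiles of the centered bootstrap values; a minimal sketch (function name ours):

```python
import numpy as np

def basic_bootstrap_ci(beta_hat, beta_boot, alpha=0.05):
    """Interval from (22)-(23): find a*, b* with
    Pr[-b* <= beta*_j - beta_hat_j <= -a*] ~= 1 - alpha,
    then report [beta_hat_j + a*, beta_hat_j + b*]."""
    diffs = np.asarray(beta_boot) - beta_hat        # beta*_j - beta_hat_j
    lo_q, hi_q = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    a_star, b_star = -hi_q, -lo_q                   # -b* is the alpha/2 quantile
    return beta_hat + a_star, beta_hat + b_star
```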
Practical implementation of either Algorithm #1 or Algorithm #2 is possible, and easy, with existing software. There are now a number of packages available that can be used to compute the DEA efficiency estimator, and packages such as LIMDEP, STATA, and others can be used to estimate truncated regression models. Interpreters such as MATLAB, R, S-Plus, etc. can be used to organize the results from one package so that they may be fed to another package.
The only remaining issue concerns the choice of the number of replications, $L$, in Algorithm #1, or $L_1$ and $L_2$ in Algorithm #2. The choice of $L_1$ in Algorithm #2 determines the number of bootstrap replications used to compute the bias-corrected estimates $\hat{\hat{\delta}}_i$. We and others have found that 100 replications are typically sufficient for this purpose, since constructing $\hat{\hat{\delta}}_i$ requires only computation of a mean and then a difference. The choices of $L$ and $L_2$, however, determine the number of bootstrap replications used to construct estimates of confidence intervals in the two algorithms. Confidence-interval estimation is tantamount to estimating the tails of distributions, which necessarily requires more information. Hall (1986) suggests 1000 replications for estimating confidence intervals. We use 2000 replications (the values of $L$ and $L_2$) in our simulations and empirical examples that follow. More accurate estimates can be achieved with larger numbers of replications, and, in the case of confidence-interval estimation, diminishing returns arise slowly. One must balance this concern, however, with the waiting time incurred when the number of replications is increased.
In Simar and Wilson (2001b, 2004), the bootstrap principle was iterated to assess the accuracy of bootstrap confidence interval estimates, along the lines described by Hall (1992). Although a similar procedure could be employed to assess the accuracy of the confidence intervals estimated with Algorithms #1 and #2, the double bootstrap in Algorithm #2 uses sequential, rather than iterated, bootstraps. Moreover, the first loop in step [3] of Algorithm #2 is used only to construct bias-corrected distance function estimates, and so it is reasonable to use fewer replications than when estimating confidence intervals in the second loop. Consequently, the computational burden incurred by Algorithm #2 is far less than what would typically be incurred with an iterated bootstrap as in Simar and Wilson (2001b, 2004).
5. Link with fully parametric models
Before turning to Monte Carlo evidence on the performance of the procedures proposed in this section, note that some studies (e.g., Pitt and Lee, 1981; Kalirajan and Shand, 1985) have estimated fully parametric models with composite errors along the lines of Aigner et al. (1977) and Meeusen and van den Broeck (1977), with a second-stage regression of estimated inefficiency on some environmental variables. Kumbhakar et al. (1991), Reifschneider and Stevenson (1991), Huang and Liu (1994), Battese and Coelli (1995), and others have instead estimated (in a single stage) fully parametric models where the environmental variables are used to parameterize the mean of the one-sided inefficiency component of the composite error process.[8]
[8] Regressing efficiency estimates obtained from maximum likelihood estimation of a parametric model along the lines of Aigner et al. (1977) and Meeusen and van den Broeck (1977) is almost certain to result in problems for statistical consistency. The covariates in the second-stage regression are correlated with the one-sided error terms from the first stage in any interesting case; otherwise, there would be no need for the second-stage regression. In any application, the covariates in the second stage are likely to be correlated, perhaps highly so, with the covariates in the first stage, and hence the errors in the first stage cannot be independent of the covariates in the first stage. Consequently, the likelihood that is maximized is not the
With a small modification of the assumptions in Section 2, these fully parametric models can be seen to be related to the semi-parametric approach in this paper. First, replace Assumptions A2 and A3 with the following:
Assumption A2a. The conditioning in $f(\delta_i \mid z_i)$ in (5) operates through the following mechanism:

$$\delta_i = \exp(z_i \beta + \varepsilon_i), \qquad (24)$$

where $\beta$ is a vector of (finitely many) parameters, and $\varepsilon_i$ is a continuous iid random variable, independent of $z_i$.

Assumption A3a. $\varepsilon_i$ in (24) is distributed $N(0, \sigma_\varepsilon^2)$ with left truncation at $-z_i \beta$ for each $i$.
Then, for the case of one output ($q = 1$), we can write

$$\log y_i = \log g(x_i \mid \alpha) - \xi_i, \qquad (25)$$

with $\xi_i$ distributed $N(z_i \beta, \sigma_\varepsilon^2)$ with left truncation at 0, and $g(\cdot \mid \cdot)$ a parametric function known up to a finite-length parameter vector $\alpha$. This is a special case of

$$\log y_i = \log g(x_i \mid \alpha) + v_i - \xi_i, \qquad (26)$$

with $v_i \sim N(0, \sigma_v^2)$, which is the model estimated by Kumbhakar et al. (1991), Reifschneider and Stevenson (1991), Huang and Liu (1994), Battese and Coelli (1995), and others. In (25), $\sigma_v^2 = 0$. The model in (26) can be extended to accommodate multiple outputs; see Adams et al. (1999), Coelli (2000), Coelli and Perelman (2001), Atkinson and Primont (2002), and Sickles et al. (2002) for examples and discussion.
Note that the parametric structure afforded by $g(\cdot \mid \cdot)$ is necessary for estimation to proceed in a single stage. Typically, $g(\cdot \mid \cdot)$ is assumed to be a translog function, but this is likely a mis-specification in many cases, particularly when firms are of widely varying size (see Wheelock and Wilson (2000) and Wilson and Carey (2004) for discussion and empirical examples, and Guilkey et al. (1983) and Chalfant and Gallant (1985) for Monte Carlo evidence).
6. Monte Carlo experiments
To examine the performance of the various approaches to inference in the second-stage regression, we conducted several Monte Carlo experiments. In each case, we generated data from a known process and applied our bootstrap algorithms on each of $M$ Monte Carlo trials.
Data for the $i$th observation in each Monte Carlo trial were generated by setting $z_{i1} = 1$ and drawing $z_{ij} \sim N(\mu_z, \sigma_z^2)$ for $j = 2, \ldots, r$. Then $\varepsilon_i$ is drawn from a $N(0, \sigma_\varepsilon^2)$ distribution left-truncated at $1 - z_i \beta$. We then set $\delta_i = z_i \beta + \varepsilon_i$. Next, for each $j = 1, \ldots, p$, we draw $x_{ij} \sim \mathrm{uniform}(6, 16)$. If $q = 1$, we then set $y_i = \delta_i^{-1} \sum_{j=1}^{p} x_{ij}^{3/4}$. Otherwise, we set aggregate output $\tilde{z}_i = \delta_i^{-1} \sum_{j=1}^{p} x_{ij}^{3/4}$, and draw $a_1 \sim \mathrm{uniform}(0, 1)$. If $q > 2$, then for each $\ell = 2, \ldots, q - 1$ we also draw $a_\ell \sim \mathrm{uniform}(0,\, 1 - \sum_{j=1}^{\ell - 1} a_j)$. Finally, for $j = 1, \ldots, q - 1$, we set $y_{ij} = a_j \tilde{z}_i$, and then set $y_{iq} = (1 - \sum_{\ell=1}^{q-1} a_\ell)\, \tilde{z}_i$. For the case $q > 1$, aggregate output $\tilde{z}_i$ is computed and then disaggregated among the $q$ individual outputs. Although not required by Assumptions A1–A8, this results in independence between the mix of outputs, characterized by the angles $\gamma$, and $(x, \delta, z)$.

(footnote 8 continued) correct one, unless one takes account of the correlation structure. In our semi-parametric model given by Assumptions A1–A8, this problem is avoided since the first-stage estimation does not require independence between the inefficiencies and the inputs and outputs.
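Under the reading above, one Monte Carlo sample can be generated as follows. This is an illustrative sketch, not the authors' code: the function name is ours, and the argument defaults are taken from the first set of experiments described next.

```python
import numpy as np
from scipy.stats import truncnorm

def simulate(n, p, q, r=2, beta=(0.5, 0.5), mu_z=2.0, sigma_z=2.0,
             sigma_e=1.0, rng=None):
    """One Monte Carlo sample from the Section 6 DGP (sketch)."""
    rng = np.random.default_rng(rng)
    beta = np.asarray(beta)
    # z_i1 = 1; z_ij ~ N(mu_z, sigma_z^2) for j = 2, ..., r
    Z = np.column_stack([np.ones(n), rng.normal(mu_z, sigma_z, (n, r - 1))])
    mu = Z @ beta
    a = (1.0 - mu) / sigma_e               # eps_i left-truncated at 1 - z_i beta
    eps = truncnorm.rvs(a, np.inf, scale=sigma_e, random_state=rng)
    delta = mu + eps                       # true inefficiency, delta_i > 1
    X = rng.uniform(6, 16, (n, p))         # x_ij ~ uniform(6, 16)
    agg = (X ** 0.75).sum(axis=1) / delta  # aggregate output
    if q == 1:
        Y = agg[:, None]
    else:
        # draw shares a_1, ..., a_{q-1} sequentially on the remaining mass,
        # then disaggregate the aggregate output among the q outputs
        shares = np.zeros((n, q))
        rem = np.ones(n)
        for l in range(q - 1):
            shares[:, l] = rng.uniform(0, rem)
            rem -= shares[:, l]
        shares[:, q - 1] = rem
        Y = shares * agg[:, None]
    return X, Y, Z, delta
```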
In our first set of experiments, we set $r = 2$, $\mu_z = 2$, $\sigma_z = 2$, $\sigma_\varepsilon = 1$. In addition, values of each element of $\beta$ must be set. In order to simplify scaling in estimation of the second-stage truncated regression, we set $\beta_1 = \beta_2 = 0.5$.
In each experiment, we ran 1000 Monte Carlo trials. For Algorithm #1, we use $L = 2000$ bootstrap replications. For Algorithm #2, we use $L_1 = 100$ replications for the first loop used to compute the bias-corrected efficiency estimates, and $L_2 = 2000$ replications for the second loop where the truncated regression model is bootstrapped. Also in each experiment, we compute the proportion among the 1000 Monte Carlo experiments where the estimated confidence interval covers the true value of $\beta_1$, $\beta_2$, and $\sigma_\varepsilon$ at nominal significance levels of 0.80, 0.90, 0.95, and 0.99.
Table 1 reports results for three sets of cases: truncated regression of $\hat{\delta}_i$ on $z_i$, with inference by Algorithm #1 or by conventional methods (i.e., relying on asymptotic normality), and censored (tobit) regression of $\hat{\delta}_i$ on $z_i$ with conventional inference. In the case of inference based on Algorithm #1 (columns 4–7 in Table 1), for $p = q = 1$ and $n = 100$, coverages for the slope parameter ($\beta_2$) are too small, but not by a large amount. Coverages improve when the sample size is increased to 400, as expected. Table 1 also reveals that as $p$ and $q$ are increased, the coverages obtained with Algorithm #1 become slightly worse for a given sample size, but not too much so. Some worsening is to be expected due to the curse of dimensionality in the first-stage estimation; as $(p + q)$ increases, the dependent variable in the second-stage regressions becomes noisier, making it more difficult to obtain precise information about the parameters of the regression.
Results for conventional inference are shown in columns 8–11 of Table 1. Coverages are roughly similar to those obtained with Algorithm #1. The last four columns of Table 1 give results for tobit regression with conventional inference. This approach involves a specification error, since the censored model does not match the DGP. Not surprisingly, the results in Table 1 indicate that this approach results in catastrophically poor coverages.
Additional insight is gained by examining the distributions of the estimators over the 1000 Monte Carlo trials in our experiments. Fig. 1 shows kernel density estimates for these distributions where $n = 400$.[9] Fig. 1 contains six plots of density estimates; the rows (from top to bottom) correspond to $\hat{\beta}_1$, $\hat{\beta}_2$, and $\hat{\sigma}_\varepsilon$; the first column corresponds to the tobit estimates described in Table 1, while the second column corresponds to the truncated regression estimates obtained with Algorithm #1 and
[9] The kernel density estimates were obtained using an Epanechnikov kernel and with bandwidths chosen by the two-stage plug-in procedure proposed by Sheather and Jones (1991).
Table 1
Estimated coverages of confidence intervals from regression of $\hat{\delta}_i$ on $z_i$. Columns 4–7: truncated regression with Algorithm #1 inference; columns 8–11: truncated regression with conventional inference; columns 12–15: tobit regression with conventional inference. Column headings give nominal significance levels.

p=q   n    Param.   Trunc. regression,           Trunc. regression,           Tobit regression,
                    Algorithm #1                 conventional inference       conventional inference
                    0.80   0.90   0.95   0.99    0.80   0.90   0.95   0.99    0.80   0.90   0.95   0.99
1     100  β1       0.811  0.868  0.891  0.925   0.808  0.895  0.925  0.965   1.000  1.000  1.000  1.000
           β2       0.743  0.819  0.858  0.915   0.752  0.841  0.882  0.938   0.030  0.043  0.059  0.097
           σε       0.747  0.809  0.848  0.905   0.736  0.814  0.867  0.933   0.000  0.000  0.000  0.000
1     400  β1       0.811  0.912  0.948  0.976   0.805  0.911  0.955  0.988   1.000  1.000  1.000  1.000
           β2       0.784  0.878  0.923  0.967   0.782  0.884  0.929  0.974   0.000  0.000  0.000  0.000
           σε       0.776  0.875  0.909  0.956   0.774  0.874  0.916  0.966   0.000  0.000  0.000  0.000
2     100  β1       0.886  0.928  0.948  0.970   0.875  0.953  0.973  0.990   1.000  1.000  1.000  1.000
           β2       0.661  0.727  0.779  0.846   0.676  0.763  0.815  0.893   0.043  0.060  0.087  0.147
           σε       0.664  0.742  0.789  0.850   0.661  0.754  0.811  0.893   0.000  0.000  0.000  0.000
2     400  β1       0.772  0.931  0.979  0.992   0.746  0.885  0.958  0.996   1.000  1.000  1.000  1.000
           β2       0.733  0.822  0.884  0.939   0.732  0.835  0.897  0.960   0.001  0.001  0.001  0.001
           σε       0.735  0.812  0.873  0.926   0.722  0.805  0.883  0.936   0.000  0.000  0.000  0.000
3     100  β1       0.900  0.943  0.958  0.970   0.916  0.960  0.978  0.991   1.000  1.000  1.000  1.000
           β2       0.624  0.699  0.748  0.807   0.662  0.745  0.793  0.868   0.078  0.112  0.147  0.219
           σε       0.655  0.716  0.758  0.816   0.663  0.741  0.795  0.863   0.000  0.000  0.000  0.000
3     400  β1       0.627  0.893  0.990  0.999   0.563  0.755  0.896  0.997   1.000  1.000  1.000  1.000
           β2       0.649  0.761  0.821  0.904   0.647  0.777  0.841  0.936   0.002  0.005  0.005  0.008
           σε       0.712  0.792  0.846  0.919   0.698  0.791  0.857  0.936   0.000  0.000  0.000  0.000
also described in Table 1. In each of the six panels in Fig. 1, the dotted curve shows the density estimate corresponding to $p = q = 1$, while the solid curve shows the density estimate corresponding to $p = q = 2$. True values of the parameters are indicated by the vertical dashed lines. Fig. 1 makes clear why the coverages of
[Fig. 1. Estimates of sampling densities, Algorithm #1 ($n = 400$; dotted curve for $p = q = 1$, solid curve for $p = q = 2$). Left column: tobit regression of $\hat{\delta}$ on $z$; right column: truncated regression of $\hat{\delta}$ on $z$. Rows show estimates of $\beta_1$, $\beta_2$, and $\sigma$; each density estimate uses $N = 1000$ trials.]
estimated confidence intervals from the tobit model are so poor: none of the estimated values lie near the true values. The situation is much better for the truncated regression; although the distributions for $\hat{\beta}_1$ and $\hat{\beta}_2$ show small amounts of skewness, they are rather well-centered over the true values. Comparing the dotted and the solid curves reveals the effects of increasing dimensionality in the first-stage problem; when dimensionality is increased from 2 to 4, the estimates of the sampling densities in the second column of Fig. 1 become more dispersed, and shift slightly away from the true parameter values. As discussed earlier, these effects are to be expected since, with increasing dimensionality in the first stage, the dependent variable in the second-stage regression contains more noise. It is interesting to note, however, that the doubling of dimensionality from 2 to 4 appears to have only a small effect on the densities in Fig. 1.
A similar set of Monte Carlo experiments was performed to obtain the results in Table 2. In the case of columns 4–7 of Table 2, we employed the double bootstrap in Algorithm #2. Comparison of these results with those for Algorithm #1 shown in columns 4–7 of Table 1 reveals improved coverages in a number of cases. Given that Algorithm #2 involves only a small increase in computational burden over Algorithm #1, the improved performance of Algorithm #2 seems to justify its use. Note that in Table 2, as in Table 1 with the single bootstrap, coverages worsen as $(p + q)$ increases for a given sample size, due to the curse of dimensionality in the first-stage estimation of the dependent variable. The worsening here, however, is modest and less severe than in the case of Table 1, due to the bias correction employed in Algorithm #2.
The last four columns of Table 2 give coverage results for conventional inference applied after the regression of $\hat{\hat{\delta}}_i$ on $z_i$ in step [5] of Algorithm #2. As with Algorithm #1 in Table 1, the coverages obtained with conventional inference here are broadly similar to those obtained with Algorithm #2, but the coverages provided by Algorithm #2 are much better than those provided by Algorithm #1 in Table 1. However, with $p = q = 3$ and $n = 400$, coverage for the intercept term ($\beta_1$) is worse with the conventional approach.
Kernel estimates of the densities of the estimators from the second set of experiments are shown in Fig. 2, again for the case where $n = 400$, with the dotted curves corresponding to $p = q = 1$ and the solid curves corresponding to $p = q = 2$. As in Fig. 1, the rows in Fig. 2 (from top to bottom) correspond to estimates of the intercept, slope, and $\sigma$ terms in the second stage. The first column in Fig. 2 shows results based on estimates from Algorithm #1; the densities are the same as those in the second column of Fig. 1, but have been reproduced with different scalings on the vertical axes to facilitate comparison with the second column of Fig. 2. The second column of Fig. 2 corresponds to estimates from the regression of $\hat{\hat{\delta}}_i$ on $z_i$ using Algorithm #2. As in Fig. 1, true parameter values are indicated by vertical dashed lines.

Two phenomena become evident in Fig. 2. First, as noted several times already,
increasing dimensionality in the first-stage problem reduces the precision of estimates in the second stage. Again, however, the results in Fig. 2 indicate that when dimensionality in the first stage is doubled from 2 to 4 in our experiments, the effect
on the second-stage estimates is not great. Second, the densities in the first column of Fig. 2 have slightly greater skewness than corresponding densities in the second column; on the whole, the densities for the regression of $\hat{\hat{\delta}}_i$ are better-centered on the true values than the densities for the regression of $\hat{\delta}_i$.

The second set of experiments conducted to produce the results in Table 2 also
allow comparison of the root-mean-square error (RMSE) of each estimator in regressions of $\hat{\delta}_i$ on $z_i$ versus regressions of $\hat{\hat{\delta}}_i$ on $z_i$. Table 3 shows the RMSE for each estimator (computed over 1000 Monte Carlo trials) with varying model dimensions and sample sizes. These results reveal two interesting phenomena. First, for $p = q = 1$, 2, or 3 and a sample size of 100, RMSE for each estimator is smaller when $\hat{\delta}_i$ is regressed on $z_i$ than when $\hat{\hat{\delta}}_i$ is regressed on $z_i$. However, when the sample size is increased to 400, with $p = q = 1$ or 2, use of $\hat{\hat{\delta}}_i$ yields lower RMSE for the intercept and slope estimators than $\hat{\delta}_i$. For $p = q = 3$, use of $\hat{\hat{\delta}}_i$ yields lower RMSE for the intercept and slope estimators than $\hat{\delta}_i$ when the sample size is increased to 800. Hence, at smaller sample sizes, performing the bias correction in
Table 2
Estimated coverages of confidence intervals from regression of $\hat{\hat{\delta}}_i$ on $z_i$

p=q   n    Param.   Trunc. regression, Algorithm #2,   Trunc. regression, conventional inference,
                    nominal significance               nominal significance
                    0.80   0.90   0.95   0.99          0.80   0.90   0.95   0.99
1     100  β1       0.798  0.867  0.891  0.927         0.799  0.888  0.923  0.957
           β2       0.781  0.876  0.911  0.946         0.790  0.882  0.919  0.963
           σε       0.805  0.884  0.906  0.956         0.816  0.879  0.922  0.964
1     400  β1       0.802  0.898  0.937  0.974         0.801  0.907  0.949  0.982
           β2       0.804  0.900  0.944  0.980         0.804  0.899  0.950  0.989
           σε       0.818  0.898  0.946  0.980         0.818  0.898  0.947  0.986
2     100  β1       0.850  0.967  0.976  0.986         0.808  0.959  0.982  0.995
           β2       0.803  0.902  0.933  0.963         0.792  0.907  0.945  0.976
           σε       0.812  0.923  0.946  0.969         0.817  0.919  0.954  0.983
2     400  β1       0.758  0.912  0.975  0.991         0.744  0.885  0.960  0.993
           β2       0.795  0.907  0.960  0.986         0.794  0.904  0.953  0.990
           σε       0.781  0.912  0.964  0.993         0.789  0.904  0.957  0.995
3     100  β1       0.741  0.971  0.993  1.000         0.658  0.941  0.996  1.000
           β2       0.793  0.907  0.934  0.968         0.821  0.923  0.960  0.980
           σε       0.820  0.925  0.951  0.972         0.859  0.946  0.964  0.984
3     400  β1       0.378  0.645  0.871  0.999         0.335  0.546  0.716  0.958
           β2       0.739  0.878  0.961  0.995         0.730  0.846  0.944  0.993
           σε       0.708  0.881  0.961  0.999         0.712  0.863  0.944  0.996
Algorithm #2 worsens RMSE relative to the simpler, single bootstrap. However, for a given dimensionality, as sample size is increased, the bias correction eventually becomes advantageous in terms of RMSE. Recall that comparison of Tables 1 and 2 revealed that, in terms of coverages of estimated confidence intervals, the double
[Fig. 2. Estimates of sampling distributions, Algorithm #2 ($n = 400$; dotted curve for $p = q = 1$, solid curve for $p = q = 2$). Left column: truncated regression of $\hat{\delta}$ on $z$; right column: truncated regression of $\hat{\hat{\delta}}$ on $z$. Rows show estimates of $\beta_1$, $\beta_2$, and $\sigma$; each density estimate uses $N = 1000$ trials.]
bootstrap was found to be superior in almost every case. Although the bias correction in the double bootstrap may increase estimation noise at small sample sizes, inference-making abilities are enhanced at all sample sizes.
Second, the results in Table 3 also reveal that for a given sample size (e.g., $n = 100$ or $n = 400$), RMSE increases as $(p + q)$ increases. As noted earlier in the discussions of Tables 1 and 2, this is due to the curse of dimensionality in the first-stage estimation of the dependent variable used in the second-stage regressions. At a given sample size, with increased dimensionality, the dependent variable in the second-stage regression contains less information about the underlying, latent variable in (6).
To further examine the link between two-stage non-parametric models in the literature and parametric models as discussed in Section 5, we conducted an additional set of simulations using (26) as the true model, with $g(x_i \mid \alpha) = \prod_{j=1}^{p} x_{ij}^{\alpha_j}$, $\alpha_j = 0.8 p^{-1}$, $\sigma_\varepsilon = 0.9$, $\sigma_v = 0$, $r = 2$, $\sigma_z = 1$, $\mu_z = -1$, and $\beta_1 = \beta_2 = 0.5$. We considered four cases with $p \in \{1, 3\}$ and $n \in \{100, 400\}$. For each case, we simulated
Table 3
Root-mean-square error of parameter estimators in truncated regression of $\hat{\delta}_i$ or $\hat{\hat{\delta}}_i$ on $z_i$

p=q   n    Param.   RMSE, $\hat{\delta}_i$   RMSE, $\hat{\hat{\delta}}_i$
1     100  β1       0.4985                   0.5261
           β2       0.1107                   0.1118
           σε       0.1481                   0.1446
1     400  β1       0.2259                   0.2255
           β2       0.0504                   0.0497
           σε       0.0687                   0.0656
2     100  β1       0.7812                   0.9515
           β2       0.1418                   0.1498
           σε       0.1925                   0.1886
2     400  β1       0.3121                   0.3002
           β2       0.0585                   0.0563
           σε       0.0811                   0.0775
3     100  β1       1.1762                   1.8447
           β2       0.1773                   0.2288
           σε       0.2292                   0.2536
3     400  β1       0.5071                   0.6203
           β2       0.0743                   0.0795
           σε       0.0960                   0.1061
3     800  β1       0.3558                   0.3542
           β2       0.0527                   0.0498
           σε       0.0646                   0.0797
draws from the model in each of 1000 Monte Carlo trials; on each trial, we obtained maximum likelihood estimates of the parameters and performed conventional inference with the inverse negative Hessian of the log-likelihood function, using the computer code described in Coelli (1996). In addition, we applied our Algorithms #1 and #2 described previously in Section 3.[10]
Results for estimated coverages obtained with the three estimation methods, at nominal significance levels of 0.90, 0.95, and 0.99, are shown in Table 4. In obtaining maximum likelihood estimates, we employed two standard transformations, setting $\sigma^2 = \sigma_\varepsilon^2 + \sigma_v^2$ and $\gamma = \sigma_\varepsilon^2 / (\sigma_\varepsilon^2 + \sigma_v^2)$; results for estimates of $\sigma$ and $\gamma$ are reported for the maximum likelihood estimates. By design, Algorithms #1 and #2 restrict $\sigma_v = 0$, but this restriction is not imposed in the maximum likelihood estimation, to mirror the way practitioners are likely to use these methods.
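The two transformations are simply a reparameterization of $(\sigma_\varepsilon, \sigma_v)$, and can be inverted; for concreteness (function names ours):

```python
import math

def to_sigma_gamma(sigma_e, sigma_v):
    """Reparameterization used for the MLE results in Table 4:
    sigma^2 = sigma_e^2 + sigma_v^2, gamma = sigma_e^2 / sigma^2."""
    s2 = sigma_e ** 2 + sigma_v ** 2
    return s2, sigma_e ** 2 / s2

def from_sigma_gamma(s2, gamma):
    """Inverse mapping back to (sigma_e, sigma_v)."""
    return math.sqrt(gamma * s2), math.sqrt((1.0 - gamma) * s2)
```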
The results in Table 4 reveal that in every case, coverages for $\beta_1$, $\beta_2$, and $\sigma$ obtained with Algorithms #1 and #2 are better (in a few cases only slightly so, but in many cases substantially so) than those obtained with maximum likelihood. Since $\sigma_v$ is not restricted to zero in the maximum likelihood estimation, some of the inefficiency in the data-generating process is confused with statistical noise. Note that the coverages for the production function parameters (the $\alpha$'s) are poor; therefore, it appears that the method has difficulty in estimating these parameters accurately, and this in turn may cause distortions in the estimates of $\beta_1$ and $\beta_2$ that are obtained with the maximum likelihood method. Of course, one might expect that the maximum likelihood coverages would improve if $\sigma_v$ were restricted to zero, but this is not done in practical applications where models such as (26) are estimated.[11]
Table 4 also reveals that coverages obtained with Algorithm #1 are better than those obtained with Algorithm #2 in these experiments. This may be due to the different scaling that is used here as opposed to the simulations that led to Tables 1–3. The bias correction in Algorithm #2 likely adds some noise unless the bias being corrected is large. Recall that this was confirmed in Table 3 in the case of the original experiments, and so it should be no surprise that one can find a scaling, as we have here, where Algorithm #1 is able to dominate Algorithm #2 in terms of coverages of estimated confidence intervals.
7. Empirical examples
As a final exercise, we provide an empirical example based on the paper by Aly et al. (1990). Aly et al. examined efficiency among a subsample of 322 commercial banks operating in the U.S. during the fourth quarter of 1986. Here, we use the definitions of Berger and Mester (2003) to define three inputs (purchased
[10] Note that steps [3.1] and [3.2] in Algorithm #1, and steps [3.1], [3.2], [6.1], and [6.2] in Algorithm #2, must be modified slightly due to the exponential specification introduced by Assumptions A2a and A3a in Section 5.

[11] One might also reasonably argue that if, in the true model, $\sigma_v > 0$, then the comparison across methods in Table 4 might be different. But in this case, the first stages of the non-parametric approaches would no longer be statistically consistent. Here, we have generated the data in such a way that each of the estimation methods remains consistent.
funds, core deposits, and labor) and four outputs (consumer loans, business loans, real estate loans, and securities held) for banks; these are used to estimate technical efficiency in the first stage. In the second stage, we attempt to explain the first-stage estimates in terms of several variables used by Aly et al. in their second-stage regression. Where Aly et al. employed OLS in their second stage, we use truncated regression.
The covariates we use in the second-stage regression are similar to those used byAly et al., except that our SIZE variable is defined by the log of total assets ratherthan total deposits as in the original study, and we also include the square of SIZEand DIVERSE as well as an interaction term. In addition, we included bothindependent banks and banks that are members of multi-bank holding companies,
Table 4
Coverage of estimated confidence intervals in parametric model
p n Parameter MLE—nom. signif. Alg. #1—nom. signif. Alg. #2—nom. signif.
0.90 0.95 0.99 0.90 0.95 0.99 0.90 0.95 0.99
1 100 a1 0.630 0.663 0.711 — — — — — —
a2 0.633 0.669 0.711 — — — — — —
b1 0.781 0.838 0.900 0.867 0.900 0.942 0.815 0.847 0.903
b2 0.808 0.864 0.930 0.873 0.915 0.961 0.858 0.899 0.955
s2 0.770 0.816 0.890 0.838 0.873 0.917 0.825 0.863 0.902
g 0.998 0.998 0.999 — — — — — —
1 400 a1 0.589 0.643 0.697 — — — — — —
a2 0.582 0.631 0.693 — — — — — —
b1 0.703 0.767 0.849 0.915 0.933 0.963 0.886 0.907 0.951
b2 0.724 0.803 0.886 0.902 0.932 0.972 0.895 0.928 0.970
s2 0.653 0.722 0.818 0.897 0.931 0.962 0.888 0.921 0.956
g 0.999 1.000 1.000 — — — — — —
3 100 a1 0.636 0.688 0.739 — — — — — —
a2 0.661 0.698 0.755 — — — — — —
a3 0.667 0.718 0.772 — — — — — —
a4 0.661 0.710 0.769 — — — — — —
b1 0.776 0.829 0.906 0.909 0.929 0.963 0.884 0.916 0.953
b2 0.783 0.834 0.916 0.883 0.919 0.965 0.839 0.881 0.936
s2 0.718 0.775 0.867 0.840 0.877 0.906 0.804 0.831 0.878
g 1.000 1.000 1.000 — — — — — —
3 400 a1 0.594 0.635 0.691 — — — — — —
a2 0.596 0.650 0.708 — — — — — —
a3 0.609 0.665 0.717 — — — — — —
a4 0.621 0.671 0.731 — — — — — —
b1 0.723 0.785 0.869 0.951 0.966 0.987 0.849 0.893 0.945
b2 0.720 0.792 0.881 0.892 0.930 0.981 0.817 0.872 0.946
s2 0.633 0.712 0.817 0.895 0.925 0.963 0.811 0.858 0.923
g 0.998 1.000 1.000 — — — — — —
and added a dummy variable (HOLD) equal to 1 if a bank is a member of a multi-bank holding company. See Aly et al. (1990) for definitions of the remaining variables used in the second-stage regression.
We used more recent data, from the fourth-quarter FDIC Reports of Income and Condition (Call Reports) for 2002. We first took a random sample of 322 banks, as did Aly et al. in their study. We also present results for the full sample (after deleting observations with missing or implausible values) consisting of 6955 observations.
We first regressed the DEA efficiency estimates on our covariates to obtain the parameter estimates shown in the second column of Table 5. We then estimated 95% confidence intervals using the asymptotic normal approximation (see columns 3 and 4) and our Algorithm #1 (see columns 5 and 6). Using Algorithm #2, we regressed the bias-corrected efficiency estimates on the covariates to obtain the parameter estimates shown in column 7 of Table 5 and the confidence interval estimates in columns 8 and 9, also obtained with Algorithm #2.
Several things become apparent from examining the results in Table 5. First, both the parameter estimates and confidence-interval estimates obtained by regressing the bias-corrected efficiencies on covariates in Algorithm #2 are somewhat different from those obtained by regressing the uncorrected estimates as in Algorithm #1. Given the Monte Carlo simulation results discussed earlier, this is not surprising, and the simulation results suggest that we should prefer the results from Algorithm #2 over those from Algorithm #1. The confidence interval estimates from either Algorithm #1 or #2 are rather different from interval estimates obtained using conventional methods, and this too is not surprising given the simulation results discussed earlier.
The conventional confidence interval estimates in columns 3-4 of Table 5 are centered on the parameter estimates in column 2 by construction. It is interesting to note that the intervals estimated with both Algorithms #1 and #2 sometimes do not cover the corresponding parameter estimate. In particular, with n = 322, Algorithm #2 produces estimated confidence intervals in the last two columns of Table 5 that do not cover the corresponding parameter estimates in the seventh column in four cases (i.e., the coefficients on SIZE, SIZE_SQ, DIVERSE_SQ, and the estimate σ̂ε). When the sample size is increased to 6955, the estimated confidence intervals from Algorithm #2 also do not cover the corresponding parameter estimate in four cases (HOLD, MSA, DIVERSE_SQ, and σ̂ε).12
Unlike the conventional confidence intervals based on the normal approximation, the bootstrap confidence intervals incorporate an implicit bias correction. Recall that the second-stage regression inherits the convergence rate of the DEA estimator in the first stage, i.e., $n^{-2/(p+q+1)}$, or $n^{-2/9}$ in this application. Noting that $6955^{-2/9} \approx 51^{-1/2}$,
12 The confidence intervals estimated by Algorithms #1 and #2 in Table 5 are typically narrower than the corresponding interval estimates obtained by the usual normal approximation. In our Monte Carlo experiments reported in Tables 1 and 2, the widths of estimated intervals were similar across the various methods (with the exception of cases where tobit regression was used). The fact that the bootstrap intervals in Table 5 are narrower than the conventional intervals may reflect some unknown feature in the data different from our simulation scenario.
Table 5
Efficiency of U.S. commercial banks (truncated regression, 95% confidence intervals, p = 3, q = 5)

              Estimate    Normal CI              Alg. #1 CI             Alg. #2 est.  Alg. #2 CI
                          lo         hi          lo         hi                        lo         hi

n = 322:
CONSTANT       0.6759     0.5234     0.8284      0.2841     0.4517     -0.2354       -1.354     -0.8535
SIZE           0.0707     0.04904    0.09237     0.1047     0.1285      0.2453        0.3414     0.4117
DIVERSE        0.4293     0.323      0.5356      0.4044     0.5277      1.121         1.101      1.472
HOLD           0.01601    0.01194    0.02008     0.01468    0.01961     0.03738       0.03482    0.04967
MSA            0.006988   0.003636   0.01034     0.005876   0.009998    0.01897       0.01607    0.02883
SIZE_SQ       -0.003864  -0.004704  -0.003024   -0.006503  -0.005592   -0.01263      -0.02031   -0.01754
DIVERSE_SQ    -0.1181    -0.147     -0.08915    -0.158     -0.1239     -0.3536       -0.4928    -0.3906
SIZE×DIVERSE  -0.01079   -0.01828   -0.003307   -0.01399   -0.00533    -0.02117      -0.0308    -0.00527
σ̂ε             0.06071    0.0596     0.06181     0.08192    0.08342     0.1673        0.2167     0.221

n = 6955:
CONSTANT      -0.08591   -1.34       1.168      -1.228     -0.1798     -0.6978       -3.682     -0.003114
SIZE           0.1477    -0.03172    0.3272      0.1493     0.3012      0.2291        0.1318     0.6233
DIVERSE        0.7302    -0.01542    1.476       0.7284     1.366       1.435         1.053      3.16
HOLD           0.02273    0.002241   0.04322     0.02294    0.04091     0.07018       0.07331    0.1333
MSA            0.01961    0.005769   0.03345     0.01934    0.03348     0.06714       0.07296    0.1168
SIZE_SQ       -0.006839  -0.01365   -2.346E-5   -0.0135    -0.007633   -0.01308      -0.03042   -0.01264
DIVERSE_SQ    -0.3357    -0.5252    -0.1463     -0.569     -0.4102     -0.8587       -1.571     -1.045
SIZE×DIVERSE  -0.00636   -0.05713    0.04441    -0.02932    0.01443     0.02325      -0.02937    0.117
σ̂ε             0.03802    0.03225    0.0438      0.04996    0.05559     0.1106        0.1436     0.1599
our inference-making ability in the second-stage regression with 6955 observations is equivalent, in a rough sense, to making inference in an ordinary parametric, truncated regression (where the classical parametric convergence rate $n^{-1/2}$ obtains) with about 51 observations. It is well known that maximum likelihood often produces biased estimates in finite samples. Although we expect the procedure here to be unbiased asymptotically, even with 6955 observations we are far from the asymptotic result. And, of course, in any application such as this, the true DGP is unknown, and possibly different from the one that is assumed.
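The back-of-the-envelope calculation above can be reproduced directly: equating the nonparametric rate $n^{-2/(p+q+1)}$ to the parametric rate $m^{-1/2}$ gives an effective parametric sample size $m = n^{4/(p+q+1)}$. A minimal sketch (the function name is ours, not from the paper):

```python
def effective_sample_size(n, p, q):
    """Parametric sample size m implied by equating the DEA-driven
    rate n^(-2/(p+q+1)) to the parametric rate m^(-1/2)."""
    return n ** (4.0 / (p + q + 1))

# With p = 3, q = 5 as in Table 5, p + q + 1 = 9, so 6955 observations
# carry roughly the information of about 51 parametric observations.
m = effective_sample_size(6955, 3, 5)
```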
Finally, we note that there are some differences between the parameter estimates obtained with the sub-sample and those obtained with the full sample. On the whole, however, the estimates appear rather stable. Of course, more data are always preferred to less.
8. Summary and conclusions
The Monte Carlo results presented in the previous sections illustrate a number of problems with existing two-stage studies that employ non-parametric distance function estimators similar to (10) in the first stage and then employ tobit regression in the second stage, with conventional inference based on the inverse negative Hessian of the log-likelihood function. As noted in Section 1, none of the published studies that we are aware of have defined a DGP that might be estimated, and it is difficult to imagine one where tobit regression would be sensible in this context. In terms of coverage of estimated confidence intervals, tobit regression is catastrophic in our Monte Carlo experiments.
Truncated regression estimates the correct model in our experiments. In terms of coverage of estimated confidence intervals, our single bootstrap is shown to perform well, but our double bootstrap performs even better, with little increase in computational burden over the single bootstrap. The double bootstrap has the additional advantage that the RMSE of the intercept and slope estimators in the second-stage regression declines more rapidly with increasing sample size than when the single bootstrap is used, resulting in lower RMSE at moderate sample sizes, depending on model dimensionality in the first stage. The double bootstrap is thus our preferred choice, but one could use both methods as a robustness check.
Acknowledgements
We are grateful to Irene Gijbels and participants at the XXXIVèmes Journées de Statistique, Société Française de Statistique, Bruxelles; the North American Productivity Workshop, Albany, New York; the Workshop on Quantitative Methods for the Measurement of Organizational Efficiency, Institute for Fiscal Studies, University College, London; and the Econometric Society European Meetings, Madrid, for comments on earlier versions. Research support from "Projet d'Actions de Recherche Concertées" (No. 98/03-217), from the "Inter-university
Attraction Pole", Phase V (No. P5/24) from the Belgian Government, and from the Texas Advanced Computation Center is gratefully acknowledged. Any remaining errors are solely our responsibility.
Appendix A. Technical details
A.1. Censored versus truncated regression
Consider the regression model
$$W_i = z_i \beta + \varepsilon_i, \qquad (27)$$
where $\varepsilon_i \sim N(0, \sigma_\varepsilon^2)$ is identically, independently distributed for all $i = 1, \ldots, n$. The left-hand side variable $W$ is said to be censored when, instead of observing $W_i$ for all observations, we observe
$$y_i = \begin{cases} z_i\beta + \varepsilon_i & \text{if } z_i\beta + \varepsilon_i > c_i, \\ c_i & \text{otherwise.} \end{cases} \qquad (28)$$
In this case, $W$ is left-censored at $c_i$, which may vary over observations. Alternatively, $W_i$ is said to be truncated if we observe $y_i = W_i$ for all $W_i \geq c_i$, but observe nothing otherwise.
Relative to the classical linear regression model with unbounded error terms, both censoring and truncation involve a loss of information about the dependent variable. In the case of censoring, some information is lost about the dependent variable for censored observations, but the right-hand side variables are observed for such observations. In the case of truncation, neither the left-hand nor the right-hand side variables are observed for some observations. The information loss is thus more severe in the case of truncation than in the case of censoring.
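The distinction can be made concrete with a tiny simulation (illustrative only; the threshold and sample here are arbitrary, not from the paper):

```python
import random

random.seed(42)
c = 0.0                                              # censoring/truncation point
latent = [random.gauss(0.5, 1.0) for _ in range(10)] # the unobserved W_i

# Censoring: every observation is kept, but values at or below c are
# recorded as c itself; the covariates z_i would still be observed.
censored = [w if w > c else c for w in latent]

# Truncation: observations with W_i below c are dropped entirely,
# losing both the dependent variable and the covariates.
truncated = [w for w in latent if w >= c]

assert len(censored) == len(latent)    # no observations lost under censoring
assert len(truncated) <= len(latent)   # observations below c vanish entirely
```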
In the case of truncation, if the $W_i$ are assumed normal with left-truncation at $c_i$, $\beta$ in (27) can be estimated by maximizing the likelihood function
$$\mathcal{L}_1 = \prod_{i=1}^{n} \frac{1}{\sigma_\varepsilon}\,\phi\!\left(\frac{y_i - z_i\beta}{\sigma_\varepsilon}\right) \left[1 - \Phi\!\left(\frac{c_i - z_i\beta}{\sigma_\varepsilon}\right)\right]^{-1}, \qquad (29)$$
where $\phi(\cdot)$ and $\Phi(\cdot)$ represent the standard normal density and distribution functions, respectively. This resembles the likelihood for regression models with normal errors that are neither censored nor truncated, except for the term in square brackets. Division by this term is necessary to re-scale the normal density $\phi(\cdot)$ so that it integrates to unity after truncation.
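As a sketch of how (29) translates into code, for the single-regressor case and using only the Python standard library (the function name is ours, not the paper's):

```python
import math
from statistics import NormalDist

def truncated_loglik(beta, sigma, y, z, c):
    """Log-likelihood corresponding to (29): each y_i = z_i*beta + eps_i
    is observed only when it exceeds the truncation point c_i."""
    std = NormalDist()                  # standard normal: pdf and cdf
    ll = 0.0
    for yi, zi, ci in zip(y, z, c):
        mu = zi * beta
        ll += math.log(std.pdf((yi - mu) / sigma) / sigma)
        # re-scale so the truncated density integrates to one:
        ll -= math.log(1.0 - std.cdf((ci - mu) / sigma))
    return ll
```

With truncation at the mean ($c_i = z_i\beta$), the bracketed term in (29) equals 1/2, so each observation's log-contribution exceeds the untruncated log-density by exactly $\log 2$.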
In the case of left-censoring at $c_i$, where one assumes that the uncensored observations are normally distributed, $\beta$ in (27) can be estimated by specifying a functional form for $\operatorname{Prob}(W_i > c_i \mid a)$, where $a$ is a vector of parameters, and writing the likelihood function as
$$\mathcal{L}_2 = \prod_{i \mid y_i > c_i} \frac{1}{\sigma_\varepsilon}\,\phi\!\left(\frac{y_i - z_i\beta}{\sigma_\varepsilon}\right) \left[1 - \Phi\!\left(\frac{c_i - z_i\beta}{\sigma_\varepsilon}\right)\right]^{-1} \operatorname{Prob}(W_i > c_i \mid a) \;\times \prod_{i \mid y_i = c_i} \bigl[1 - \operatorname{Prob}(W_i > c_i \mid a)\bigr]. \qquad (30)$$
Here, the probability that $W_i > c_i$ is allowed to result from a process different from the one that determines $W$ when $W_i > c_i$. The probability $\operatorname{Prob}(W_i > c_i \mid a)$ would typically be specified as a probit or logit, but many other specifications are possible.
The tobit model is a special case of (30), in which the probability that $W_i > c_i$ is assumed to be controlled by the same process that determines $W$ when $W_i > c_i$. In other words, the tobit model incorporates the additional assumption that
$$\operatorname{Prob}(W_i > c_i) = 1 - \Phi\!\left(\frac{c_i - z_i\beta}{\sigma_\varepsilon}\right). \qquad (31)$$
Substitution of the expression in (31) for $\operatorname{Prob}(W_i > c_i \mid a)$ in (30) yields the tobit likelihood function,
$$\mathcal{L}_3 = \prod_{i \mid y_i > c_i} \frac{1}{\sigma_\varepsilon}\,\phi\!\left(\frac{y_i - z_i\beta}{\sigma_\varepsilon}\right) \prod_{i \mid y_i = c_i} \Phi\!\left(\frac{c_i - z_i\beta}{\sigma_\varepsilon}\right). \qquad (32)$$
The terms in the second product expression reflect the fact that all that is known about the censored observations is that $W_i$ lies below $c_i$.
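For comparison, (32) can be sketched in log form the same way (single regressor, standard library only; the function name is ours):

```python
import math
from statistics import NormalDist

def tobit_loglik(beta, sigma, y, z, c):
    """Log-likelihood corresponding to (32): uncensored observations
    contribute the normal density; censored ones (y_i = c_i) contribute
    Phi((c_i - z_i*beta)/sigma), i.e. Prob(W_i <= c_i)."""
    std = NormalDist()
    ll = 0.0
    for yi, zi, ci in zip(y, z, c):
        mu = zi * beta
        if yi > ci:
            ll += math.log(std.pdf((yi - mu) / sigma) / sigma)
        else:
            ll += math.log(std.cdf((ci - mu) / sigma))
    return ll
```

Note how, unlike $\mathcal{L}_2$, the censored branch here depends on the same $\beta$ and $\sigma_\varepsilon$ as the uncensored branch, which is precisely the tobit restriction (31).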
Even if the censored model were the correct specification, use of the tobit version of this model would be curious in the context of the two-stage models discussed here. Note that the likelihood $\mathcal{L}_2$ in (30) is multiplicatively separable in $\beta$ and $a$. Both the censored and the uncensored observations identify $a$, while only the uncensored observations identify $\beta$. In fact, the separability means that $\mathcal{L}_2$ can be maximized with respect to each parameter vector independently; maximization with respect to $\beta$ is equivalent to maximization of the likelihood $\mathcal{L}_1$ in (29) for the truncated model. By contrast, when the tobit specification is used, both the censored and the uncensored observations determine $\hat\beta$ when the likelihood $\mathcal{L}_3$ in (32) is maximized. A better approach would be to specify $\operatorname{Prob}(W_i > c_i \mid a)$ as a probit probability, maximize both $\mathcal{L}_2$ and $\mathcal{L}_3$, and perform a likelihood-ratio test of the null hypothesis $\beta = a$. None of the two-stage studies that have used the tobit specification in the second stage that we are aware of have done this. Of course, the censored model is not the correct specification, so perhaps the point is moot.
Both the truncated and the tobit regression models can easily be estimated by maximum likelihood. In either case, the task is made easier when Olsen's (1978) re-parameterization is used. With this re-parameterization, maximization by Newton's method typically converges very quickly, requiring only a few iterations. For the practitioner, commands that perform truncated regression are available in a number of popular software packages.
A.2. Obtaining draws from a left-truncated normal distribution
Both bootstrap algorithms (Algorithms #1 and #2), as well as the simulation algorithm in Section 5 (Algorithm #3), require iid draws from a $N(0, \sigma^2)$ distribution with left truncation at a constant $c$. This can be accomplished quickly and easily using a modified transformation method. Let $\Phi(\cdot)$ and $\Phi^{-1}(\cdot)$ denote the standard normal distribution function and the standard normal quantile function, respectively, so that $u = \Phi^{-1}(\Phi(u))$. Generate $v$ uniform on $(0,1)$, let $c' = c/\sigma$, and set $v' = \Phi(c') + [1 - \Phi(c')]\,v$. Then compute $u = \sigma\,\Phi^{-1}(v')$ to obtain the desired left-truncated normal deviate.
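The recipe translates nearly line-for-line into code; a sketch using only the Python standard library (the function name is ours):

```python
import random
from statistics import NormalDist

def draw_left_truncated_normal(sigma, c, rng=random):
    """One draw from N(0, sigma^2) left-truncated at c, by the modified
    inverse-CDF (transformation) method of Appendix A.2."""
    std = NormalDist()               # Phi via .cdf, Phi^{-1} via .inv_cdf
    c0 = c / sigma                   # truncation point in standard units
    v = rng.random()                 # v ~ Uniform(0, 1)
    v0 = std.cdf(c0) + (1.0 - std.cdf(c0)) * v   # maps (0,1) onto (Phi(c0), 1)
    return sigma * std.inv_cdf(v0)
```

Every draw is at least $c$ by construction: $v' \geq \Phi(c/\sigma)$ implies $\Phi^{-1}(v') \geq c/\sigma$.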
References
Adams, R.M., Berger, A.N., Sickles, R.C., 1999. Semiparametric approaches to stochastic panel frontiers
with applications in the banking industry. Journal of Business and Economic Statistics 17, 349–358.
Aigner, D.J., Lovell, C.A.K., Schmidt, P., 1977. Formulation and estimation of stochastic frontier
production function models. Journal of Econometrics 6, 21–37.
Aly, H.Y., Grabowski, R., Pasurka, C., Rangan, N., 1990. Technical, scale, and allocative efficiencies in
U.S. banking: an empirical investigation. Review of Economics and Statistics 72, 211–218.
Andersen, P., Petersen, N.C., 1993. A procedure for ranking efficient units in Data Envelopment Analysis.
Management Science 39, 1261–1264.
Arnold, V.L., Bardhan, I.R., Cooper, W.W., Kumbhakar, S.C., 1996. New uses of DEA and statistical
regressions for efficiency and estimation: Texas schools. Annals of Operations Research 66, 255–277.
Atkinson, S.E., Primont, D., 2002. Stochastic estimation of firm technology, inefficiency, and productivity
growth using shadow cost and distance functions. Journal of Econometrics 108, 203–225.
Banker, R.D., Johnston, H.H., 1994. Evaluating the impacts of operating strategies on efficiency in the US
airline industry. In: Charnes, A., Cooper, W.W., Levin, A.Y., Seiford, L.M. (Eds.), Data Envelopment
Analysis: Theory Methodology and Application. Kluwer Academic Publishers, Inc., Boston,
pp. 97–128.
Barros, C.P., 2004. Measuring performance in defence-sector companies in a small NATO member-
country. Journal of Economic Studies 31, 112–128.
Battese, G.E., Coelli, T.J., 1995. A model for technical inefficiency effects in a stochastic production
function for panel data. Empirical Economics 20, 325–332.
Berger, A.N., Mester, L.J., 2003. Explaining the dramatic changes in performance of US banks:
technological change deregulation and dynamic changes in competition. Journal of Financial
Intermediation 12, 57–95.
Bickel, P.J., Freedman, D.A., 1981. Some asymptotic theory for the bootstrap. Annals of Statistics 9,
1196–1217.
Binam, J.N., Sylla, K., Diarra, I., Nyambi, G., 2003. Factors affecting technical efficiency among coffee
farmers in Cote d’Ivoire: evidence from the centre west region. R&D Management 15, 66–76.
Burgess, J.F., Wilson, P.W., 1998. Variation in inefficiency among US hospitals. Canadian Journal of
Operational Research and Information Processing (INFOR) 36, 84–102.
Byrnes, P., Fare, R., Grosskopf, S., Lovell, C.A.K., 1988. The effect of unions on productivity: U.S.
surface mining of coal. Management Science 34, 1037–1053.
Carrington, R., Puthucheary, N., Rose, D., Yaisawarng, S., 1997. Performance measurement in
government service provision: the case of police services in New South Wales. Journal of Productivity
Analysis 8, 415–430.
Chakraborty, K., Biswas, B., Lewis, W.C., 2001. Measurement of technical efficiency in public education:
a stochastic and nonstochastic production approach. Southern Economic Journal 67, 889–905.
Chalfant, J.A., Gallant, A.R., 1985. Estimating substitution elasticities with the Fourier cost function.
Journal of Econometrics 28, 205–222.
Charnes, A., Cooper, W.W., Rhodes, E., 1978. Measuring the efficiency of decision making units.
European Journal of Operational Research 2, 429–444.
Charnes, A., Cooper, W.W., Rhodes, E., 1979. Measuring the efficiency of decision making units.
European Journal of Operational Research 3, 339.
Cheng, T.W., Wang, K.L., Weng, C.C., 2000. A study of technical efficiencies of CPA firms in Taiwan.
Review of Pacific Basin Financial Markets and Policies 3, 27–44.
Chilingerian, J.A., 1995. Evaluating physician efficiency in hospitals: a multivariate analysis of best
practices. European Journal of Operational Research 80, 548–574.
Chilingerian, J.A., Sherman, H.D., 2004. Health care applications: from hospitals to physicians from
productive efficiency to quality frontiers. In: Cooper, W.W., Seiford, L.M., Zhu, J. (Eds.), Handbook
on Data Envelopment Analysis. Kluwer Academic Publishers, Boston, pp. 265–298 (Chapter 10).
Chirkos, T.N., Sears, A.M., 1994. Technical efficiency and the competitive behavior of hospitals. Socio-
Economic Planning Science 28, 219–227.
Chu, H.L., Liu, S.Z., Romeis, J.C., Yaung, C.L., 2003. The initial effects of physician compensation
programs in Taiwan hospitals: implications for staff model HMOs. Health Care Management Science
6, 17–26.
Coelli, T. 1996. A guide to FRONTIER version 4.1: a computer program for stochastic frontier
production and cost function estimation. Unpublished Working Paper, Department of Econometrics,
University of New England, Armidale, Australia.
Coelli, T., 2000. On the econometric estimation of the distance function representation of a production
technology. Discussion Paper No. 00/42, Center for Operations Research and Econometrics (CORE),
Universite Catholique de Louvain, Louvain-la-Neuve, Belgium.
Coelli, T., Perelman, S., 2001. Medición de la Eficiencia Técnica en Contextos Multiproducto. In: Álvarez
Pinilla, A. (Ed.), La Medición de la Eficiencia y la Productividad. Editorial Pirámide, Madrid.
Coelli, T., Rao, D.S.P., Battese, G.E., 1998. An Introduction to Efficiency and Productivity Analysis.
Kluwer Academic Publishers, Inc., Boston.
De Borger, B., Kerstens, K., 1996. Cost efficiency of Belgian local governments: a comparative analysis of
FDH DEA and econometric approaches. Regional Science and Urban Economics 26, 145–170.
Dietsch, M., Weill, L., 1999. Les performances des banques de dépôts françaises: une évaluation par la
méthode DEA. In: Badillo, P.Y., Paradi, J.C. (Eds.), La Méthode DEA. Hermès Science Publications,
Paris.
Dusansky, R., Wilson, P.W., 1994. Technical efficiency in the decentralized care of the developmentally
disabled. Review of Economics and Statistics 76, 340–345.
Efron, B., Tibshirani, R.J., 1993. An Introduction to the Bootstrap. Chapman and Hall, London.
Fare, R., 1988. Fundamentals of Production Theory. Springer, Berlin.
Fare, R., Grosskopf, S., Lovell, C.A.K., 1985. The Measurement of Efficiency of Production. Kluwer-
Nijhoff Publishing, Boston.
Farrell, M.J., 1957. The measurement of productive efficiency. Journal of the Royal Statistical Society,
Series A 120, 253–281.
Fried, H.O., Lovell, C.A.K., Vanden Eeckaut, P., 1993. Evaluating the performance of U.S. Credit
Unions. Journal of Banking and Finance 17, 251–265.
Fried, H.O., Lovell, C.A.K., Yaisawarng, S., 1999a. The impact of mergers on credit union service
provision. Journal of Banking and Finance 23, 367–386.
Fried, H.O., Schmidt, S.S., Yaisawarng, S., 1999b. Incorporating the operating environment into a
nonparametric measure of technical efficiency. Journal of Productivity Analysis 12, 249–267.
Fried, H.O., Lovell, C.A.K., Schmidt, S.S., Yaisawarng, S., 2002. Accounting for environmental effects
and statistical noise in data envelopment analysis. Journal of Productivity Analysis 17, 157–174.
Garden, K.A., Ralston, D.E., 1999. The x-efficiency and allocative efficiency effects of credit union
mergers. Journal of International Financial Markets, Institutions and Money 9, 285–301.
Gattoufi, S., Oral, M., Reisman, A., 2004. Data envelopment analysis literature: a bibliography update
(1951–2001). Socio-Economic Planning Sciences 38, 159–229.
Gijbels, I., Mammen, E., Park, B.U., Simar, L., 1999. On estimation of monotone and concave frontier
functions. Journal of the American Statistical Association 94, 220–228.
Gillen, D., Lall, A., 1997. Developing measures of airport productivity and performance: an application of
data envelopment analysis. Transportation Research Part E 33, 261–272.
Gonzalez, B., Barber, P., 1996. Changes in the efficiency of Spanish public hospitals after the introduction
of program-contracts. Investigaciones Economicas 20, 377–402.
Guilkey, D.K., Lovell, C.A.K., Sickles, R.C., 1983. A comparison of the performance of three flexible
functional forms. International Economic Review 24, 591–616.
Hall, P., 1986. On the number of bootstrap simulations required to construct a confidence interval. The
Annals of Statistics 14, 1453–1462.
Hall, P., 1992. The Bootstrap and Edgeworth Expansion. Springer, New York.
Hirschberg, J.G., Lloyd, P.J., 2002. Does the technology of foreign-invested enterprises spill over to other
enterprises in China? An application of post-DEA bootstrap regression analysis. In: Lloyd, P.J., Zang,
X.G. (Eds.), Modelling the Chinese Economy. Edward Elgar Press, London.
Honore, B.E., Powell, J.L., 1994. Pairwise difference estimators of censored and truncated regression
models. Journal of Econometrics 64, 241–278.
Huang, C.J., Liu, J.T., 1994. Estimation of a non-neutral stochastic frontier production function. Journal
of Productivity Analysis 5, 171–180.
Isik, I., Hassan, M.K., 2002. Technical, scale, and allocative efficiencies of Turkish banking industry.
Journal of Banking and Finance 26, 719–766.
Kalirajan, K.P., Shand, R.T., 1985. Types of education and agricultural productivity. Journal of
Development Studies 21, 222–245.
Kirjavainen, T., Loikkanen, H.A., 1998. Efficiency differences of Finnish senior secondary schools: an
application of DEA and tobit analysis. Economics of Education Review 17, 377–394.
Kneip, A., Park, B.U., Simar, L., 1998. A note on the convergence of nonparametric DEA estimators for
production efficiency scores. Econometric Theory 14, 783–793.
Kneip, A., Simar, L., Wilson, P.W., 2003. Asymptotics for DEA estimators in non-parametric frontier
models, Discussion Paper No. 0317, Institut de Statistique, Universite Catholique de Louvain,
Louvain-la-Neuve, Belgium.
Kooreman, P., 1994. Nursing home care in The Netherlands: a nonparametric efficiency analysis. Journal
of Health Economics 13, 301–316.
Korostelev, A., Simar, L., Tsybakov, A.B., 1995. On estimation of monotone and convex boundaries.
Publications des Instituts de Statistique des Universites de Paris 39, 3–18.
Kumbhakar, S.C., Ghosh, S., McGuckin, J.T., 1991. A generalized production frontier approach for
estimating determinants of inefficiency in U.S. dairy farms. Journal of Business and Economic
Statistics 9, 279–286.
Lewbel, A., Linton, O., 2002. Nonparametric censored and truncated regression. Econometrica 70, 765–779.
Lovell, C.A.K., Walters, L.C., Wood, L.L., 1994. Stratified models of education production using
modified DEA and regression analysis. In: Charnes, A., Cooper, W.W., Lewin, A.Y., Seiford, L.M.
(Eds.), Data Envelopment Analysis: Theory, Methodology, and Applications. Kluwer Academic
Publishers, Boston.
Luoma, K., Jarvio, M.-L., Suoniemi, I., Hjerppe, R.T., 1996. Financial incentives and productive
efficiency in Finnish health services. Health Economics 5, 435–445.
Maddala, G.S., 1988. Introduction to Econometrics. Macmillan Publishing Co., Inc, New York.
McCarty, T.A., Yaisawarng, S., 1993. Technical efficiency in New Jersey school districts. In: Fried, H.O.,
Lovell, C.A.K., Schmidt, S.S. (Eds.), The Measurement of Productive Efficiency. Oxford University
Press, New York.
McMillan, M.L., Datta, D., 1998. The relative efficiencies of Canadian universities: a DEA perspective.
Canadian Public Policy—Analyse de Politiques 24, 485–511.
Meeusen, W., van den Broeck, J., 1977. Efficiency estimation from Cobb–Douglas production functions
with composed error. International Economic Review 18, 435–444.
Mukherjee, K., Ray, S.C., Miller, S.M., 2001. Productivity growth in large US commercial banks: the
initial post-deregulation experience. Journal of Banking and Finance 25, 913–939.
Nyman, J.A., Bricker, D.L., 1989. Profit incentives and technical efficiency in the production of nursing
home care. Review of Economics and Statistics 71, 586–594.
O’Donnell, C.J., van der Westhuizen, G., 2002. Regional comparisons of banking performance in South
Africa. South African Journal of Economics 70, 485–518.
Okeahalam, C.C., 2004. Foreign ownership, performance and efficiency in the banking sector in Uganda
and Botswana. Journal for Studies in Economics and Econometrics 28, 89–118.
Olsen, R., 1978. A note on the uniqueness of the maximum likelihood estimator in the tobit model.
Econometrica 46, 1211–1215.
Otsuki, T., Hardle, I.W., Reis, E.J., 2002. The implications of property rights for joint agriculture-timber
productivity in the Brazilian Amazon. Environment and Development Economics 7, 299–323.
Pitt, M.M., Lee, L.F., 1981. The measurement and sources of technical inefficiency in the Indonesian
weaving industry. Journal of Development Economics 9, 43–64.
Puig-Junoy, J., 1998. Technical efficiency in the clinical management of critically ill patients. Health
Economics 7, 263–277.
Raczka, J., 2001. Explaining the performance of heat plants in Poland. Energy Economics 23,
355–370.
Ralston, D., Wright, A., Garden, K., 2001. Can mergers ensure the survival of credit unions in the third
millennium? Journal of Banking and Finance 25, 2277–2304.
Ray, S.C., 1988. Data envelopment analysis nondiscretionary inputs and efficiency: an alternative
interpretation. Socio-Economic Planning Science 22, 167–176.
Ray, S.C., 1991. Resource-use efficiency in public schools: a study of Connecticut data. Management
Science 37, 1620–1628.
Ray, S.C., 2004. Data Envelopment Analysis: Theory and Techniques for Economics and Operations
Research. Cambridge University Press, Cambridge.
Reifschneider, D., Stevenson, R., 1991. Systematic departures from the frontier: a framework for the
analysis of firm inefficiency. International Economic Review 32, 715–723.
Resende, M., 2000. Regulatory regimes and efficiency in US local telephony. Oxford Economic Papers 52,
447–470.
Rhodes, E.L., Southwick Jr., L., 1993. Variations in public and private university efficiency. In: Rhodes,
E.L. (Ed.), Applications of Management Science, vol. 7. JAI Press, Inc., Greenwich, CT.
Rosko, M.D., Chilingerian, J.A., Zinn, J.S., Aaronson, W.E., 1995. The effects of ownership operating
environment, and strategic choices on nursing home efficiency. Medical Care 33, 1001–1021.
Ruggiero, J., 2004. Performance evaluation in education: modeling educational production. In: Cooper,
W.W., Seiford, L.M., Zhu, J. (Eds.), Handbook on Data Envelopment Analysis. Kluwer Academic
Publishers, Boston, pp. 265–298 (Chapter 10).
Sexton, T.R., Sleeper, S., Taggart Jr., R.E., 1994. Improving pupil transportation in North Carolina.
Interfaces 24, 87–103.
Sheather, S.J., Jones, M.C., 1991. A reliable data-based bandwidth selection method for kernel density
estimation. Journal of the Royal Statistical Society B 53, 684–690.
Shephard, R.W., 1970. Theory of Cost and Production Functions. Princeton University Press, Princeton.
Sickles, R.C., Good, D.H., Getachew, L., 2002. Specification of distance functions using semi- and
nonparametric methods with an application to the dynamic performance of eastern and western
European air carriers. Journal of Productivity Analysis 17, 133–155.
Simar, L., Wilson, P.W., 1998. Sensitivity analysis of efficiency scores: how to bootstrap in nonparametric
frontier models. Management Science 44 (11), 49–61.
Simar, L., Wilson, P.W., 1999a. Some problems with the Ferrier/Hirschberg bootstrap idea. Journal of
Productivity Analysis 11, 67–80.
Simar, L., Wilson, P.W., 1999b. Of course we can bootstrap DEA scores! But does it mean anything?
Logic trumps wishful thinking. Journal of Productivity Analysis 11, 93–97.
Simar, L., Wilson, P.W., 2000a. A general methodology for bootstrapping in nonparametric frontier
models. Journal of Applied Statistics 27, 779–802.
Simar, L., Wilson, P.W., 2000b. Statistical inference in nonparametric frontier models: the state of the art.
Journal of Productivity Analysis 13, 49–78.
Simar, L., Wilson, P.W., 2001a. Testing restrictions in nonparametric efficiency models. Communications
in Statistics 30, 159–184.
Simar, L., Wilson, P.W., 2001b. Aplicación del bootstrap para estimadores D.E.A. In: Álvarez Pinilla, A.
(Ed.), La Medición de la Eficiencia y la Productividad. Pirámide, Madrid.
Simar, L., Wilson, P.W., 2004. Performance of the bootstrap for DEA estimators and iterating the
principle. In: Cooper, W.W., Seiford, L.M., Zhu, J. (Eds.), Handbook on Data Envelopment Analysis.
Kluwer Academic Publishers, Boston, pp. 265–298 (Chapter 10).
Stanton, K.R., 2002. Trends in relationship lending and factors affecting relationship lending efficiency.
Journal of Banking and Finance 26, 127–152.
Turner, H., Windle, R., Dresner, M., 2004. North American containerport productivity: 1984–1997.
Transportation Research Part E 40, 339–356.
Wang, K.L., Tseng, Y.T., Weng, C.C., 2003. A study of production efficiencies of integrated securities
firms in Taiwan. Applied Financial Economics 13, 159–167.
Wheelock, D.C., Wilson, P.W., 2000. Why do banks disappear? The determinants of US bank failures and
acquisitions. Review of Economics and Statistics 82, 127–138.
Wheelock, D.C., Wilson, P.W., 2001. New evidence on returns to scale and product mix among US
commercial banks. Journal of Monetary Economics 47, 653–674.
Wilson, P.W., 2003. Testing independence in models of productive efficiency. Journal of Productivity
Analysis 20, 361–390.
Wilson, P.W., Carey, K., 2004. Nonparametric analysis of returns to scale and product mix among U.S.
hospitals. Journal of Applied Econometrics 19, 505–524.
Worthington, A.C., Dollery, B.E., 2000. Productive efficiency and the Australian local government grants
process. Australian Journal of Regional Studies 6, 95–121.
Wu, C.F.J., 1986. Jackknife, bootstrap and other resampling methods in regression analysis. Annals of
Statistics 14, 1261–1295.
Xue, M., Harker, P.T., 1999. Overcoming the inherent dependency of DEA efficiency scores: a bootstrap
approach. Unpublished Working Paper, Wharton Financial Institutions Center, University of
Pennsylvania.