European Journal of Operational Research 154 (2004) 410–422
www.elsevier.com/locate/dsw
Sensitivity and dimensionality tests of DEA efficiency scores
Andrew Hughes a, Suthathip Yaisawarng b,*
a NSW Treasury, Level 26, Governor Macquarie Tower, 1 Farrer Place, Sydney 2000, Australia
b Department of Economics, Union College, Schenectady, NY 12308, USA
Abstract
Extending Hughes and Yaisawarng [J.L.T. Blank (Ed.), Public Provision and Performance: Contributions from
Efficiency and Productivity Measurement, Elsevier Science B.V., The Netherlands, p. 277], this paper addresses issues of
dimensionality for enhancing the credibility of data envelopment analysis (DEA) results in practical applications and
assisting practitioners in making an appropriate selection of variables for a DEA model. The paper develops a
structured approach to testing whether changing the number of model variables for a fixed sample size affects DEA
results. Using a simulation method, the paper tests whether a calculated DEA efficiency score reflects the dimension of
the model or the existence of inefficiency. An empirical illustration is included.
© 2003 Elsevier B.V. All rights reserved.
Keywords: Data envelopment analysis; Sensitivity tests; Dimensionality tests
1. Introduction
Following a seminal work by Farrell (1957) on productive efficiency measurement and work
by Charnes et al. (1978), data envelopment ana-
lysis (DEA) has become a popular empirical
method for measuring efficiency. (See Cooper et al.
(2000) for a comprehensive list of DEA stud-
ies.) This method assigns an efficiency score to
each operational unit based on how well it
* Corresponding author. Tel.: +1-518-388-6606; fax: +1-518-
388-6988.
E-mail addresses: [email protected]
(A. Hughes), [email protected] (S. Yaisawarng).
0377-2217/$ - see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/S0377-2217(03)00178-4
transforms a given set of inputs into outputs,
relative to the best performers in the sample. Due
to the nature of the technique, several factors, including the relationship between sample size
and the number of model variables, may affect DEA
results. For example, adding more model vari-
ables for a given sample size potentially yields
higher efficiency scores for units in the sample.
In the same way, fewer units in a sample for a
given number of model variables may lead to
higher efficiency scores. This is a dimensionality issue.
One way to address the issue of dimensions is
to expand the sample size using pooled cross
section and time series data. This method as-
sumes no technological change over the sample
periods (an assumption that can be problematic
for industries undergoing substantial technical
innovation). Another approach, proposed by Nunamaker (1985), is to include alternative sets
of variables for a fixed sample size and to search
for the best dimensions for management to pur-
sue.
Another important issue is the selection of
variables for a model. Given imperfect data, re-
searchers are often required to make tradeoffs in
selecting input and output variables. A small number of studies (see, for example, Valdmanis,
1992) calculate efficiency scores for a sample using
several alternative sets of variables and compare
the mean values of the results. These studies
focus on sensitivity testing of average efficiency
scores relative to a frontier constructed from
the observed sample, rather than of individual
scores and their ranks. Other methods focus on sensitivity testing, for which frontiers are simu-
lated using re-sampling techniques including
bootstrapping and jackknifing, to construct a
confidence interval for each DEA efficiency score.
(See, for example, Grosskopf and Yaisawarng,
1990; Ferrier et al., 1993.) Grosskopf (1996) pro-
vides a brief review of approaches to sensitivity
testing, including a statistical test of DEA results. Hughes and Yaisawarng (2000) perform sev-
eral sensitivity tests of their DEA efficiency
scores obtained from four model specifications.
None of these studies, however, provides a test of
the impact of the dimensions of a DEA model on
results.
DEA results should be independent of the di-
mension of the DEA model and robust across alternative proxy variables, if the empirical results
are to be used for devising appropriate policy
recommendations. To date, there has been no
systematic testing of the dimensionality of the
DEA efficiency scores relative to variable specifi-
cations.
This paper outlines an empirical procedure for
DEA studies in situations where researchers are faced with a set of alternative proxy measures for a
variable, which are not perfect substitutes for each
other. We examine various models using different
sets of potential variables for a fixed sample size
and develop a systematic method for testing the
dimensionality of DEA results across alternative
model specifications. Specifically, the paper applies
a simulation method to test whether the number of model variables or the magnitude of inefficiency
affects a calculated efficiency score. The goal of
this exercise is to improve the process for selecting
a suitable DEA model. The paper provides a rig-
orous empirical demonstration of the proposed
procedure using a sample of 161 police patrols in
the State of New South Wales (NSW), Australia,
in 1995–1996.
The remainder of the paper is organised as
follows. Section 2 presents a standard input-
oriented DEA model and discusses sensitivity and
dimensionality tests. Section 3 demonstrates the
application of the proposed procedures for a
sample of NSW police patrols in 1995–1996.
Section 4 concludes the paper.
2. Methodology
This section consists of three subsections. Sec-
tion 2.1 introduces a standard input-oriented DEA
model and discusses the effects of additional model
variables on DEA computed efficiency scores. It
also discusses the effects of different types of frontier technology on efficiency scores. Section
2.2 discusses the sensitivity tests of DEA results for
a fixed sample size. Results of the tests can be used
to select a preferred model. Section 2.3 discusses a
procedure for testing whether DEA computed ef-
ficiency scores reflect the existence of inefficiency
or the dimension of the DEA model using a sim-
ulation method.
2.1. A standard input-oriented DEA model
Consider an organisation consisting of several
units performing similar tasks. Suppose there are J units in the organisation. Each unit uses N inputs to produce M outputs. The objective of each unit is to minimise its inputs used to produce a predetermined level of outputs. Let y_mj and x_nj be the quantity of output m produced by unit j, m = 1, ..., M, and the quantity of input n used by unit j, n = 1, ..., N, respectively. An input-oriented DEA model used to calculate the
efficiency score for unit k is formulated as follows: 1

TE_k = min λ
s.t.  y_11 z_1 + y_12 z_2 + ... + y_1J z_J ≥ y_1k,
      ...
      y_M1 z_1 + y_M2 z_2 + ... + y_MJ z_J ≥ y_Mk,
      x_11 z_1 + x_12 z_2 + ... + x_1J z_J ≤ λ x_1k,
      ...
      x_N1 z_1 + x_N2 z_2 + ... + x_NJ z_J ≤ λ x_Nk,
      z_1 + z_2 + ... + z_J = 1,
      z_1 ≥ 0, z_2 ≥ 0, ..., z_J ≥ 0,                    (1)

where z_j, j = 1, ..., J, are weights or the relative
impact of unit j on the target point for unit k.
There are (M + N + 1) constraints in this LP problem. The first M constraints are the output constraints, one for each output. A constraint on output m indicates that the linear combination of output m produced by all J units in the sample must be at least as large as output m produced by unit k, the unit being assessed. The next N constraints are the input constraints. In this case, a constraint on input n indicates that the linear combination of input n used by all J units in the sample must be less than or equal to λ times unit
1 Charnes et al. (1978) formulate a DEA model as a fractional linear programming (LP) problem, i.e.,

TE_k = max u_1 y_1k + u_2 y_2k + ... + u_M y_Mk
s.t.  v_1 x_1k + v_2 x_2k + ... + v_N x_Nk = 1,
      u_1 y_1j + u_2 y_2j + ... + u_M y_Mj − (v_1 x_1j + v_2 x_2j + ... + v_N x_Nj) ≤ 0,
      j = 1, ..., J, j ≠ k,
      u_m ≥ 0, m = 1, ..., M; v_n ≥ 0, n = 1, ..., N.

The solution to this problem gives a technical efficiency score for unit k as well as the set of output and input weights, relative to a constant returns to scale (CRS) technology. The LP formulation can be modified for a variable returns to scale (VRS) technology as suggested by Banker et al. (1984). The modified LP formulation is dual to that of (1). For further details, see Färe et al. (1985, 1994), Lovell (1993) or Steering Committee for the Review of Commonwealth/State Service Provision (1997).
k's current inputs, where λ is a constant that satisfies all constraints. The smallest value of λ is the technical efficiency score of unit k. The last constraint restricts the sum of the relative impacts to unity. This requirement permits the technology or the production frontier to have VRS, i.e., CRS in some output range, increasing returns or decreasing returns in other output ranges. 2 For details of this restriction, see Afriat (1972).
The solution values from the LP problem in (1)
are the technical efficiency score for unit k and the
values of relative impacts zj. When zj > 0, unit j is
an efficient peer for unit k. When zj ¼ 0, unit j is
not an efficient peer for unit k. The LP problem to
compute efficiency scores for the remaining units
in the sample can be formulated in a similar
fashion as found in (1). The only difference is that the values of outputs and inputs for unit k on the
right-hand-side of the constraints are replaced
with the outputs and inputs for the unit consid-
ered. To calculate efficiency scores for all J units
in the sample, the LP problem is solved J times;
once for each unit.
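For readers who want to experiment, the LP in (1) can be set up directly with an off-the-shelf solver. The sketch below uses Python with scipy.optimize.linprog; the function name dea_input_vrs and the three-unit toy data are illustrative assumptions, not from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def dea_input_vrs(X, Y, k):
    """Input-oriented VRS efficiency score for unit k, per the LP in (1).
    X: (J, N) array of inputs; Y: (J, M) array of outputs.
    Decision vector is [lambda, z_1, ..., z_J]."""
    J, N = X.shape
    M = Y.shape[1]
    c = np.zeros(1 + J)
    c[0] = 1.0                                   # minimise lambda
    # Output constraints: -(y_m1 z_1 + ... + y_mJ z_J) <= -y_mk
    A_out = np.hstack([np.zeros((M, 1)), -Y.T])
    # Input constraints: x_n1 z_1 + ... + x_nJ z_J - lambda x_nk <= 0
    A_in = np.hstack([-X[k][:, None], X.T])
    A_ub = np.vstack([A_out, A_in])
    b_ub = np.concatenate([-Y[k], np.zeros(N)])
    # VRS convexity constraint: z_1 + ... + z_J = 1
    A_eq = np.hstack([[0.0], np.ones(J)]).reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] + [(0.0, None)] * J, method="highs")
    return res.fun

# Toy sample: unit 1 uses twice the input of unit 0 for the same output.
X = np.array([[1.0], [2.0], [2.0]])
Y = np.array([[1.0], [1.0], [2.0]])
scores = [dea_input_vrs(X, Y, k) for k in range(3)]  # one LP per unit
```

In this toy sample, units 0 and 2 lie on the VRS frontier (score 1.0) and the dominated unit 1 scores 0.5, i.e., it could deliver its output with half its input.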
Grosskopf (1986) shows that a technical effi-
ciency score relative to a VRS frontier is at least as large as a technical efficiency score of the same unit
relative to a CRS frontier, given the same number
of model variables. This is because the VRS tech-
nology envelops the data points more tightly than
the CRS frontier. As a consequence, there will be
more efficient units on the VRS frontier compared
to the number of efficient units on the CRS fron-
tier. This result may be generalised to an increase in the number of constraints in a DEA model. For
any fixed sample size, more constraints in the DEA
formulation, in particular imposing a flexible
frontier technology, increasing the number of in-
puts, and/or increasing the number of outputs,
result in higher efficiency scores and more efficient
units. 3 This generalisation is consistent with
2 The unit restriction on the z's is removed if the frontier technology exhibits CRS, i.e., a proportional increase in all inputs leads to the same proportional increase in all outputs.
3 Sample sizes also have a similar effect on DEA computed efficiency scores. For a given set of variables and frontier technology, the proportion of efficient units in a small sample is higher than that of a larger sample.
Nunamaker's assertion (1985) that an efficient unit from a DEA model with a lower number of inputs
remains efficient in the model with additional in-
puts. Thrall (1989) provides a transition theorem
to reinforce Nunamaker's proposition under a
fixed sample size but with increasing numbers
of inputs and outputs.
2.2. Sensitivity tests
Researchers are often confronted with a range
of variables that reflect the activities of the or-
ganisation under consideration. These variables
are seldom ideal. Each variable may not fully
capture all aspects of the input or output under
consideration. Due to the nature of DEA model-
ling, adding more variables not only inflates DEA efficiency scores but also potentially conceals the
actual magnitude of inefficiency. Researchers need
to pay close attention to the selection of variables
for a DEA model. A selection of suitable variables
sometimes cannot be justified on theoretical
grounds. In addition, there is a trade-off between a
marginally improved DEA model specification
(from an increase in the number of variables) and potentially inflating DEA efficiency scores. It is
recommended that different model specifications
be used and that the results be tested empirically.
Hughes and Yaisawarng (2000) perform several
sensitivity tests of DEA results across specifica-
tions for a fixed sample size. Their procedures
focus on three technical aspects: (1) the overall
relationship, (2) variations of individual scores, and (3) an assessment of appropriate efficient
peers. The authors use objective measures such as
statistical techniques to capture the first two as-
pects and subjective measures such as qualitative
analysis and expert opinions to address the third
aspect. We summarise their procedures for a fixed
sample size of J units and s model specifications
below.
2.2.1. Overall relationship
(a) Between each pair of efficiency scores. This
test concentrates on the ranking of units based on
their efficiency scores, rather than the magnitude
of the efficiency scores. The rationale is that the
position of a unit or its rank in the sample is in-
dependent of the dimension of the DEA model and that the efficiency rankings in any two models
should be the same, if the results are robust or
insensitive to the model specification. To test the
sensitivity of the DEA results between each pair of
the model specifications, the null hypothesis is that
no association between each pair of efficiency
ranks exists. The test statistic is the Spearman rank
correlation coefficient, which takes a value between −1 and +1. The Spearman rank correlation
coefficient would be statistically significant for
the DEA results that are not sensitive between the
two models.
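This pairwise rank test maps directly onto scipy.stats.spearmanr. The sketch below uses made-up score vectors for five hypothetical units under two specifications; they are not values from the paper.

```python
from scipy.stats import spearmanr

# Hypothetical efficiency scores for the same five units under two models.
model_a = [1.000, 0.910, 0.873, 0.952, 0.801]
model_b = [1.000, 0.885, 0.842, 0.930, 0.779]

# rho near +1 with a small p-value rejects the null of no rank association,
# i.e., the two specifications rank the units alike.
rho, p_value = spearmanr(model_a, model_b)
```

Here the two vectors happen to order the units identically, so rho is exactly 1.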
(b) Across all s specifications. This test looks at
the distribution of the efficiency rankings. The null
hypothesis is that the probability distributions of
efficiency rankings are the same across s specifications. This hypothesis is tested using the Friedman
non-parametric test that not only requires no as-
sumption regarding the distribution of efficiency
rankings but also accounts for the possibility that
efficiency scores from each model specification
are not independent across s specifications. The
Friedman test-statistic approximates a Chi-square
distribution with s − 1 degrees of freedom, where s is the number of specifications. A failure to reject the
null hypothesis verifies the robustness of the results.
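The Friedman test is likewise available off the shelf as scipy.stats.friedmanchisquare, one argument per specification (each of length J). The six-unit, four-model arrays below are illustrative stand-ins.

```python
from scipy.stats import friedmanchisquare

# Hypothetical scores for six units under four model specifications.
m1 = [1.00, 0.95, 0.88, 1.00, 0.91, 0.84]
m2 = [1.00, 0.93, 0.86, 0.99, 0.90, 0.83]
m3 = [0.99, 0.94, 0.85, 1.00, 0.89, 0.82]
m4 = [0.98, 0.92, 0.84, 0.97, 0.88, 0.80]

# stat approximates a chi-square with s - 1 = 3 degrees of freedom; a large
# p-value means no evidence that rank distributions differ across models.
stat, p_value = friedmanchisquare(m1, m2, m3, m4)
```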
2.2.2. Variations of individual scores across s specifications
For this test, attention is on the magnitude of the efficiency scores. Although efficiency scores depend on the number of variables included in the DEA model, it is important to analyse the varia-
tion in the actual efficiency scores for individual
units in a sample across s specifications to gain
better understanding of causes of variations, if the
result is indeed sensitive to the model selection.
The difference between the maximum and the
minimum efficiency score for each unit across s specifications is the range statistic used as a measure of variation. When s is relatively large, say at
least 20, an alternative measure of variation is the
standard deviation of the efficiency scores. If the
DEA results are not sensitive to the number of
variables included in the model, the range statistic
and the standard deviation should be relatively
small. However, the cut-off point for small or large
4 Tauer and Hanchar (1995) introduce the dimensionality
test in the context of an output-oriented DEA model with CRS
technology.
variation is a matter of subjective judgement. Units receiving relatively large variation in effi-
ciency scores across model specifications require
further examination.
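Given a J × s matrix of scores, the range statistic and standard deviation above are each a one-liner. The matrix and the 0.10 cut-off below are arbitrary illustrations of the subjective threshold the text mentions.

```python
import numpy as np

# Hypothetical J x s matrix: rows are units, columns are specifications.
E = np.array([[1.00, 1.00, 0.98, 0.95],
              [0.90, 0.88, 0.87, 0.86],
              [0.80, 0.75, 0.72, 0.65]])

unit_range = E.max(axis=1) - E.min(axis=1)   # max-min score per unit
unit_sd = E.std(axis=1, ddof=1)              # sample std dev per unit
flagged = np.where(unit_range > 0.10)[0]     # units needing a closer look
```

With these numbers, only the third unit (range 0.15) exceeds the cut-off and would be examined further.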
2.2.3. Analysis of appropriate efficient peers
This procedure tests whether the model selects
an appropriate set of best practice units for the
relatively inefficient unit to imitate their management style. If DEA results are to be used in im-
proving managerial practice, the model must pass
this ‘‘reality’’ check. An appropriate efficient peer
is a best practice performer in the sample that is
similar to the inefficient unit in some respects (e.g.,
size, characteristics) and is able to lend insights
useful for assisting the inefficient unit to improve.
Both qualitative and quantitative information are important features to be included in the criteria.
However, the recipe for assessing appropriate ef-
ficient peers is unique for the sample and should be
determined in consultation with experts in the
particular area of research.
2.3. Dimensionality testing
Before appropriate policies can be designed and
implemented to assist inefficient units improve
their performance, policymakers need to know
whether inefficiencies exist across units in the or-
ganisation. Sensitivity tests alone indicate whether the DEA results are robust across model specifications but do not establish whether these inefficiencies indeed exist. The apparent inefficiencies may be a result of dimensionality. This paper develops a dimensionality test
to validate the existence of inefficiencies.
Tauer and Hanchar (1995) use a Monte-Carlo
simulation technique to investigate the dimensio-
nality effects on DEA efficiency scores, where di-
mensionality refers to number of variables (inputs
and outputs) and/or sample size. The authors generate simulated data sets from a uniform dis-
tribution and conclude that the effect of increasing
the number of variables in the model is stronger
than the effect of reducing the number of obser-
vations. This paper focuses on the dimensionality
effect from varying numbers of model variables
for a fixed sample size.
We use a simulation method to perform the dimensionality test. Our procedure begins with a
construction of k simulated samples; each sample
consists of J units and (N + M) series of numbers
that bear no technological relationship. Specifi-
cally, we draw a series of J random numbers from a
normal distribution. Some of these random num-
bers will be negative; others will be positive values.
Since the actual, observed data is non-negative, we take the absolute values of these random numbers
and treat them as values of one random variable.
We replicate this procedure until we have N + M series of numbers. These figures constitute one simulated sample similar to the actual data set of J units with N + M variables. Unlike the actual, observed data set, the numbers in the simulated sample are purely random. The efficiency scores computed from the simulated sample are the effect of the dimensionality. We apply this mechanism k times to generate k simulated samples.
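The generation step above amounts to taking absolute values of standard-normal draws. A minimal sketch, with made-up dimensions J, N, M and replication count k:

```python
import numpy as np

J, N, M, k = 161, 4, 6, 100        # illustrative dimensions only
rng = np.random.default_rng(seed=42)

# k simulated samples, each with J units and (N + M) purely random
# non-negative "variables" bearing no technological relationship.
samples = np.abs(rng.standard_normal((k, J, N + M)))
```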
For each simulated sample, we calculate DEA
input-oriented efficiency scores under the assump-
tion of VRS technology 4 and calculate the number
of efficient units expressed as a percentage of the
total units in the sample. The mean percentage of efficient units over k replications is used to test the
null hypothesis that the proportions of efficient
units from simulated data and the actual data are
the same, using a Z statistic. A failure to reject the
null hypothesis implies that the DEA computed
efficiency scores merely reflect the dimension of the
model. If, on the other hand, we find evidence to
reject the null hypothesis, it suggests that there are inefficiencies across units in the sample. DEA
computed efficiency scores capture the relative
performance of units in the data set and are not
driven by only the number of variables in the model.
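The text does not spell out the exact form of its Z statistic; a standard pooled two-proportion Z test is one plausible reading, sketched below with hypothetical counts of efficient units (not the paper's results).

```python
import math
from statistics import NormalDist

def two_prop_ztest(x1, n1, x2, n2):
    """Pooled two-proportion Z test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical: 80 of 161 actual units efficient, versus a mean of 40
# efficient units per simulated sample of 161. Rejecting the null would
# point to genuine inefficiency rather than pure dimensionality.
z, p = two_prop_ztest(80, 161, 40, 161)
```

Identical proportions give z = 0 and a p-value of 1, i.e., the scores would be indistinguishable from a dimensionality artefact.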
3. An empirical demonstration: A case of 1995–1996
NSW police patrols
This section illustrates the empirical procedures
proposed in this paper, using a sample of NSW
5 Since DEA does not allow for error in data measurement, it is
important that outliers are inspected before computing effi-
ciency scores. This paper computes all possible output–input
ratios and uses a boxplot method to identify potential outliers.
All suspected outliers were referred to the NSW Police Service
for assessment. The numbers in the sample reflected the
corrections of all reported errors.
police patrols. Several DEA model specifications are developed from available data sets. Since the data do not capture all aspects of police patrols' activities, conclusions must be interpreted strictly
within the context of the measured model vari-
ables.
Bearing the above limitations in mind, we begin
with an illustration of variables selected for the
various model specifications. Section 3.1 introduces the sample of NSW police patrols and range
of potential variables. Section 3.2 formulates pos-
sible sets of variables and presents alternative
DEA model specifications. The theoretical rela-
tionship of efficiency scores across models is also
summarised. Section 3.3 presents the demonstra-
tion results.
3.1. The sample
The sample comprised 161 local police patrol
districts that existed in NSW in 1995–1996. There
were 97 urban patrols covering the Sydney, New-
castle and Wollongong metropolitan areas and
64 regional patrols covering the rest of the State.
Input-oriented models were used to reflect an objective of delivering services to the community
with the minimum amount of resources. The
NSW Police Service provided the data.
NSW police districts produce two broad types
of services: law enforcement and crime prevention.
In 1995–1996, the latter represented approximately
40% of police work. Numbers of incidents, char-
ges, summons and major car accidents were used to capture law enforcement activities. Crime pre-
vention activities were measured by distance tra-
velled by police cars (in kilometres) and the
number of intelligence reports prepared by front-
line police officers. Patrol intelligence officers re-
viewed submitted intelligence reports on local
criminal activity by: (i) making an assessment of its
quality, (ii) assigning it a security classification, and (iii) determining its distribution.
Police districts combine labor and capital re-
sources to deliver services to the community. In
the study, labor comprised police and civilian
employees measured as annual-average, full-time
equivalent staff. These figures include staff on
leave, for example, sick leave, long-service leave or
secondments to other police units. Three possible measures of the capital input were available:
number of police cars, number of personal com-
puters, and area of station accommodation (mea-
sured in square metres). Each of these measures
captured slightly different aspects of the capital
input. For a detailed explanation of these vari-
ables, except intelligence reports, see Carrington
et al. (1997).
Appendix A displays descriptive statistics of the
entire sample. 5 Appendices B and C summarise
descriptive statistics for the urban and regional
subgroups. In general, number of intelligence re-
ports had the largest variation relative to its mean
among the variables. All patrols used both types of
labor but were more intensive in their use of police
officers than civilian employees. Capital inputs, especially number of police cars and number of
personal computers, were relatively uniform across
patrols. The regional patrols recorded, on average,
significantly higher kilometres travelled and num-
ber of police cars than the urban patrols, reflecting
their larger area of coverage. For a detailed dis-
cussion of the sample, see Hughes and Yaisawarng
(2000).
3.2. Police DEA model specifications
Following Hughes and Yaisawarng (2000), this
paper tests the dimensionality of four possible
DEA models summarised in Table 1.
All models comprise six outputs, two labor in-
puts and a different number of capital inputs. The alternative measures of the capital input were tes-
ted for their sensitivity. Model 1 has the largest
dimensions since it includes all three measures of
capital. All units in the sample would appear to be
the most efficient compared to their respective ef-
ficiency scores from the remaining three models.
This is a reflection of dimensionality. Model 4, on
Table 1
Specification of police DEA models

Variables                                   Model 1  Model 2  Model 3  Model 4
Outputs
  Law enforcement activities
    Incidents (excl. major car accidents)      x        x        x        x
    Charges                                    x        x        x        x
    Summons served                             x        x        x        x
    Major car accidents                        x        x        x        x
  Crime prevention activities
    Kilometres travelled                       x        x        x        x
    Intelligence reports                       x        x        x        x
Inputs
    Police                                     x        x        x        x
    Civilians                                  x        x        x        x
    Cars                                       x        x        x        x
    Personal computers                         x        x
    Area of station accommodation              x                 x

Note: x indicates that the variable is included in the model.
Table 2
Summary of pure technical efficiency results (161 observations)

                           Model 1   Model 2   Model 3   Model 4
Average                     0.933     0.921     0.918     0.901
Minimum                     0.606     0.606     0.523     0.518
No. of efficient patrols    89        80        77        67
the other hand, includes only one measure of
capital. Efficiency scores from Model 4 would be
the lowest, compared to all other models. Models
2 and 3 each include two measures of capital in-
puts, and efficiency scores computed from these
two models should be somewhere between those
from Models 1 and 4. The relationship between efficiency scores across models is theoretically predicted as follows:

TE_1 ≥ TE_2 and TE_3 ≥ TE_4.                    (2)

Note that the relationship between efficiency
scores from Models 2 and 3 cannot be determined
a priori since both specifications include the same number of model variables. All four models in-
clude a relatively large number of model variables
and the respective efficiency scores may reflect the
dimensionality.
3.3. Demonstration results
The results consist of three parts. Sections 3.3.1 and 3.3.2 report the replicated results of pure
technical efficiency scores for all DEA model
specifications in Table 1 and of sensitivity tests
from Hughes and Yaisawarng (2000) with elabo-
ration. Section 3.3.3 presents the results of the simulation study, which tests whether the DEA
efficiency scores indicate the existence of ineffi-
ciency across units or simply reflect the particular
dimension of the DEA model.
3.3.1. Pure productive efficiency results
Table 2 summarises pure technical efficiency
scores for all four models. These scores were computed relative to VRS technology, i.e., the
DEA formulation in (1) with differences in the
number of model variables as shown in Table 1.
The average efficiency scores range from 0.90 to
0.93 and fall within a 5% range between the
models. The number of units with non-radial
slacks was similar across specifications, represent-
ing between 10% and 35% of the sample. The magnitude of these slacks was small. The efficiency
scores suggest the scope for potential improvement
in relation to the best practice performers in the
sample. Some patrols may have been able to im-
prove more than others. For example, one urban
patrol could have potentially reduced its inputs by
significant amounts (from 35% to 50%) if it had
operated at best practice. This patrol was ranked among the least efficient patrols in all models.
3.3.2. Sensitivity tests of DEA results
We replicate the suite of sensitivity tests of
Hughes and Yaisawarng (2000) as discussed in
Section 2.2 and report the detailed analysis below.
These results are used in conjunction with the dimensionality tests in the next section to enhance the credence of the DEA efficiency scores.
3.3.2.1. Correlation tests. The Spearman rank
correlation coefficients between each pair of effi-
ciency scores were extremely high, ranging from
0.81 to 0.96. These correlation coefficients were
statistically different from zero at the 1% level of
significance, suggesting that the results were posi-
tively related and stable across specifications. The
Friedman statistical test for the probability distri-
butions of efficiency ranking across four models
had a Chi-square statistic (χ² with 3 degrees of freedom) of 4.023 with a p-value of 0.2590. There was no evidence to reject the null hypothesis that the distributions of effi-
ciency ranking were stable across all four models.
Our results were not sensitive to the model speci-
fications.
3.3.2.2. Comparisons of individual efficiency scores.
An examination of individual efficiency scores
across models revealed that our results were remarkably stable across all specifications. These
results were not very sensitive to different proxies
of capital input. As expected, moving from Model
1 to Model 4 resulted in less efficient patrols, partly
due to a decrease in the dimensionality of the DEA
model. However, the proportion of efficient-
by-default patrols to the total number of efficient
patrols in each specification remained stable in the 20–25% range. Each efficient-by-default patrol has
a unique input and output combination, compared
to other patrols in the sample and is compared to
itself in evaluating the efficiency; it does not serve
as a peer for other inefficient patrols.
Detailed analysis of efficiency scores across the
four models revealed that 78 patrols received
identical efficiency scores (measured to three decimal places) in all models. Some of these patrols
were efficient; others were not. The remaining 83
patrols received different scores across the four
models. The majority of these 83 patrols had a
moderate range of variation and were distributed
across efficiency groups (e.g., high, low). Only a
small number of patrols experienced large varia-
tion in efficiency scores across models. The change in relative efficiency across models, perhaps, could
be attributable to the change in the input mix that
led to different efficient peer groups or the same
peers with different weights.
A closer look at these 83 patrols revealed that
46 patrols received the same efficiency scores in at
least two models. At least one of the three core
inputs––police officers, civilian employees or police
cars––effectively determined the patrol's efficiency. (Of these three, the number of police officers was
the most important input.) Among the remaining
37 of the 83 patrols receiving different scores
across the four models, we found that the differ-
ence between their highest and lowest scores across
models ranged from 0.003 to 0.209. For example,
an urban patrol was efficient by default in Models
1 and 3. It was 83% efficient when the number of police cars and the number of computers were
used as proxies for capital (Model 2). Its efficiency
score dropped to 79% when the number of police
cars was the only capital proxy (Model 4). Two
other patrols experienced the next highest varia-
tions in their scores for this group. Both were
inefficient in all four models. The urban patrol
appeared to be more efficient when the area of station accommodation was included in place of
the number of personal computers. These results
were broadly consistent with the general charac-
teristics of patrols––urban patrols had smaller
station areas and possibly higher numbers of
computer terminals than regional patrols.
3.3.2.3. Identifying appropriate efficient peers. As discussed in Hughes and Yaisawarng (2000), the
NSW Police Service recommended using a geo-
graphic and/or demographic profile to assess the
appropriateness of peers. An inefficient urban pa-
trol had appropriate efficient peers if one of the
following criteria was satisfied: (1) all peers were
urban patrols, or (2) a regional peer carried a
weight of less than 10%. If an inefficient urban patrol had any regional peer with a weight greater
than or equal to 10%, geographic characteristics of
the area as well as its population size were con-
sidered. A regional peer whose main town centre
had a population of 2500 persons or fewer was an
inappropriate peer for an urban patrol. Similar
rules were
used in evaluating peers for inefficient regional patrols. Any inefficient patrol that had at least one
inappropriate peer was classified as a ‘‘problem’’
case. Relevant mixes of outputs and inputs of
patrols in problem cases were compared.
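The peer-screening rules described above amount to a simple rule-based classifier. The sketch below is illustrative only: the dict-based peer representation and the function name are hypothetical, and the additional judgemental check of geographic characteristics mentioned in the text is not encoded.

```python
# Illustrative sketch of the peer-appropriateness rules for an inefficient
# urban patrol. The data structure and function name are hypothetical; the
# paper also applied a judgemental review of geographic characteristics
# that is not encoded here.

WEIGHT_CUTOFF = 0.10       # regional peers with weight below 10% are tolerated
SMALL_TOWN_CUTOFF = 2500   # main town centre population defining a small area

def urban_patrol_peers_appropriate(peers):
    """Return True if all efficient peers of an inefficient urban patrol
    pass the screening rules; False marks a "problem" case."""
    for peer in peers:
        if peer["type"] == "urban":
            continue  # criterion (1): urban peers are always appropriate
        if peer["weight"] < WEIGHT_CUTOFF:
            continue  # criterion (2): low-weight regional peers are tolerated
        # A heavily weighted regional peer whose main town centre has at
        # most 2500 persons is inappropriate for an urban patrol.
        if peer["town_population"] <= SMALL_TOWN_CUTOFF:
            return False
    return True

# Example: one regional peer carrying a 25% weight, main town of 2000 people
peers = [
    {"type": "urban", "weight": 0.75, "town_population": 120000},
    {"type": "regional", "weight": 0.25, "town_population": 2000},
]
print(urban_patrol_peers_appropriate(peers))  # a "problem" case: False
```

Analogous rules, with the urban and regional roles reversed, would screen peers for inefficient regional patrols.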
Results were remarkably similar across models.
The majority of inefficient patrols had appropriate
peers. The number of problem cases ranged from 7
(Models 1 and 2) to 11 (Model 4).

6 It is possible to modify the simulated sample to test
whether an additional variable is relevant to the model. For
example, one may wish to test whether the area of station
accommodation in Model 1 is an important variable. To do
this, construct a hybrid-simulated sample that comprises actual
data for all variables except the area of station accommodation,
which is replaced by random numbers, and use it to generate
simulated results. After a number of replications, one can test
whether the simulated results differ from the actual result. If the
results differ statistically, the area of station accommodation is
an important variable for inclusion in the DEA model.

418 A. Hughes, S. Yaisawarng / European Journal of Operational Research 154 (2004) 410–422

Three cases were common for all four models. They involved two
inefficient urban patrols and one inefficient re-
gional patrol. Each of these inefficient urban (re-
gional) patrols had one regional (urban) peer with
a weight ranging from 12% to 29%. The geo-
graphic characteristics of these peers were very
different from those of the inefficient patrols. A
comparison of input and output compositions suggested that inappropriate peers for the re-
maining problem cases had very different mixes
compared to inefficient patrols, especially in re-
spect to total incidents, major car accidents, police
officers and police cars.
On the basis of sensitivity results, Model 2 was
preferable. Since DEA results were relatively sta-
ble across the four models, it could be inferred that adding more capital proxies did not result in a
more accurate measure of efficiency. Instead, it
inflated efficiency scores from an increase in di-
mensionality. Therefore, a model with less di-
mensionality was more appealing. Model 2 was
superior to Model 4, which had the fewest model
variables among the four models considered,
because Model 2 provided appropriate peers for
most inefficient patrols and would be more useful
for management purposes. There were only seven
problem cases in Model 2, representing approxi-
mately 8.6% of inefficient patrols––the lowest
percentage among the four models.
3.3.3. Dimensionality test
From a practitioner's standpoint, it is important for managers or policy makers to reward the
‘‘true’’ best performers and to assist inefficient
units in improving their performance. Since the
number of model variables in relation to sample
size may overstate the number of efficient units,
the effect of dimensions should be tested. In our
empirical demonstration of testing dimensionality
effects for all four models, we created 1000 simulated samples for each model using the procedures
described in Section 2.3. Each simulated data set
comprised 161 observations and included between
9 and 11 data series, so that the number of units
and model variables were consistent with the actual
sample of NSW police patrols. For example, a
simulated data set used to test the dimensionality
effect of Model 1 on DEA efficiency scores comprised 161 observations and included 11 data series
(the number of variables included in Model 1). The
values for each data series were all independent
random numbers and hence there was no struc-
tural relationship among the figures assigned to
each observation.
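Generating such simulated samples is straightforward; the sketch below draws every value independently, so no structural relationship links the figures for an observation. The uniform (0, 1) distribution and the seed are our assumptions: the exact generating procedure is specified in Section 2.3 of the paper.

```python
import numpy as np

rng = np.random.default_rng(12345)  # fixed seed for reproducibility (our choice)

N_UNITS = 161   # observations, matching the NSW patrol sample
N_SERIES = 11   # data series for Model 1 (10 for Models 2 and 3, 9 for Model 4)
N_REPS = 1000   # number of simulated samples per model

# Every entry is an independent random draw, so there is no structural
# relationship among the figures assigned to each observation.
samples = [rng.uniform(size=(N_UNITS, N_SERIES)) for _ in range(N_REPS)]

print(len(samples), samples[0].shape)  # 1000 (161, 11)
```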
For each DEA specification in Table 1, we
computed efficiency scores for all 161 simulated observations using (1). Specifically, we included 11
series in the calculation of DEA efficiency scores to
obtain simulated results for Model 1. Similarly, we
included all nine series in the calculation to obtain
simulated results for Model 4. Note that the sim-
ulated results for Models 2 and 3 should have been
identical since both models had ten variables. The
total number of efficient units from each model relative to each simulated sample was recorded as
a percentage of total observations in the sample. 6
This process was repeated for all 1000 simulated
samples. Table 3 summarises the results from the
actual data set and the simulated samples.
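Formulation (1) is not reproduced in this excerpt. For concreteness, the sketch below implements the standard input-oriented, constant-returns-to-scale envelopment LP of Charnes et al. (1978); the paper's actual specification (1) may differ in detail, e.g. in orientation or returns to scale.

```python
import numpy as np
from scipy.optimize import linprog

def dea_efficiency(X, Y, o):
    """Input-oriented CRS (CCR) efficiency score of unit o.

    X is an (m inputs x n units) matrix, Y an (s outputs x n units) matrix.
    Solves: min theta  s.t.  X @ lam <= theta * x_o,  Y @ lam >= y_o,  lam >= 0.
    """
    m, n = X.shape
    s = Y.shape[0]
    c = np.zeros(n + 1)
    c[0] = 1.0  # decision variables: [theta, lam_1, ..., lam_n]; minimise theta
    # input constraints:  -theta * x_o + X @ lam <= 0
    A_in = np.hstack([-X[:, [o]], X])
    # output constraints: -Y @ lam <= -y_o
    A_out = np.hstack([np.zeros((s, 1)), -Y])
    res = linprog(c,
                  A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.concatenate([np.zeros(m), -Y[:, o]]),
                  bounds=[(None, None)] + [(0, None)] * n,
                  method="highs")
    return res.x[0]

# Toy example: 3 units, 1 input, 1 output; unit 2 uses twice the input of
# unit 0 for the same output, so its score is 0.5.
X = np.array([[2.0, 4.0, 4.0]])
Y = np.array([[2.0, 4.0, 2.0]])
print([round(dea_efficiency(X, Y, o), 3) for o in range(3)])  # [1.0, 1.0, 0.5]
```

A simulated unit is counted as efficient when its score equals one; counting these across the 161 observations gives the percentage recorded for each replication.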
The proportions of efficient patrols from the
actual, observed sample were consistently lower
than the mean proportions from simulated samples (except for Model 4), given the same number
of variables included in the model. The model that
included additional variables (i.e., more dimen-
sions) had a higher percentage of efficient units as
expected. We focused on the apparent differences
between the proportion of efficient units from the
actual data and simulated samples. The null hy-
pothesis was that, for each DEA specification, the
mean percentage of efficient units from the actual
sample should be the same as the average pro-
portion of efficient units from 1000 replications of
simulated samples. The alternative hypothesis was
that the mean percentage of efficient units from the
actual sample should be less than the average
proportion of efficient units from the simulated
samples. A one-tailed hypothesis test was used.
The calculated Z statistics are displayed in the last
row of Table 3.

Table 3
Summary of efficiency results from actual sample and simulated samples (161 observations)

                                            Model 1   Model 2   Model 3   Model 4
Number of variables                         11        10        10        9
% of eff. units from actual sample          55.3      49.7      47.8      41.6
% of eff. units from simulated samples (a)  58.7      50.4      50.4      41.2
Standard deviation                          4.52      4.74      4.74      4.69
Z statistic (b)                             23.79     4.67      17.35     -2.70

(a) The figures are the averages of the simulated percentage of efficient observations for 1000 replications.
(b) The Z statistic tests the null hypothesis that the average percentage of efficient observations from the actual sample equals that from the simulated samples, against the one-tailed alternative that it is less. The critical Z value for a one-tailed test at the 1% level of significance is 2.33.
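The Z statistics in Table 3 can be reproduced from the summary figures alone. The standardisation below (difference between the mean simulated and actual percentages, divided by the simulated standard deviation over the square root of 1000 replications) is our reading of the test construction; it matches the published values.

```python
import math

N_REPS = 1000
CRITICAL_Z = 2.33  # one-tailed test at the 1% level of significance

# model: (% efficient, actual sample; mean %, simulated; std. dev., simulated)
table3 = {
    1: (55.3, 58.7, 4.52),
    2: (49.7, 50.4, 4.74),
    3: (47.8, 50.4, 4.74),
    4: (41.6, 41.2, 4.69),
}

for model, (actual, sim_mean, sim_sd) in table3.items():
    z = (sim_mean - actual) / (sim_sd / math.sqrt(N_REPS))
    # Reject H0 (equal proportions) in favour of "actual < simulated" when
    # Z exceeds the critical value; only Model 4 fails to reject.
    print(f"Model {model}: Z = {z:.2f}, reject H0: {z > CRITICAL_Z}")
```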
Since these Z statistics exceeded the critical Z value at the 1% level of significance for Models 1–
3, we rejected the null hypothesis for these models.
The DEA calculated efficiency scores from these three models reflected differences in the perfor-
mance of police patrols in the sample. That is,
inefficiencies exist across police patrols in the
sample and these inefficiencies are not a reflection
of the dimensionality of the first three models.
However, we could not reject the null hypothesis
for Model 4, indicating that the Model 4 results
reflected the dimensionality. Given the nine variables included in Model 4, the percentage of effi-
cient units should have been less than 41.2% if the
number of model variables had not inflated the
DEA efficiency scores. Model 4 was not an ap-
propriate model.
On the basis of dimensionality testing, efficiency
scores obtained from Models 1–3 were not con-
taminated by the dimensionality of the model. These three models could be used in the perfor-
mance analysis of the police patrols in the sample.
Efficiency results from Model 4 were merely driven
by the dimensionality and should not be used. Model 4 would have been acceptable had policy
decision-makers made their judgement on the basis
of sensitivity tests alone. This finding suggests that
empirical researchers should perform both sensi-
tivity and dimensionality tests before reaching a
conclusion as to which model is more appropriate.
In our empirical demonstration, Model 2 was the best.
4. Conclusions
This paper extends a standard procedure for
performance measurement using the DEA tech-
nique. To enhance the managerial usefulness and
reliability of DEA results, the paper suggests that
sensitivity and dimensionality tests be performed. We present a suite of sensitivity tests for DEA
results and develop a procedure to test for model
dimensionality using simulated samples. Although
the sensitivity and dimensionality tests are illus-
trated in the context of the performance of the
NSW police patrols, these tests are applicable to
any DEA study. Both the sensitivity and dimen-
sionality tests are complementary. Our empirical illustration suggests that NSW
police patrols in 1995–1996, on average, were be-
tween 90% and 93% efficient compared to best
performers in the sample. The sensitivity tests
suggest that the results are broadly robust across
the various model specifications. The preferred
model comprising six outputs and four inputs
adequately captured the measured activities of the police patrols and provided appropriate efficient
peers to assist performance improvement by in-
efficient units. Further, the dimensionality test
revealed that the results were different from the
simulated results––except for Model 4. Hence,
DEA computed efficiency scores were not driven
by the number of model variables. Rather they
reflected variations in the actual performance of
police patrols in the sample, i.e., differences in
patrols' ability to transform model input variables
to deliver measured services. Our procedure vali-
dates the DEA results prior to their use for man-
agement purposes including resource allocation,
benchmarking, and performance-related bonuses
along the lines suggested in Yaisawarng (2002).
Acknowledgements
Much of this research project was completed
while S. Yaisawarng worked as a senior economist
at the NSW Treasury. The views expressed in this
paper are those of the authors and do not neces-
sarily reflect those of the NSW Treasury, the NSW
Police Service or the NSW Government. The au-
thors thank Jim Baldwin of the NSW Police Ser-
vice for his help with interpreting the data and
Paul Stewart-Stand of Union College for his
research assistance.

Appendix A. Descriptive statistics for NSW police patrols, 1995–1996 (161 observations)

                          Mean        Std. dev.   Minimum   Maximum
Reactive outputs
  Incidents               3821.52     2451.42     378       13497
  Major car accidents     488.31      315.18      26        1760
  Summons                 28.34       28.59       0         213
  Charges                 643.91      433.28      106       2243
Pro-active outputs
  Km travelled by police  341505.54   163775.49   39085     766874
  Intelligence reports    359.58      562.59      25        6554
Inputs
  Police officers         57.63       29.62       10.64     150.64
  Civilian employees      6.52        5.66        0.37      37.77
  Police cars             10.18       4.97        2         27
  Personal computers      28.58       13.46       5         105
  Station area            1321.35     1018.25     112       7030

Appendix B. Descriptive statistics for NSW urban police patrols, 1995–1996 (97 observations)

                          Mean        Std. dev.   Minimum   Maximum
Reactive outputs
  Incidents               4773.25     2494.53     1082      13497
  Major car accidents     636.51      303.26      176       1760
  Summons                 24.05       29.08       0         213
  Charges                 658.11      467.46      106       2243
Pro-active outputs
  Km travelled by police  276641.60   129113.90   90450     706183
  Intelligence reports    437.27      697.60      56        6554
Inputs
  Police officers         67.09       30.66       24.81     150.64
  Civilian employees      8.03        6.44        1.00      37.77
  Police cars             8.56        4.56        2         27
  Personal computers      30.84       13.90       8         105
  Station area            1134.66     985.61      112       6918
Appendix C. Descriptive statistics for NSW regional police patrols, 1995–1996 (64 observations)

                          Mean        Std. dev.   Minimum   Maximum
Reactive outputs
  Incidents               2379.05     1507.19     378       5849
  Major car accidents     263.70      164.54      26        592
  Summons                 34.84       26.75      0         105
  Charges                 622.38      378.16      144       1961
Pro-active outputs
  Km travelled by police  439814.90   162547.80   39085     766874
  Intelligence reports    241.84      197.22      25        1024
Inputs
  Police officers         43.29       21.19       10.64     97.78
  Civilian employees      4.25        3.06        0.37      12.41
  Police cars             12.64       4.57        4         27
  Personal computers      25.17       12.08       5         67
  Station area            1604.30     1008.93     358       7030
References
Afriat, S., 1972. Efficiency estimation of production functions.
International Economic Review 13, 563–598.
Banker, R.D., Charnes, A., Cooper, W.W., 1984. Some models
for estimating technical and scale efficiencies in data envel-
opment analysis. Management Science 30 (9), 1078–1092.
Carrington, R., Puthucheary, N., Rose, D., Yaisawarng, S.,
1997. Performance measurement in government service
provision: The case of police services in New South Wales.
Journal of Productivity Analysis 8 (4), 415–430.
Charnes, A., Cooper, W.W., Rhodes, E., 1978. Measuring
the efficiency of decision making units. European Journal of
Operational Research 2 (6), 429–444.
Cooper, W.W., Seiford, L.M., Tone, K., 2000. Data Envelop-
ment Analysis: A Comprehensive Text with Models,
Applications, References and DEA-Solver Software. Klu-
wer Academic Publishers, Boston.
Färe, R., Grosskopf, S., Lovell, C.A.K., 1985. The Measure-
ment of Efficiency of Production. Kluwer-Nijhoff Publish-
ing, Boston.
Färe, R., Grosskopf, S., Lovell, C.A.K., 1994. Production
Frontiers. Cambridge University Press, Cambridge.
Farrell, M.J., 1957. The measurement of productive efficiency.
Journal of the Royal Statistical Society Series A 120, 253–
281.
Ferrier, G., Grosskopf, S., Hayes, K., Yaisawarng, S., 1993.
Economies of diversification in the banking industry: A
frontier approach. Journal of Monetary Economics 31, 229–
249.
Grosskopf, S., 1986. The role of reference technology in
measuring productive efficiency. The Economic Journal
96, 499–513.
Grosskopf, S., 1996. Statistical inference and nonparametric
efficiency: A selective survey. Journal of Productivity
Analysis 7 (2–3), 161–176.
Grosskopf, S., Yaisawarng, S., 1990. Economies of scope in the
provision of local public services. National Tax Journal 43,
51–74.
Hughes, A., Yaisawarng, S., 2000. Efficiency of local
police districts: A New South Wales experience. In:
Blank, J.L.T. (Ed.), Public Provision and Perfor-
mance: Contributions from Efficiency and Productivity
Measurement. Elsevier Science B.V., Amsterdam, pp. 277–
296.
Lovell, C.A.K., 1993. Production frontiers and produc-
tive efficiency. In: Fried, H.O., Schmidt, S.S., Lovell,
C.A.K. (Eds.), The Measurement of Productive Effi-
ciency. Oxford University Press, New York, pp. 3–
67.
Nunamaker, T.R., 1985. Using data envelopment analysis to
measure the efficiency of non-profit organisations: A critical
evaluation. Managerial and Decision Economics 6 (1),
50–58.
Steering Committee for the Review of Commonwealth/State
Service Provision, 1997. Data Envelopment Analysis: A
Technique for Measuring the Efficiency of Government
Service Delivery. AGPS, Canberra.
Tauer, L.W., Hanchar, J.J., 1995. Nonparametric technical
efficiency with K firms, N inputs, and M outputs: A
simulation. Agricultural and Resource Economics Review
24 (2), 185–189.
Thrall, R.M., 1989. Classification transitions under expan-
sion of inputs and outputs in data envelopment
analysis. Managerial and Decision Economics 10,
159–162.
Valdmanis, V., 1992. Sensitivity analysis for DEA models: An
empirical example using public vs. NFP hospitals. Journal
of Public Economics 48, 185–205.
Yaisawarng, S., 2002. Performance measurement and resource
allocation. In: Fox, K. (Ed.), Efficiency in the Public Sector.
Kluwer Academic Publishers, Boston, pp. 61–81.