Using Regression Residuals as Performance Measures: Pitfalls and Possibilities
Martin E. Sandbu
Center on Globalization and Sustainable Development
Columbia University
Two types of comparison
• “Descriptive” comparison of achievements along some dimension
  – “League tables” of GDP, GDP growth, etc.
• “Normative” comparison of performance along some dimension, controlling for inputs and external circumstances
  – Schools, hospitals
Regression residuals as performance measures
• Assume a causal relationship:
  Outcome_i = a + b’X_i + e_i
• Statistical fitting:
  Predicted outcome_i = α + β’X_i
  Actual outcome_i = α + β’X_i + ε_i
• Use the difference as a performance measure:
  Performance_i = ε_i
  Normalised performance_i = ε_i / [α + β’X_i]
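The recipe above can be sketched in a few lines. This is a minimal illustration, assuming ordinary least squares as the fitting method (the slides do not name one); the function name `residual_performance` and the five data points are hypothetical.

```python
import numpy as np

def residual_performance(outcome, X):
    """Fit outcome = alpha + beta'X by OLS (an assumed fitting method).

    Returns the predicted outcome, the residual (performance), and the
    residual divided by the predicted outcome (normalised performance).
    """
    outcome = np.asarray(outcome, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    # Design matrix with a constant column for the intercept alpha.
    design = np.column_stack([np.ones(len(outcome)), X])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    predicted = design @ coef
    residual = outcome - predicted
    normalised = residual / predicted
    return predicted, residual, normalised

# Hypothetical data: five units, one conditioning variable.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.0])
predicted, residual, normalised = residual_performance(y, x)
```

Because the fit includes an intercept, the residuals sum to zero: performance measured this way is always relative to the sample being compared.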
A simple way of constructing performance measures
Absolute versus relative performance
• “Relative performance”: Compare countries on how they perform relative to predicted achievement– WHO 1999 study of country performance
• “Absolute performance”: Compare countries on their position between zero and maximum possible efficiency– WHO’s index of “health system performance”
Conditioning on what?
• Will depend on the purpose of the comparison and the unit of analysis whose performance is being evaluated
  – Inputs: Condition on the resources “available” to the unit
  – External circumstances: Condition on the factors which “should” not be attributed to the unit’s performance
• Explaining performance: We may identify the controllable causes, but must include them in the performance measure
A generic model
• Suppose we estimate the following model:
  X_i = α + β’INP_i + γ’EXT_i + δ’INT_i + ε_i
  where
  X is the outcome of interest (e.g. health)
  INP are inputs (e.g. education, income)
  EXT are external factors (e.g. geography)
  INT are internal factors (e.g. number of doctors)
• δ’INT_i + ε_i measures how well i performs relative to how it “should” perform
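A minimal sketch of the generic model, assuming OLS estimation and entirely synthetic data (the coefficient values, sample size, and variable names below are invented for illustration). It shows that the performance measure, the internal-factor contribution plus the residual, equals the actual outcome minus the benchmark predicted from inputs and external factors alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic regressors (hypothetical, for illustration only).
inp = rng.normal(size=(n, 2))       # inputs, e.g. education, income
ext = rng.normal(size=(n, 1))       # external factors, e.g. geography
internal = rng.normal(size=(n, 1))  # internal factors, e.g. doctors
eps = rng.normal(scale=0.5, size=n)

# Synthetic outcome generated from made-up coefficients.
x = (1.0 + inp @ np.array([0.8, 0.3]) + ext @ np.array([-0.5])
     + internal @ np.array([0.6]) + eps)

# Estimate the full model, including the internal factors.
design = np.column_stack([np.ones(n), inp, ext, internal])
coef, *_ = np.linalg.lstsq(design, x, rcond=None)
alpha_hat = coef[0]
beta_hat = coef[1:3]
gamma_hat = coef[3:4]
delta_hat = coef[4:5]

# Benchmark: what the unit "should" achieve given inputs and circumstances.
benchmark = alpha_hat + inp @ beta_hat + ext @ gamma_hat

# Performance: internal-factor contribution plus the unexplained residual.
residual = x - design @ coef
performance = internal @ delta_hat + residual
```

Estimating δ separately is what later allows performance to be decomposed into a part explained by controllable causes and a part left unexplained.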
Examples
• Hospital performance
• School performance
  – Chicago public schools, MBA programs
• Health performance of countries and country health systems
• Economic performance: Total Factor Productivity Growth
Pitfalls: The case of WHR 2000
• World Health Report 1999 had investigated country performance in health
• World Health Report 2000 set out to measure “health system performance”
• Absolute performance concept: Produced an index from 0 to 1, where 0 is equivalent to no health system and 1 to the best possible system
• Used a residual from a regression of disability-adjusted life expectancy on education and health expenditure per capita
WHR 2000 methodology
• Efficiency index defined by:
  [OD_i + (PHO_i – LB_i)] / [OD_max + (PHO_i – LB_i)]
  or
  [HO_i – LB_i] / [OD_max + (PHO_i – LB_i)]
• Note an equivalent relative performance index would be:
  OD_i / PHO_i
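The two forms of the efficiency index share a denominator, so they agree exactly when their numerators do. The symbols are not defined in the text of these slides; the short derivation below assumes the reading HO = actual health outcome, PHO = predicted health outcome, LB = lower bound (outcome with no health system), and OD = deviation of the actual from the predicted outcome. The label E_i for the index is introduced here for convenience only.

```latex
\begin{align*}
E_i &= \frac{OD_i + (PHO_i - LB_i)}{OD_{\max} + (PHO_i - LB_i)} \\[4pt]
    &= \frac{HO_i - LB_i}{OD_{\max} + (PHO_i - LB_i)}
    \quad\text{if and only if}\quad OD_i = HO_i - PHO_i .
\end{align*}
```

Under that reading, E_i = 1 when OD_i = OD_max and E_i = 0 when HO_i = LB_i, which matches the 0-to-1 scale described for the WHR 2000 index.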
Problems with WHR 2000 index
• Using absolute performance requires more guesswork and is unnecessarily obscure when the goal is cross-comparison
• Inadequate partitioning of variables:
  – Controls for inputs like education and health spending, but not for circumstances external to the health system, like economic policy
• Jamison and Sandbu (2001) test the robustness of the WHR 2000 ranking:
  – Repeat exercise but control for geography
Including geography controls
• When two geography variables are included (tropical location and access to sea), ranks change dramatically
• Only 17 out of 96 countries remain within “uncertainty interval”
• The amended ranking is not necessarily better: Many conditioning variables are still left out
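The kind of robustness check described here can be sketched as follows. This is not the Jamison and Sandbu (2001) procedure or data: the outcome, inputs, and geography dummies are synthetic, OLS is an assumed estimator, and the five-rank tolerance band is an arbitrary stand-in for WHO's uncertainty intervals.

```python
import numpy as np

def ols_residuals(y, X):
    """Residuals from an OLS regression of y on X with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef

def ranks(values):
    """Rank units so that rank 1 is the largest residual."""
    order = np.argsort(-values)
    r = np.empty(len(values), dtype=int)
    r[order] = np.arange(1, len(values) + 1)
    return r

rng = np.random.default_rng(1)
n = 96  # same count as in the slide; the data themselves are synthetic

inputs = rng.normal(size=(n, 2))                        # e.g. education, spending
geography = rng.integers(0, 2, size=(n, 2)).astype(float)  # e.g. tropical, coastal
outcome = (60 + inputs @ np.array([3.0, 2.0])
           + geography @ np.array([-4.0, 2.5]) + rng.normal(size=n))

# Ranking with inputs only, then with geography controls added.
rank_base = ranks(ols_residuals(outcome, inputs))
rank_geo = ranks(ols_residuals(outcome, np.column_stack([inputs, geography])))

# How many units stay within an (arbitrary) five-rank band of their old rank?
shift = np.abs(rank_base - rank_geo)
n_stable = int((shift <= 5).sum())
```

Comparing `rank_base` with `rank_geo` makes the sensitivity concrete: any omitted variable that is correlated with the outcome ends up in the residual and therefore in the ranking.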
Three measures of performance
• Note that achievements can be compared in three ways:
  – Achievement at point in time
  – Growth rate of achievement levels
  – Inputs to the achievement of outcomes
• Similar distinction for performance:
  – Actual relative to predicted outcome
  – Actual relative to predicted change over time
  – Actual relative to predicted effect of inputs
Country performance in health at a point in time and over time
Jamison, Sandbu and Wang’s (2004) model of infant mortality rates (IMR), estimated for 94 countries over a 25-year period:
  LIMR_it = β_0i + β_1i TIME_t + β_2 LY5_it + β_3 FEDUC_it + ε_it
  LIMR = ln(IMR)
  LY5 = ln(per capita income)
  FEDUC = Female education level
  (all in country i at time t, 5-year intervals)
Residuals
• Model of the country-specific coefficients:
  – Intercept: β_0i = γ_00 + γ_01 TROPICS_i + γ_02 COASTAL_i + µ_0i
  – Time trend: β_1i = γ_10 + γ_11 TROPICS_i + γ_12 COASTAL_i + µ_1i
  (where µ_0i and µ_1i are normally distributed with mean zero)
  – Total residual: µ_0i + TIME_t * µ_1i
Performance
• How much lower than predicted was IMR in 1962? Beginning-of-period performance:
  BP_i ≡ 1 – exp(µ_0i)
• How much faster than predicted did IMR fall? Within-period performance:
  WP_i ≡ 1 – exp(25*µ_1i)
• We can combine the measures. End-of-period performance:
  EP_i ≡ 1 – exp(µ_0i + 25*µ_1i)
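A small worked example of the three measures, using only the formulas above. The function name, the short names `bp`, `wp`, `ep`, and the values of µ_0i and µ_1i are hypothetical and chosen purely for arithmetic convenience.

```python
import math

def performance_measures(mu0, mu1, years=25):
    """Convert country-level random effects into performance measures.

    mu0: country deviation in the intercept (log IMR at the start).
    mu1: country deviation in the time trend (log IMR per year).
    Positive values mean IMR is lower, or falls faster, than predicted.
    """
    bp = 1 - math.exp(mu0)                # beginning-of-period
    wp = 1 - math.exp(years * mu1)        # within-period
    ep = 1 - math.exp(mu0 + years * mu1)  # end-of-period
    return bp, wp, ep

# Hypothetical country: 0.10 log points below predicted at the start,
# with log IMR falling 0.004 per year faster than predicted.
bp, wp, ep = performance_measures(mu0=-0.10, mu1=-0.004)
```

With these inputs, beginning-of-period and within-period performance are each roughly 9.5 percent and end-of-period performance is roughly 18.1 percent. The measures combine multiplicatively rather than additively: 1 – ep equals (1 – bp)(1 – wp).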
End-of-period performance: Low-/middle-income countries
[Figure: Bar chart of IMR as % of predicted IMR by country, vertical axis from 0 to 250. Axis labels in order: Costa Rica; Sudan, Zimbabwe; Venezuela; Sri Lanka, Malaysia; Bulgaria; Kenya; Bolivia; Sierra Leone; Lesotho, Guinea Bissau; Indonesia, Turkey; The Gambia; Bangladesh.]
End-of-period performance: High-income countries
[Figure: Bar chart of IMR as % of predicted IMR by country, vertical axis from 0 to 150. Axis labels in order: Singapore; Hong Kong; Finland; Spain; Sweden; Canada; UK; Israel; US; New Zealand.]
Input efficiency performance
• Work by Or, Wang and Jamison estimates the following model for OECD countries:
  HO_it = β_0i + β_1i DOC_it + γ’X_it + ε_it
  with
  HO = various health outcomes
  DOC = number of doctors per capita
  X = GDP/capita, education, tobacco, alcohol, private/public financing mix
• Country-specific health productivity:
  β_1i = β_1 + µ_4i
  and µ_4i normally distributed with mean zero
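The idea of a country-specific productivity deviation can be illustrated with a deliberately simplified stand-in: a separate OLS line per country, with each slope compared to the average slope. This is not the random-coefficient estimation the model above implies, it omits the controls X, and all data below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
n_countries, n_years = 6, 30

# Made-up "true" slopes: a common effect of doctors plus country deviations.
true_slopes = -0.5 + rng.normal(scale=0.1, size=n_countries)

slopes = np.empty(n_countries)
for i in range(n_countries):
    doc = rng.uniform(1.0, 4.0, size=n_years)  # doctors per capita (synthetic)
    ho = 3.0 + true_slopes[i] * doc + rng.normal(scale=0.05, size=n_years)
    # np.polyfit returns coefficients from highest degree: [slope, intercept].
    slope, intercept = np.polyfit(doc, ho, deg=1)
    slopes[i] = slope

beta1 = slopes.mean()  # average productivity of doctors across countries
mu4 = slopes - beta1   # country-specific deviation from that average
```

A proper random-coefficient (mixed) model would shrink noisy country estimates toward the common slope; the per-country fits here are only meant to show what β_1 and µ_4i represent.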
Findings for IMR
[Figure: Infant mortality. Country estimates with lines representing their 95 percent confidence intervals, vertical axis from -0.8 to 1.2. Rank coefficients are multiplied by (-1) for visual convenience.]
Conclusions
• Regression residuals can be used as performance measures for normative comparison
• Caution required in identifying appropriate conditioning variables
  – Partial out inputs and external factors
  – Don’t partial out factors the institution controls
• Methods allow rich performance analysis:
  – Decompose performance into various types
  – Decompose performance into various causes