Introduction to Robust and Clustered Standard Errors
Miguel Sarzosa
Department of Economics
University of Maryland
Econ626: Empirical Microeconomics, 2012
1 Standard errors: why should you worry about them?
2 Obtaining the correct SE
3 Consequences
4 Now we go to Stata!
Regressions and what we estimate

A regression does not calculate a single value for the relation between two variables; it estimates a distribution of values for that relation. That is why, when you run a regression, the two most important outputs you get are:
- the conditional mean of the coefficient
- the standard deviation of the distribution of that coefficient
Are we in the presence of a star?

The standard errors determine how accurate your estimates are, and therefore they affect hypothesis testing. That is why standard errors are so important: they are crucial in determining how many stars your table gets. And, like in any business, in economics the stars matter a lot. Hence, obtaining the correct SE is critical.
SE and the data

The correct SE estimation procedure is determined by the underlying structure of the data.
- It is very unlikely that all observations in a data set are unrelated yet drawn from identical distributions.
- For instance, the variance of income is often greater among families in the top deciles than among poorer families (heteroskedasticity).
- Some phenomena do not affect observations individually; they affect groups of observations uniformly within each group (clustered data).
The Basic Case (i.i.d.)

\[ y_i = x_i'\beta + e_i, \qquad e_i \sim (0, \sigma^2) \]

This means that \( E[ee' \mid X] = \Omega = \sigma^2 I \).

We know that \( b = [X'X]^{-1}X'y = \beta + [X'X]^{-1}X'e \), and

\[ \mathrm{Var}(b) = E\left([X'X]^{-1} X'ee'X [X'X]^{-1}\right) = [X'X]^{-1} E[X'ee'X] [X'X]^{-1} = \sigma^2 [X'X]^{-1} \]
Heteroskedasticity (i.n.i.d.)

\[ y_i = x_i'\beta + e_i, \qquad e_i \sim (0, \sigma_i^2) \]

This means that \( E[ee' \mid X] = \Omega = \mathrm{diag}(\sigma_i^2) \) (big difference).
Heteroskedasticity (i.n.i.d.)

Now

\[ \mathrm{Var}(b) = E\left([X'X]^{-1} X'ee'X [X'X]^{-1}\right) = [X'X]^{-1} E[X'ee'X] [X'X]^{-1} \]

No further simplification is possible. We need to estimate \( E[X'ee'X] = \sum_{i=1}^{N} e_i^2 x_i x_i' \). Be aware that \( \sum_{i=1}^{N} \hat e_i^2 x_i x_i' \neq X' \hat e \hat e' X \).

Then the Huber-Eicker-White (HEW) VC estimator is:

\[ \widehat{\mathrm{Var}}(b) = [X'X]^{-1} \left[ \sum_{i=1}^{N} \hat e_i^2 x_i x_i' \right] [X'X]^{-1} \tag{1} \]

Two-step procedure: 1. Get \( \hat e_i \); 2. Estimate (1). In Stata: vce(robust).
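The two-step procedure can be sketched directly in numpy. This is an illustrative implementation of the HEW sandwich in (1), not Stata's exact code (Stata applies an additional finite-sample correction that is omitted here); the simulated data are hypothetical:

```python
import numpy as np

# HEW / robust sandwich estimator of equation (1), by hand.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
e = rng.normal(size=n) * (1 + np.abs(x))        # heteroskedastic errors
y = X @ np.array([1.0, 2.0]) + e

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
ehat = y - X @ b                                # step 1: residuals

# "Meat": sum_i ehat_i^2 x_i x_i', computed as X' diag(ehat^2) X
meat = X.T @ (ehat[:, None] ** 2 * X)
V_robust = XtX_inv @ meat @ XtX_inv             # step 2: equation (1)
se_robust = np.sqrt(np.diag(V_robust))
```

Note the meat is built from the diagonal matrix of squared residuals, consistent with the warning above that it is not \( X'\hat e \hat e' X \).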
Clustered Data

Observations are related to each other within certain groups.

Example: You want to regress kids' grades on class size. The unobservables of kids belonging to the same classroom will be correlated (e.g., teacher quality, recess routines), but will not be correlated with those of kids in faraway classrooms.

This means we are assuming independence across clusters but correlation within clusters. That is:
\[ y_{ig} = x_{ig}'\beta + e_{ig}, \qquad g = 1, \ldots, G \]

\[ E[e_{ig} e_{jg'}] = \begin{cases} \sigma_{(ij)g} & \text{if } g = g' \\ 0 & \text{if } g \neq g' \end{cases} \tag{2} \]
Clustered Data

Let us stack observations by cluster:

\[ y_g = x_g'\beta + e_g, \qquad g = 1, \ldots, G \]

The OLS estimator of \( \beta \) is still \( b = [X'X]^{-1}X'y \) (we just stacked the data). The variance is given by

\[ \mathrm{Var}(b) = [X'X]^{-1} X' \Omega X [X'X]^{-1} \]
Clustered Data

By (2), we know that \( \Omega = \mathrm{diag}(\Sigma_g) \), a block-diagonal matrix whose g-th block is the \( N_g \times N_g \) within-cluster covariance matrix:

\[
\Omega =
\begin{bmatrix}
\Sigma_1 & 0 & \cdots & 0 \\
0 & \Sigma_2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \Sigma_G
\end{bmatrix},
\qquad
\Sigma_g =
\begin{bmatrix}
\sigma^2_{(11)g} & \cdots & \sigma_{(1 N_g)g} \\
\vdots & \ddots & \vdots \\
\sigma_{(N_g 1)g} & \cdots & \sigma^2_{(N_g N_g)g}
\end{bmatrix}
\]
Clustered Data

With this in mind, we can now write the VC matrix for clustered data:

\[ \widehat{\mathrm{Var}}(b) = [X'X]^{-1} \left[ \sum_{g=1}^{G} x_g' \hat e_g \hat e_g' x_g \right] [X'X]^{-1} \]

In Stata: vce(cluster clustvar), where clustvar is a variable that identifies the groups within which unobservables are allowed to correlate.
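The cluster-by-cluster sum above can also be computed by hand. This numpy sketch targets the same quantity as vce(cluster clustvar), again omitting Stata's finite-sample scaling factors; the clustered data-generating process is a hypothetical illustration:

```python
import numpy as np

# Clustered sandwich estimator: meat = sum over clusters of X_g' e_g e_g' X_g.
rng = np.random.default_rng(1)
G, n_g = 50, 10                                  # 50 clusters of size 10
n = G * n_g
cluster = np.repeat(np.arange(G), n_g)           # cluster identifier
u_g = rng.normal(size=G)[cluster]                # shared within-cluster shock
x = rng.normal(size=n) + 0.5 * u_g
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + u_g + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
ehat = y - X @ b

k = X.shape[1]
meat = np.zeros((k, k))
for g in range(G):
    idx = cluster == g
    s = X[idx].T @ ehat[idx]                     # X_g' e_g, a k-vector
    meat += np.outer(s, s)                       # X_g' e_g e_g' X_g

V_cluster = XtX_inv @ meat @ XtX_inv
se_cluster = np.sqrt(np.diag(V_cluster))
```

Because each cluster contributes the outer product of a single k-vector, within-cluster error correlation is allowed to be completely unrestricted, exactly as in (2).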
The importance of knowing your data

In the real world you should never go with the i.i.d. case; life is not that simple. You need to know your data in order to choose the correct error structure and then infer the required SE calculation. If you have aggregate variables (like class size), clustering at that level is required. Your data may also correlate in more than one way:
- If nested (e.g., classroom and school district), you should cluster at the highest level of aggregation.
- If not nested (e.g., time and space), you can:
  1. Include fixed effects in one dimension and cluster in the other one.
  2. Use the multi-way clustering extension (see Cameron, Gelbach and Miller, 2006).
How much will my SE increase?

Clustered SE will widen your confidence intervals because you are allowing for correlation between observations. Hence, fewer stars in your tables. The higher the clustering level, the larger the resulting SE. For one regressor, the clustered SE inflate the default (i.i.d.) SE by

\[ \sqrt{1 + \rho_x \rho_e \left(\bar N - 1\right)} \]

where \( \rho_x \) is the within-cluster correlation of the regressor, \( \rho_e \) is the within-cluster error correlation, and \( \bar N \) is the average cluster size. Here it is easy to see the importance of clustering when you have aggregate regressors (i.e., \( \rho_x = 1 \)).
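The inflation factor is easy to evaluate. A quick arithmetic sketch, with hypothetical illustration values for the correlations and cluster size:

```python
import math

# Moulton-style SE inflation factor from the formula above.
def moulton_factor(rho_x, rho_e, n_bar):
    """sqrt(1 + rho_x * rho_e * (n_bar - 1))"""
    return math.sqrt(1 + rho_x * rho_e * (n_bar - 1))

# An aggregate regressor (rho_x = 1), modest error correlation (0.1),
# and clusters of average size 30: the default SEs understate by ~half.
factor = moulton_factor(1.0, 0.1, 30)   # sqrt(1 + 0.1 * 29) ~= 1.97
```

Even mild within-cluster error correlation nearly doubles the SE of an aggregate regressor here, which is why clustering is not optional in that case.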
Now we go to Stata!