More efficient CR2 cluster-robust standard errors
I had written before about the CR2 standard error variant and how it can be used to account for clustering when using basic OLS (or GLM) regression. Some articles on the topic:
- Using cluster-robust standard errors when analyzing group-randomized trials with few clusters (Behavior Research Methods)
- Using robust standard errors for the analysis of binary outcomes with a small number of clusters (Journal of Research on Educational Effectiveness)
The CR2 can be effective even with only a few clusters (a situation that has been a problem for other variants such as the CR1 or CR0) when used with Satterthwaite degrees of freedom adjustments. The CR2 was originally proposed over two decades ago by Bell and McCaffrey (2002), who referred to it as the bias-reduced linearization (BRL) method. Several researchers (Imbens & Kolesar; Pustejovsky & Tipton; Huang & Li) have recommended its routine use over other CRSE variants (at least over the CR0 or CR1). Several software packages compute it; SPSS can do so using the free add-on here.
The standard errors are the square roots of the diagonal elements of the following matrix:

$$\widehat{V} = (X'X)^{-1} \, M \, (X'X)^{-1}$$

Using the Liang and Zeger (1986) formulation for cluster-robust standard errors, the meat matrix is:

$$M = \sum_{g=1}^{G} X_g' A_g \hat{u}_g \hat{u}_g' A_g X_g$$

$G$ refers to the total number of clusters. Here: $A_g = (I_g - H_{gg})^{-1/2}$ for the CR2 (for the CR0, $A_g$ is simply $I_g$). See one of the manuscripts for explanations of the matrices. $H$ refers to the hat matrix, or $X(X'X)^{-1}X'$, with $H_{gg}$ its diagonal block for cluster $g$, and $I_g$ is an identity matrix. The adjustment is motivated by the fact that, under homoskedasticity, $\mathrm{Var}(\hat{u}_g) = \sigma^2(I_g - H_{gg})$, so premultiplying the residuals by $A_g$ counteracts the downward bias of the raw OLS residuals.
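To make the formula concrete, here is a minimal (and unoptimized) sketch of the plain Liang and Zeger sandwich, i.e., the CR0 where $A_g = I_g$. This helper (the CR0ex name is mine) is my addition and follows the same pattern as the CR2 functions later in the post:

#plain CR0 sandwich: no adjustment of the residuals (A_g = I_g)
CR0ex <- function(model, data, cluster){
  X <- model.matrix(model) #design matrix
  br <- solve(t(X) %*% X) #bread matrix
  u <- resid(model) #residuals
  mt <- lapply(split(seq_along(u), data[, cluster]), function(ind){
    sg <- t(X[ind, , drop = FALSE]) %*% u[ind] #X_g' u_g
    sg %*% t(sg)
  })
  mt2 <- Reduce("+", mt) #the meat
  sqrt(diag(br %*% mt2 %*% br)) #CR0 standard errors
}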
I had mentioned that computing the adjustment matrix requires taking the inverse of the symmetric square root, which is done per cluster and can be slow (and requires storing these $n_g \times n_g$ matrices, which can be large). However, Niccodemi et al. (2020) showed a more efficient way (in Appendix A) to compute the CR2 that instead requires the inversion of a smaller $k \times k$ matrix (where $k$ is the number of predictors).
An example using CR2
Using the High School and Beyond dataset, we fit a basic model and compute the CR2 using the formulation presented. In the dataset, there are 160 schools (clusters).
library(mlmRev)
data(Hsb82)
h1 <- lm(mAch ~ meanses + sector + sx + cses + cses * sector + minrty,
         data = Hsb82)
#inverse symmetric square root of a matrix (via its eigendecomposition)
MatSqrtInverse <- function(A){
  ei <- eigen(A, symmetric = TRUE)
  d <- pmax(ei$values, 10^-12) #floor tiny/negative eigenvalues
  d2 <- 1 / sqrt(d)
  ei$vectors %*% (if (length(d2) == 1) d2
                  else diag(d2)) %*% t(ei$vectors)
}
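As a quick sanity check (my addition, not in the original post), the inverse square root of a symmetric positive definite matrix $M$ should satisfy $M^{-1/2} M M^{-1/2} = I$:

set.seed(1234)
M <- crossprod(matrix(rnorm(20), 5, 4)) #a 4 x 4 positive definite matrix
round(MatSqrtInverse(M) %*% M %*% MatSqrtInverse(M), 8) #should be ~identity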
#### STANDARD WAY:: basic computation
CR2ex <- function(model, data, cluster){
  X <- model.matrix(model) #design matrix
  br <- solve(t(X) %*% X) #bread matrix
  # br <- chol2inv(qr.R(qr(model))) #faster alternative
  Hm <- X %*% br %*% t(X) #hat matrix (n x n)
  N2 <- length(table(as.character(data[, cluster]))) #number of clusters
  cnames <- names(table(data[, cluster])) #cluster ids
  u <- resid(model) #residuals
  I <- diag(nobs(model)) #identity matrix
  mt <- list() #container for meat matrix
  for (i in 1:N2){
    ind <- which(data[, cluster] == cnames[i]) #rows in cluster i
    Xg <- X[ind, , drop = FALSE]
    Ig <- I[ind, ind]
    Hg <- Hm[ind, ind] #H_gg: hat matrix block for cluster i
    A <- MatSqrtInverse(Ig - Hg) #the CR2 adjustment matrix
    uu <- u[ind]
    mt[[i]] <- t(Xg) %*% A %*% uu %*% t(uu) %*% A %*% Xg
  }
  mt2 <- Reduce("+", mt) #add up all elements of mt (the meat)
  sqrt(diag(br %*% mt2 %*% br)) #the CR2 standard errors
}
CR2ex(h1, Hsb82, 'school')
        (Intercept)             meanses      sectorCatholic
          0.2036939           0.3517720           0.2759393
           sxFemale                cses           minrtyYes
          0.2007091           0.1561396           0.2668150
sectorCatholic:cses
          0.2281685
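For contrast (my addition), the model-based OLS standard errors, which ignore the clustering entirely, can be pulled from the usual covariance matrix:

sqrt(diag(vcov(h1))) #model-based (i.i.d.) standard errors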
Using Niccodemi et al.’s method
See Niccodemi et al.’s (2020) Appendix for the explanation of the matrices.
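The key identity, as I read the appendix, is the following. Let $R_g = X_g (X'X)^{-1/2}$, so that $H_{gg} = R_g R_g'$. Then:

$$X_g' (I_g - R_g R_g')^{-1/2} = (X'X)^{1/2} \, (I_k - R_g' R_g)^{-1/2} \, (X'X)^{-1/2} X_g'$$

which follows from $f(R_g R_g')\,R_g = R_g\,f(R_g' R_g)$ for an analytic function $f$. The left-hand side requires inverting an $n_g \times n_g$ matrix for each cluster; the right-hand side only ever inverts $k \times k$ matrices.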
#symmetric square root of a matrix (via its eigendecomposition)
sqrtm <- function(z){
  eig <- eigen(z, symmetric = TRUE)
  Q <- eig$vectors
  Lambda <- diag(sqrt(eig$values))
  Q %*% Lambda %*% t(Q)
}
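A quick check of this helper (my addition): the square root multiplied by itself should reproduce the original matrix:

M2 <- crossprod(matrix(rnorm(20), 5, 4)) #a 4 x 4 positive definite matrix
all.equal(sqrtm(M2) %*% sqrtm(M2), M2) #should return TRUE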
CR2eff <- function(model, data, cluster){
  X <- model.matrix(model) #design matrix
  k <- ncol(X) #number of predictors
  Ik <- diag(k) #k x k identity matrix
  br <- solve(t(X) %*% X) #bread matrix
  invX <- MatSqrtInverse(t(X) %*% X) #(X'X)^(-1/2)
  sqrtX <- sqrtm(t(X) %*% X) #(X'X)^(1/2)
  N2 <- length(table(as.character(data[, cluster]))) #number of clusters
  cnames <- names(table(data[, cluster])) #cluster ids
  u <- resid(model) #residuals
  mt <- list() #container for meat matrix
  for (i in 1:N2){
    ind <- which(data[, cluster] == cnames[i]) #rows in cluster i
    Ri <- X[ind, , drop = FALSE] %*% invX
    s_hat <- t(X[ind, , drop = FALSE]) %*% u[ind] #X_g' u_g
    si <- sqrtX %*% MatSqrtInverse(Ik - t(Ri) %*% Ri) %*% #only k x k inversions
      invX %*% s_hat
    mt[[i]] <- si %*% t(si)
  }
  mt2 <- Reduce('+', mt) #the meat
  sqrt(diag(br %*% mt2 %*% br)) #the CR2 standard errors
}
CR2eff(h1, Hsb82, 'school')
        (Intercept)             meanses      sectorCatholic
          0.2036939           0.3517720           0.2759393
           sxFemale                cses           minrtyYes
          0.2007091           0.1561396           0.2668150
sectorCatholic:cses
          0.2281685
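Before the fuller comparison in the next section, a quick programmatic check (my addition) that the two manual implementations agree:

all.equal(CR2ex(h1, Hsb82, 'school'),
          CR2eff(h1, Hsb82, 'school')) #should return TRUE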
Comparing the output
Of course, we can just use optimized functions to compute this via the clubSandwich and sandwich packages. Four methods are shown to compute the CR2, and all give the same result.
library(clubSandwich)
library(sandwich)
data.frame(
  CS = sqrt(diag(vcovCR(h1, cluster = Hsb82$school, type = 'CR2'))),
  SA = sqrt(diag(vcovCL(h1, cluster = Hsb82$school, type = 'HC2'))),
  BMCR2 = CR2ex(h1, Hsb82, 'school'),
  NCR2 = CR2eff(h1, Hsb82, 'school')
)
                           CS        SA     BMCR2      NCR2
(Intercept)         0.2036939 0.2036939 0.2036939 0.2036939
meanses             0.3517720 0.3517720 0.3517720 0.3517720
sectorCatholic      0.2759393 0.2759393 0.2759393 0.2759393
sxFemale            0.2007091 0.2007091 0.2007091 0.2007091
cses                0.1561396 0.1561396 0.1561396 0.1561396
minrtyYes           0.2668150 0.2668150 0.2668150 0.2668150
sectorCatholic:cses 0.2281685 0.2281685 0.2281685 0.2281685
- CS: computed using clubSandwich
- SA: computed using sandwich
- BMCR2: using the manual Bell and McCaffrey formulation (the function above)
- NCR2: using the manual Niccodemi et al. formulation (the function above)
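As noted in the introduction, the CR2 works well when paired with Satterthwaite degrees of freedom adjustments; clubSandwich's coef_test() function can do both in one step (shown here as a pointer, output omitted):

#CR2 standard errors with Satterthwaite degrees of freedom
coef_test(h1, vcov = 'CR2', cluster = Hsb82$school, test = 'Satterthwaite')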
Looking at efficiency (a speed test)
I use the microbenchmark package to compare the speed of the functions, computing the CR2 100 times with each method.
library(microbenchmark)
microbenchmark(
  NCR2 = CR2eff(h1, Hsb82, 'school'),
  clubSandwich = vcovCR(h1, type = 'CR2', cluster = Hsb82$school),
  sandwich = vcovCL(h1, type = 'HC2', cluster = Hsb82$school),
  BMCR2 = CR2ex(h1, Hsb82, 'school'),
  times = 100
)
Unit: milliseconds
         expr      min        lq     mean   median       uq      max neval cld
         NCR2  56.6179  81.59730 191.6820 267.0478 286.9800 325.2789   100  a
 clubSandwich  83.5485  88.18125 100.0753  93.4289 104.1042 351.5703   100   b
     sandwich 105.7582 114.77775 127.4125 122.2824 135.0439 181.9845   100    c
        BMCR2 285.6013 491.16720 500.0858 506.2332 518.9025 578.4627   100     d
What is clear here (looking at the mean times) is that the manual computation (BMCR2) is the slowest (i.e., it took the most time to complete). Of course, the manual code is not really optimized either, but it is written in a way that makes the computation transparent. In this case, the more efficient version (NCR2) was much faster than the equivalent manual BMCR2 (roughly 190 ms versus 500 ms at the mean), as it avoids forming and inverting the per-cluster $n_g \times n_g$ adjustment matrices.
– END