More efficient CR2 cluster-robust standard errors

Oct 3, 2025 · 5 min read

I had written before about using the CR2 standard error variant and how they can be used to account for clustering when using basic OLS (or GLM) regression. Some articles on the topic:

Using cluster-robust standard errors when analyzing group-randomized trials with few clusters (Behavior Research Methods)
Using robust standard errors for the analysis of binary outcomes with a small number of clusters (Journal of Research on Educational Effectiveness)

The CR2 can be effective even with only a few clusters (which has been a problem for other varieties such as the CR1 or CR0) when used with Satterthwaite degrees of freedom adjustments. The CR2 was originally proposed by Bell and McCaffrey (2002) over two decades ago who referred to it as the bias reduced linearization (BRL) method. Several researchers (Imbens & Kolesar; Pustejovsky & Tipton; Huang & Li) have recommended its more routine computation compared to other CRSE variants (at least compared to the CR0 or CR1). Several software packages compute this – SPSS can compute this using the free add-on here.

The basic standard error is the square of the diagonal of the following matrix:

$Var(\hat{\beta}) = (X'X)^{-1}X'\hat{\Omega} X (X'X)^{-1}$ .

Using the Liang and Zeger (1986) formulation for cluster robust standard errors:

$Var(\hat{\beta}) = (X'X)^{-1}\sum_{g=1}^{G}X_g '\widehat{\Omega}_g X_g (X'X)^{-1}$ .

$G$ refers to the total number of clusters. Here: $\widehat{\Omega}_g=\left[ I_{N_g} - H_{gg} \right]^{-\frac{1}{2}} e_g e'_g \left[ I_{N_g} - H_{gg} \right]^{-\frac{1}{2}}$ . See one of the manuscripts for explanations of the matrices. $H$ refers to the hat matrix or $H = X(X'X)^{-1}X'$ and $I$ is an identity matrix.

I had mentioned that computing the adjustment matrix requires computing the inverse of the symmetric square root which is done per cluster and can be slow (and requires storing these matrices which can be large). However, Niccodemi et al. (2020) showed a more efficient way (in Appendix A) to compute the CR2 which required the inversion of a smaller k $\times$ k matrix instead (where k is the number of predictors).

An example using CR2

Using the High School and Beyond dataset, we fit a basic model and compute the CR2 using the formulation presented. In the dataset, there are 160 schools (clusters).

library(mlmRev)
data(Hsb82)
h1 <- lm(mAch ~ meanses + sector + sx + cses + cses * sector + minrty,
         data = Hsb82)

#inverse square root of a matrix function
MatSqrtInverse <- function(A){
  ei <- eigen(A, symmetric = TRUE)
  d <- pmax(ei$values, 10^-12)
  d2 <- 1/sqrt(d)
  d2[d == 0] <- 0
  ei$vectors %*% (if (length(d2) == 1) d2
  else diag(d2)) %*% t(ei$vectors)
}

#### STANDARD WAY:: basic computation

CR2ex <- function(model, data, cluster){
  
  X <- model.matrix(model) #design matrix
  k <- ncol(X) #number of predictors
  br <- solve(t(X) %*% X) #break matrix
  # br <- chol2inv(qr.R(qr(model))) #faster
  Hm <- X %*% br %*% t(X) #hat matrix
  
  N2 <- length(table(as.character(data[, cluster]))) #number of clusters
  cnames <- names(table(data[, cluster])) #cluster ids
  u <- resid(model) #residuals
  I <- diag(nobs(model)) #identity matrix
  
  mt <- list() #container for meat matrix
  
  for (i in 1:N2){
    ind <- which(data[, cluster] == cnames[i])
    Xg <- X[ind, ]
    Ig <- I[ind, ind]
    Hg <- Hm[ind, ind]
    A <- MatSqrtInverse(Ig - Hg)
    uu <- u[ind]
    mt[[i]] <- t(Xg) %*% A %*% uu %*% t(uu) %*% A %*% Xg
  }
  
  mt2 <- Reduce("+", mt) #add up all elements of mt
  sqrt(diag(br %*% mt2 %*% br)) #the CR2

}

CR2ex(h1, Hsb82, 'school')

        (Intercept)             meanses      sectorCatholic 
          0.2036939           0.3517720           0.2759393 
           sxFemale                cses           minrtyYes 
          0.2007091           0.1561396           0.2668150 
sectorCatholic:cses 
          0.2281685

Using Niccodemi et al.’s method

See Niccodemi et al.’s (2020) Appendix for the explanation of the matrices.

sqrtm <- function(z){
  eig <- eigen(z)  
  Q <- eig$vectors
  Lambda <- diag(sqrt(eig$values))
  res <- Q %*% Lambda %*% t(Q) 
}

CR2eff <- function(model, data, cluster){
  
  X <- model.matrix(model)
  k <- ncol(X)
  Ik <- diag(k)
  
  br <- solve(t(X) %*% X)
  invX <- MatSqrtInverse(t(X) %*% X)
  sqrtX <- sqrtm(t(X) %*% X)
  
  N2 <- length(table(as.character(data[, cluster])))
  cnames <- names(table(data[, cluster]))
  u <- resid(model)
  I <- diag(nobs(model))
  
  mt <- list()
  for (i in 1:N2){
    ind <- which(data[, cluster] == cnames[i])
    Ri <- X[ind, ] %*% invX
    s_hat <- t(X[ind, ]) %*% u[ind]
    si <- sqrtX %*% MatSqrtInverse(Ik - t(Ri) %*% Ri) %*% 
      invX %*% s_hat
    mt[[i]] <- si %*% t(si)
  }
  
  mt2 <- Reduce('+', mt)
  sqrt(diag(br %*% mt2 %*% br))
}

CR2eff(h1, Hsb82, 'school')

        (Intercept)             meanses      sectorCatholic 
          0.2036939           0.3517720           0.2759393 
           sxFemale                cses           minrtyYes 
          0.2007091           0.1561396           0.2668150 
sectorCatholic:cses 
          0.2281685

Comparing the output

Of course, we can just use optimized functions to compute this using the clubSandwich and the sandwich packages. Four methods are shown to compute the CR2 and all have the same result.

library(clubSandwich)
library(sandwich)

data.frame(
  CS = sqrt(diag(vcovCR(h1, cluster = Hsb82$school, type = 'CR2'))),
  SA = sqrt(diag(vcovCL(h1, cluster = Hsb82$school, type = 'HC2'))),
  BMCR2 = CR2ex(h1, Hsb82, 'school'),
  NCR2 = CR2eff(h1, Hsb82, 'school')
)

                           CS        SA     BMCR2      NCR2
(Intercept)         0.2036939 0.2036939 0.2036939 0.2036939
meanses             0.3517720 0.3517720 0.3517720 0.3517720
sectorCatholic      0.2759393 0.2759393 0.2759393 0.2759393
sxFemale            0.2007091 0.2007091 0.2007091 0.2007091
cses                0.1561396 0.1561396 0.1561396 0.1561396
minrtyYes           0.2668150 0.2668150 0.2668150 0.2668150
sectorCatholic:cses 0.2281685 0.2281685 0.2281685 0.2281685

CS: computed using clubSandwich
SA: computed using sandwich
BMCR2: using the manual Bell and McCaffrey formulation (using the function above)
NCR2: using the manual Niccodemi et al. formulation (using the function above)

Looking at efficiency (a speedtest)

I use the microbenchmark package to compare the speed of the functions. I just compute the CR2 100 times using each method.

library(microbenchmark)

microbenchmark(
  NCR2 = CR2eff(h1, Hsb82, 'school'),
  clubSandwich = vcovCR(h1, type = 'CR2', cluster = Hsb82$school),
  sandwich = vcovCL(h1, type = 'HC2', cluster = Hsb82$school),
  BMCR2 = CR2ex(h1, Hsb82, 'school'),
  times = 100
)

Unit: milliseconds
         expr      min        lq     mean   median       uq      max
         NCR2  56.6179  81.59730 191.6820 267.0478 286.9800 325.2789
 clubSandwich  83.5485  88.18125 100.0753  93.4289 104.1042 351.5703
     sandwich 105.7582 114.77775 127.4125 122.2824 135.0439 181.9845
        BMCR2 285.6013 491.16720 500.0858 506.2332 518.9025 578.4627
 neval  cld
   100 a   
   100  b  
   100   c 
   100    d

What is clear here (looking at mean times) is that the manual computation (BMCR2) is the slowest (i.e., took the most time to complete). Of course, the manual code is not really optimized either but is written in a way to make the computation more transparent. In this case, the more efficient version (NCR2) was faster.

– END

Last updated on Oct 9, 2025

More efficient CR2 cluster-robust standard errors

An example using CR2

Using Niccodemi et al.’s method

Comparing the output

Looking at efficiency (a speedtest)

Related