Accounting for random slopes using cluster robust standard errors
Including random slopes (when warranted) in multilevel models is necessary to avoid Type I errors that result from underestimated standard errors. We typically investigate the presence of random slopes (RS) using a likelihood ratio test comparing two competing models. Random slopes are often included when we have cross-level interactions. If a random intercept (RI) model is used when a random slope model is warranted, the regression coefficients will be the same, though the standard errors will be incorrect.
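As a quick illustration, the likelihood ratio test just amounts to fitting the two competing models and comparing them. A minimal sketch mirroring the kind of models fit in the simulation below (note that the test is conservative because the slope variance is tested at its boundary of zero):
library(lmerTest)
ri <- lmer(y ~ x1 + (1 | id), data = dat) #random intercept only
rs <- lmer(y ~ x1 + (x1 | id), data = dat) #adds a random slope for x1
anova(ri, rs, refit = FALSE) #LRT; REML fits kept since only the random part differs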
We wrote an article in the Journal of Experimental Education showing that you can use a simpler RI model together with cluster-robust standard errors (CRSEs) to approximate the (fixed-effect) results of an RS model. The CRSEs adjust the standard errors appropriately and can also serve as a diagnostic when comparing model-based standard errors vs. CRSEs.
The abstract of the article reads:
Although random intercept (RI) multilevel models (MLMs) are commonly used, the inclusion of random slopes, when warranted, is necessary to avoid Type I errors for variables that randomly vary by group. However, instead of explicitly modeling the random slope, an alternative approach could be to use a more parsimonious RI model together with cluster-robust standard errors (CRSEs). Although the traditionally used CRSEs (CR0) can still underestimate standard errors when only a few clusters are present, we investigate a variant (i.e., the CR2) that has been shown to be effective with a limited number of clusters. Results of a Monte Carlo simulation show that using an RI model together with the CR2 can effectively account for violations of homoscedasticity, resulting in acceptable coverage probability rates for all conditions tested. However, when used with a limited number of clusters, a properly specified random slope model has more power to detect effects for level-2 predictors and cross-level interaction (CLI) terms.
An implication of this is that RI models are easier (and faster) to fit. Another implication is that the often-used SEM software lavaan cannot (at the time of writing) fit random slope models but can use traditional sandwich standard errors (via the estimator = 'MLR' option). However, if an RI model is fit together with CRSEs, results should be similar to those of an RS model.
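Outside of lavaan, the same RI-plus-CRSE idea can be applied directly to an lmer fit. A minimal sketch using the clubSandwich package (an illustration, not necessarily the article's own code; it assumes an RI model m1 fit with lmer and clustered by id, as in the simulation below):
library(clubSandwich)
coef_test(m1, vcov = "CR2") #CR2 SEs with Satterthwaite small-sample tests
vcovCR(m1, type = "CR2") #the CR2 covariance matrix itself, if needed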
We show an example of this with a small simulation using 1,000 replications. We have just one level-2 predictor, one level-1 predictor, and a cross-level interaction. The data generating process has a random slope. NOTE: when modeling this using lavaan, the interaction variable has to be created separately and added as a predictor (see the int variable below).
Simulation
We use a small simulation of 50 clusters with 10 individuals per cluster.
library(lmerTest)
library(lavaan)
n2 <- 50 #no of level 2 units
n1 <- 10 #no of units at level 1 per cluster
tot <- n2 * n1
reps <- 1000 #replications
## model to be estimated using lavaan
ms <- '
level: 1
y ~ x1 + int
level: 2
y ~ w1
'
## creating containers
cfs1 <- cfs2 <- cfs3 <- cfs4 <-
  ses1 <- ses2 <- ses3 <- ses4 <- matrix(NA, ncol = 4, nrow = reps)
icc <- numeric()
## begin simulation
set.seed(20260106) #for replicability
for (i in 1:reps){
  ## generate the data
  w1 <- rep(rnorm(n2), each = n1)
  e2 <- rep(rnorm(n2), each = n1)
  x1 <- rnorm(tot)
  e1 <- rnorm(tot)
  e2s <- rep(rnorm(n2, 0, sqrt(.2)), each = n1) #random slope
  id <- rep(1:n2, each = n1)
  y <- w1 * .25 + x1 * .5 + .25 * x1 * w1 + e2 + e1 + e2s * x1
  int <- x1 * w1 #cross-level interaction for lavaan
  dat <- data.frame(id, w1, x1, y, int)
  m0 <- lmer(y ~ 1 + (1|id), data = dat) #null model
  m1 <- lmer(y ~ w1 + x1 + w1 * x1 + (1|id), data = dat) #ri
  m2 <- lmer(y ~ w1 + x1 + w1 * x1 + (x1|id), data = dat) #rs
  ## using lavaan
  l1 <- sem(model = ms, data = dat, cluster = 'id') #same as RI
  l2 <- sem(model = ms, data = dat, cluster = 'id', estimator = 'MLR')
  ## saving the output
  res1 <- summary(m1)$coef
  res2 <- summary(m2)$coef
  icc[i] <- as.numeric(performance::icc(m0)[1])
  cfs3[i, ] <- parameterestimates(l2)[c(13, 10, 1, 2), 6]
  ses3[i, ] <- parameterestimates(l2)[c(13, 10, 1, 2), 7]
  cfs1[i, ] <- res1[ , 1]
  cfs2[i, ] <- res2[ , 1]
  ses1[i, ] <- res1[ , 2]
  ses2[i, ] <- res2[ , 2]
  cfs4[i, ] <- parameterestimates(l1)[c(13, 10, 1, 2), 6]
  ses4[i, ] <- parameterestimates(l1)[c(13, 10, 1, 2), 7]
  if (i %% 50 == 0) cat(i, '..') #my basic progress counter
}
Analysis of Results
mean(icc)
[1] 0.4087556
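The adjusted ICC that performance::icc() reports is simply the intercept variance over the total variance from the null model; as a quick check, it can also be computed by hand from the last fitted m0:
vc <- as.data.frame(VarCorr(m0)) #variance components of the null model
vc$vcov[1] / sum(vc$vcov) #tau00 / (tau00 + sigma2)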
The regression coefficient estimates are all similar to each other:
data.frame(
  cfs = row.names(res1),
  pop = c(0, .25, .5, .25), #population values
  ri = colMeans(cfs1),
  sem.ri = colMeans(cfs4),
  rs = colMeans(cfs2),
  sem.mlr = colMeans(cfs3)
)
cfs pop ri sem.ri rs sem.mlr
1 (Intercept) 0.00 0.002910729 0.002909412 0.002132698 0.002909412
2 w1 0.25 0.250323074 0.250323007 0.250687997 0.250323007
3 x1 0.50 0.502794531 0.502809691 0.503064188 0.502809691
4 w1:x1 0.25 0.250094542 0.250097290 0.249843687 0.250097290
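As a quick numeric check, the relative bias of the coefficient estimates can be computed against the population values (the intercept is excluded since its population value is zero; shown here for the RI model):
pop <- c(0, .25, .5, .25) #population values
(colMeans(cfs1)[-1] - pop[-1]) / pop[-1] #relative bias, all near zero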
We save the empirical standard errors (the ‘true’ values, i.e., the standard deviations of the coefficient estimates across replications) so we can compute the relative bias of the standard errors.
tv1 <- apply(cfs1, 2, sd)
tv2 <- apply(cfs2, 2, sd)
tv3 <- apply(cfs3, 2, sd)
tv4 <- apply(cfs4, 2, sd)
# standard error estimates
data.frame(
  cfs = row.names(res1),
  ri = colMeans(ses1),
  sem.ri = colMeans(ses4),
  rs = colMeans(ses2),
  sem.mlr = colMeans(ses3)
)
cfs ri sem.ri rs sem.mlr
1 (Intercept) 0.15036244 0.14728788 0.14952362 0.14710271
2 w1 0.15280281 0.14967951 0.15193503 0.14488835
3 x1 0.05189142 0.05178148 0.08144460 0.08199301
4 w1:x1 0.05286951 0.05275631 0.08274414 0.08065533
From the output, we can see that the standard errors from the RI model and the SEM model (also using an RI specification) are similar to each other. Likewise, the RS model and the SEM model with robust standard errors are similar to each other.
We can then compute the relative bias in standard errors.
bias <- data.frame(
  cfs = row.names(res1),
  ri = (colMeans(ses1) - tv1) / tv1,
  sem.ri = (colMeans(ses4) - tv4) / tv4,
  rs = (colMeans(ses2) - tv2) / tv2,
  sem.mlr = (colMeans(ses3) - tv3) / tv3
)
bias
cfs ri sem.ri rs sem.mlr
1 (Intercept) -0.02566800 -0.0455899754 -0.030157970 -0.04678992
2 w1 0.02142401 0.0005514581 0.020163448 -0.03147565
3 x1 -0.40783262 -0.4090621826 -0.032105729 -0.06428387
4 w1:x1 -0.38857941 -0.3899013250 0.003460016 -0.06726389
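These numbers also illustrate the diagnostic use of CRSEs mentioned earlier: the ratio of robust to model-based standard errors flags coefficients whose model-based standard errors are too small. A quick computation using the lavaan results above:
round(colMeans(ses3) / colMeans(ses4), 2) #robust (MLR) / model-based SEs
## approx. 1.00 (intercept), 0.97 (w1), 1.58 (x1), 1.53 (w1:x1)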
Visually, we can see that the standard errors for both RI models (using lmer and sem) are underestimated by close to 40% for the level-1 predictor and the cross-level interaction. The RS model and the RI model with robust standard errors (estimated using lavaan) are close to each other.
library(tidyr)
library(ggplot2)
tall <- pivot_longer(bias, ri:sem.mlr)
ggplot(tall, aes(x = cfs, y = value, col = name)) +
  geom_jitter(width = .15) +
  labs(x = "Coefficient", y = 'Standard error bias', col = '') +
  theme_bw()

So, if fixed effects are primarily of interest, RI models are acceptable so long as CRSEs are used (when appropriate). The CRSEs account for unknown forms of heterogeneity. NOTE that there can be instances (depending on your data) when cluster-robust inferences can fail. In any case, this is helpful if researchers want to use latent variable modeling software (e.g., lavaan) in R to fit their multilevel models.
References:
Huang, F. L., & Zhang, B. (2025). Accounting for random slopes using cluster-robust standard errors in multilevel models. The Journal of Experimental Education, 1–20. https://doi.org/10.1080/00220973.2025.2565180