Instrumental variables within an SEM framework

Last updated on Jan 21, 2023 4 min read instrumental variable, evaluation, SEM

Earlier this year, I wrote an article on using instrumental variables (IV) to analyze data from randomized experiments with imperfect compliance (read the manuscript for full details; link updated; it’s open access). In the article, I described the steps of IV estimation and the logic behind it.

The sample code using two stage least squares regression (the correct analysis) is shown below (see article for specifics):

library(ivpack)
dat <- read.csv('http://faculty.missouri.edu/huangf/data/pubdata/pare/ivexample.csv')
head(dat)

##   assign takeup y
## 1      0      0 0
## 2      0      0 0
## 3      0      0 0
## 4      0      0 0
## 5      0      0 0
## 6      0      0 0

tail(dat)

##     assign takeup  y
## 195      1      1  9
## 196      1      1 10
## 197      1      1 10
## 198      1      1 12
## 199      1      1 11
## 200      1      1  9

summary(dat)

##      assign        takeup            y         
##  Min.   :0.0   Min.   :0.000   Min.   : 0.000  
##  1st Qu.:0.0   1st Qu.:0.000   1st Qu.: 0.000  
##  Median :0.5   Median :0.000   Median : 0.000  
##  Mean   :0.5   Mean   :0.435   Mean   : 4.375  
##  3rd Qu.:1.0   3rd Qu.:1.000   3rd Qu.:10.000  
##  Max.   :1.0   Max.   :1.000   Max.   :13.000

iv1 <- ivreg(y ~ takeup, ~assign, data = dat)
summary(iv1)

## 
## Call:
## ivreg(formula = y ~ takeup | assign, data = dat)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -3.065942 -0.065942  0.006522  0.006522  2.934058 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.006522   0.085095  -0.077    0.939    
## takeup      10.072464   0.153269  65.718   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7478 on 198 degrees of freedom
## Multiple R-Squared: 0.9782,  Adjusted R-squared: 0.9781 
## Wald test:  4319 on 1 and 198 DF,  p-value: < 2.2e-16

The treatment on treated (TOT) effect is 10.072 (SE = 0.153).

However, I indicated that:

Although conceptually, the model is a full mediation model, the effect is not estimated using path analysis or structural equation modeling (SEM) as is commonly done in education or psychology (i.e., the indirect path is not path a x path b).

Using SEM, the results do not match.

library(lavaan)

#incorrect
t1 <- '
y ~ takeup
takeup ~ assign'

res1 <- lavaan::sem(model = t1, data = dat)
summary(res1)

## lavaan 0.6-6 ended normally after 42 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          4
##                                                       
##   Number of observations                           200
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 0.019
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.891
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   y ~                                                 
##     takeup           10.057    0.106   94.774    0.000
##   takeup ~                                            
##     assign            0.690    0.050   13.704    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .y                 0.554    0.055   10.000    0.000
##    .takeup            0.127    0.013   10.000    0.000

Although the path model looks correct, the estimates are off. The TOT (IV) effect is NOT 0.69 \(\times\) 10.057 which is 6.939 (vs. 10.072). The difference as well is that \(y\) is not regressed on the actual/observed takeup values but the predicted takeup values (read the article).

Later on, I came across this post by Paul Allison who had a solution to get the correct estimate. The solution just involved correlating the error terms of the outcome and the actual takeup values.

### correct
t2 <- '
y ~ takeup
takeup ~ assign
y ~~ takeup'

res2 <- lavaan::sem(model = t2, data = dat)
summary(res2)

## lavaan 0.6-6 ended normally after 97 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          5
##                                                       
##   Number of observations                           200
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   y ~                                                 
##     takeup           10.072    0.153   66.049    0.000
##   takeup ~                                            
##     assign            0.690    0.050   13.704    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##  .y ~~                                                
##    .takeup           -0.004    0.027   -0.137    0.891
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .y                 0.554    0.055    9.998    0.000
##    .takeup            0.127    0.013   10.000    0.000

The point estimate and the standard errors of the takeup variable now match: 10.072 (SE = 0.153). However, note that this is still not the same as testing the indirect effect of path (a) \(\times\) path(b).

iv R

Instrumental variables within an SEM framework

Related