Earlier this year, I wrote an article on using instrumental variables (IV) to analyze data from randomized experiments with imperfect compliance (read the manuscript for full details; link updated; it’s open access). In the article, I described the steps of IV estimation and the logic behind it.
The sample code using two stage least squares regression (the correct analysis) is shown below (see article for specifics):
library(ivpack)
dat <- read.csv('http://faculty.missouri.edu/huangf/data/pubdata/pare/ivexample.csv')
head(dat)
## assign takeup y
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 4 0 0 0
## 5 0 0 0
## 6 0 0 0
tail(dat)
## assign takeup y
## 195 1 1 9
## 196 1 1 10
## 197 1 1 10
## 198 1 1 12
## 199 1 1 11
## 200 1 1 9
summary(dat)
## assign takeup y
## Min. :0.0 Min. :0.000 Min. : 0.000
## 1st Qu.:0.0 1st Qu.:0.000 1st Qu.: 0.000
## Median :0.5 Median :0.000 Median : 0.000
## Mean :0.5 Mean :0.435 Mean : 4.375
## 3rd Qu.:1.0 3rd Qu.:1.000 3rd Qu.:10.000
## Max. :1.0 Max. :1.000 Max. :13.000
iv1 <- ivreg(y ~ takeup, ~assign, data = dat)
summary(iv1)
##
## Call:
## ivreg(formula = y ~ takeup | assign, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.065942 -0.065942 0.006522 0.006522 2.934058
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.006522 0.085095 -0.077 0.939
## takeup 10.072464 0.153269 65.718 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7478 on 198 degrees of freedom
## Multiple R-Squared: 0.9782, Adjusted R-squared: 0.9781
## Wald test: 4319 on 1 and 198 DF, p-value: < 2.2e-16
The treatment on treated (TOT) effect is 10.072 (SE = 0.153).
However, I indicated that:
Although conceptually, the model is a full mediation model, the effect is not estimated using path analysis or structural equation modeling (SEM) as is commonly done in education or psychology (i.e., the indirect path is not path a x path b).
Using SEM, the results do not match.
library(lavaan)
#incorrect
t1 <- '
y ~ takeup
takeup ~ assign'
res1 <- lavaan::sem(model = t1, data = dat)
summary(res1)
## lavaan 0.6-6 ended normally after 42 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 4
##
## Number of observations 200
##
## Model Test User Model:
##
## Test statistic 0.019
## Degrees of freedom 1
## P-value (Chi-square) 0.891
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## y ~
## takeup 10.057 0.106 94.774 0.000
## takeup ~
## assign 0.690 0.050 13.704 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .y 0.554 0.055 10.000 0.000
## .takeup 0.127 0.013 10.000 0.000
Although the path model looks correct, the estimates are off. The TOT (IV) effect is NOT 0.69 \(\times\) 10.057 which is 6.939 (vs. 10.072). The difference as well is that \(y\) is not regressed on the actual/observed takeup values but the predicted takeup values (read the article).
Later on, I came across this post by Paul Allison who had a solution to get the correct estimate. The solution just involved correlating the error terms of the outcome and the actual takeup values.
### correct
t2 <- '
y ~ takeup
takeup ~ assign
y ~~ takeup'
res2 <- lavaan::sem(model = t2, data = dat)
summary(res2)
## lavaan 0.6-6 ended normally after 97 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 5
##
## Number of observations 200
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## y ~
## takeup 10.072 0.153 66.049 0.000
## takeup ~
## assign 0.690 0.050 13.704 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## .y ~~
## .takeup -0.004 0.027 -0.137 0.891
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .y 0.554 0.055 9.998 0.000
## .takeup 0.127 0.013 10.000 0.000
The point estimate and the standard errors of the takeup variable now match: 10.072 (SE = 0.153). However, note that this is still not the same as testing the indirect effect of path (a) \(\times\) path(b).
- END