How to simulate multilevel data using a Monte Carlo simulation.
To start off, the sample variance formula is:
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \overline{x})^2}{n - 1}$$
First of all, $x_i - \overline{x}$ is a deviation score (a deviation from what? From the mean). Summing the raw deviations always gives zero, so each deviation is squared before they are added together. The numerator of this formula is therefore called the sum of squared deviations, which is literally what it is. This is not yet what we refer to as the variance ($s^2$): we still have to divide it by $n - 1$, the degrees of freedom of the sample variance.
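The steps of the formula can be traced in a short Python sketch. The data values are made up for illustration, and the helper `sample_variance` is a hypothetical name; the standard library's `statistics.variance` (which also divides by $n - 1$) is used only as a cross-check.

```python
from statistics import variance

def sample_variance(xs):
    """Compute s^2 step by step, following the formula above."""
    n = len(xs)
    mean = sum(xs) / n
    deviations = [x - mean for x in xs]      # deviation scores x_i - xbar
    ss = sum(d ** 2 for d in deviations)     # sum of squared deviations
    return ss / (n - 1)                      # divide by the degrees of freedom

data = [2, 4, 4, 4, 5, 5, 7, 9]              # illustrative values; mean is 5
devs = [x - sum(data) / len(data) for x in data]

print(sum(devs))              # raw deviations sum to zero
print(sample_variance(data))  # 32 / 7, about 4.571
print(variance(data))         # standard-library result for comparison
```

Here the squared deviations are 9, 1, 1, 1, 0, 0, 4, and 16, so the sum of squared deviations is 32 and $s^2 = 32 / 7$.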
Earlier this year, I wrote an article on using instrumental variables (IV) to analyze data from randomized experiments with imperfect compliance (see the manuscript for full details; it is open access). In the article, I described the steps of IV estimation and the logic behind them.
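As a rough illustration of the logic (not the procedure from the manuscript), the sketch below simulates a randomized experiment with one-sided noncompliance and computes the Wald IV estimator: the intent-to-treat (ITT) effect divided by the first-stage effect of assignment on treatment received. With a single binary instrument and no covariates, this ratio equals the two-stage least squares estimate. The sample size, the true effect of 2.0, and the confounder `u` ("motivation") are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
true_effect = 2.0  # hypothetical treatment effect used to generate the data

z = rng.integers(0, 2, size=n)   # random assignment (the instrument)
u = rng.normal(size=n)           # unobserved confounder, e.g., motivation
# One-sided noncompliance: only those assigned can take up treatment,
# and more motivated people are more likely to take it up.
d = z * (u + rng.normal(size=n) > 0)
y = true_effect * d + 1.5 * u + rng.normal(size=n)

# Naive "as-treated" comparison: confounded by u
naive = y[d == 1].mean() - y[d == 0].mean()
# Intent-to-treat effect of assignment on the outcome
itt = y[z == 1].mean() - y[z == 0].mean()
# First stage: effect of assignment on treatment received (compliance rate)
first_stage = d[z == 1].mean() - d[z == 0].mean()
# Wald IV estimate = ITT / first stage
iv = itt / first_stage

print(f"naive: {naive:.2f}  ITT: {itt:.2f}  first stage: {first_stage:.2f}  IV: {iv:.2f}")
```

Under these assumptions the naive comparison is biased upward because takers are more motivated, the ITT is diluted by noncompliers, and rescaling the ITT by the compliance rate recovers a value close to the true effect.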
In our module on regression diagnostics, I mentioned 1) that at times (e.g., with clustered data) standard errors may be misestimated and, in particular, may be too low, resulting in a greater chance of making a Type I error (i.e., claiming that results are statistically significant when they are not). In our ANCOVA session, I also indicated 2) that covariates are helpful because they lower the (standard) error in the model and increase power. So it sounds like we would like models with lower standard errors. However, there are cases when the standard error is estimated to be lower than it should be (i.e., the standard error is biased downward).
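A small Monte Carlo sketch can show point 1). It assumes a random-intercept data-generating process with an intraclass correlation of 0.2, 50 clusters of 20 observations, and a cluster-level predictor `x` whose true slope is zero; all of these values are illustrative choices. Each replication fits ordinary least squares, which treats the observations as independent, and the average OLS standard error is compared with the actual spread of the slope estimates across replications. The Type I error rate uses a normal approximation (|t| > 1.96).

```python
import numpy as np

rng = np.random.default_rng(1)
n_clusters, cluster_size, reps = 50, 20, 1000
icc = 0.2  # assumed intraclass correlation of the outcome

slopes, naive_ses = [], []
cluster = np.repeat(np.arange(n_clusters), cluster_size)  # cluster id per row
for _ in range(reps):
    x = rng.normal(size=n_clusters)[cluster]                       # cluster-level predictor
    u = rng.normal(scale=np.sqrt(icc), size=n_clusters)[cluster]   # random intercepts
    e = rng.normal(scale=np.sqrt(1 - icc), size=cluster.size)      # individual-level error
    y = u + e                                                      # true slope of x is 0

    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)  # usual OLS covariance, assumes independence
    slopes.append(beta[1])
    naive_ses.append(np.sqrt(cov[1, 1]))

slopes, naive_ses = np.array(slopes), np.array(naive_ses)
empirical_se = slopes.std(ddof=1)                     # actual sampling variability
type1 = np.mean(np.abs(slopes / naive_ses) > 1.96)    # nominal rate is 0.05

print(f"mean OLS SE: {naive_ses.mean():.3f}  empirical SE: {empirical_se:.3f}  Type I rate: {type1:.2f}")
```

With these settings the OLS standard error comes out at roughly half of the empirical one, so a null effect is declared significant far more often than the nominal 5%.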
A primer on using IVs
This issue plagues many analyses of secondary or observational data.
To illustrate how omitted variable bias (OVB) may affect regression results, we examine some simulated data.
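One way such a simulation might look is sketched below in Python; the coefficients, the correlation between predictors, and the helper `ols` are illustrative assumptions rather than the data used in this post. The outcome depends on `x1` and `x2`, and `x2` is correlated with `x1`. Dropping `x2` shifts the `x1` slope by the coefficient on `x2` times `delta`, the slope from regressing `x2` on `x1`; in sample OLS with intercepts this identity holds exactly.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5_000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)            # x2 is correlated with x1
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)  # hypothetical true coefficients: 1 and 1

def ols(y, *xs):
    """Return OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_long = ols(y, x1, x2)   # correctly specified model
b_short = ols(y, x1)      # x2 omitted
delta = ols(x2, x1)[1]    # slope from regressing the omitted x2 on x1

print(f"x1 slope, full model:  {b_long[1]:.3f}")
print(f"x1 slope, x2 omitted: {b_short[1]:.3f}")
# OVB identity: short slope = long slope + (coefficient on x2) * delta
print(f"long + bias term:      {b_long[1] + b_long[2] * delta:.3f}")
```

Under these assumptions the full model recovers a slope near 1, while the model omitting `x2` gives a slope near 1.5 (that is, 1 + 1 × 0.5).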