ROUGH NOTES: [let me know if you spot any errors; there might be a couple!] Often, in randomized controlled trials where individuals are randomly assigned to treatment and control conditions, covariates are included to improve precision by reducing error variance and increasing statistical power. However, when the outcome is binary (e.g., a patient recovers or not), there are several additional concerns that have gone unnoticed by many applied researchers.
This post accompanies the article below (it was originally submitted as an Issue Brief, which is why the Discussion and Introduction sections are so short):
Huang, F. (2020). Prior problem behaviors do not account for the racial suspension gap. Educational Researcher. Advance online publication.
Centering often reduces the correlation between the individual variables (x1, x2) and the product term (x1 \(\times\) x2). In the example below, r(x1, x1x2) = .80. With the centered variables, r(x1c, x1x2c) = -.15.
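A minimal sketch of this with simulated data (x1 and x2 are arbitrary correlated predictors with nonzero means; the exact correlations will differ from the values quoted above):

```r
set.seed(123)
n <- 1000
x1 <- rnorm(n, mean = 10, sd = 2)            # predictor with a nonzero mean
x2 <- 0.5 * x1 + rnorm(n, mean = 8, sd = 2)  # correlated with x1

# raw product term
x1x2 <- x1 * x2
cor(x1, x1x2)     # typically large when the means are far from zero

# mean-center first, then form the product
x1c <- x1 - mean(x1)
x2c <- x2 - mean(x2)
x1x2c <- x1c * x2c
cor(x1c, x1x2c)   # much closer to zero
```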
[Rough notes: Let me know if there are corrections]
Principal components analysis (PCA) is a convenient way to reduce high-dimensional data into a smaller number of 'components.' PCA has been referred to as a data reduction/compression technique (i.e., dimensionality reduction). PCA is often a means to an end rather than the end in itself. For example, instead of performing a regression with six (highly correlated) variables, we may be able to compress the data into one or two meaningful components and use these in our models in place of the original six variables. Using fewer variables reduces redundancy and the problems associated with multicollinearity.
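As a rough sketch of using PCA this way (the built-in mtcars data stands in for the six correlated variables; this is not a dataset from the post):

```r
# six (mostly correlated) variables from the built-in mtcars data
vars <- mtcars[, c("disp", "hp", "wt", "drat", "qsec", "cyl")]

# PCA on the correlation matrix (i.e., standardized variables)
pc <- prcomp(vars, center = TRUE, scale. = TRUE)
summary(pc)            # proportion of variance explained per component
round(pc$rotation, 2)  # loadings

# use the first component score as a predictor instead of the six variables
mtcars$pc1 <- pc$x[, 1]
summary(lm(mpg ~ pc1, data = mtcars))
```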
Logistic regression is often used to analyze experiments with binary outcomes (e.g., pass vs. fail) and binary predictors (e.g., treatment vs. control). Although appropriate, other models can be fit that may yield results that are easier to interpret.
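The note above does not name which alternatives it has in mind; two commonly discussed options are a linear probability model (which gives risk differences) and a modified Poisson regression (which gives risk ratios), both with robust standard errors. A minimal sketch with simulated data, assuming the sandwich and lmtest packages:

```r
library(sandwich)  # robust (sandwich) variance estimators
library(lmtest)    # coeftest()

set.seed(42)
n <- 500
treat <- rbinom(n, 1, 0.5)                             # 0 = control, 1 = treatment
pass  <- rbinom(n, 1, ifelse(treat == 1, 0.70, 0.55))  # binary outcome

# standard logistic regression: exp(coefficient) is an odds ratio
m_logit <- glm(pass ~ treat, family = binomial)
exp(coef(m_logit))

# linear probability model with robust SEs: coefficient is a risk difference
m_lpm <- lm(pass ~ treat)
coeftest(m_lpm, vcov = vcovHC(m_lpm, type = "HC3"))

# modified Poisson regression with robust SEs: exp(coefficient) is a risk ratio
m_pois <- glm(pass ~ treat, family = poisson)
coeftest(m_pois, vcov = vcovHC(m_pois, type = "HC0"))
exp(coef(m_pois))
```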
A few years ago, I published an article on using Poisson, negative binomial, and zero inflated models in analyzing count data (see Pick Your Poisson). The abstract of the article indicates:
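As a quick, stand-alone sketch of what fitting these three types of count models looks like in R (simulated data; not code from the article; the MASS and pscl packages are assumed):

```r
library(MASS)   # glm.nb() for the negative binomial model
library(pscl)   # zeroinfl() for zero-inflated models

set.seed(1)
n <- 500
x <- rnorm(n)
y <- rnbinom(n, size = 1, mu = exp(0.5 + 0.4 * x))  # overdispersed counts

m_pois <- glm(y ~ x, family = poisson)            # Poisson
m_nb   <- glm.nb(y ~ x)                           # negative binomial
m_zip  <- zeroinfl(y ~ x | 1, dist = "poisson")   # zero-inflated Poisson

AIC(m_pois, m_nb, m_zip)  # compare the fitted models
```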
The other day in class, while talking about situations (e.g., analyzing clustered data or heteroskedastic residuals) in which adjustments to the standard errors of a regression model are needed, a student asked: how do we know what the 'true' standard error should be in the first place? We need to know that to judge whether an estimated standard error is too high or too low.
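One way to answer this is with a small Monte Carlo simulation: when we generate the data ourselves, the 'true' standard error of a coefficient can be approximated by the standard deviation of its estimates across many repeated samples. A minimal sketch (not the classroom example):

```r
set.seed(2023)
reps <- 2000
n <- 200
slopes   <- numeric(reps)
model_se <- numeric(reps)

for (i in 1:reps) {
  x <- rnorm(n)
  y <- 1 + 0.5 * x + rnorm(n)   # data-generating process is known
  fit <- lm(y ~ x)
  slopes[i]   <- coef(fit)["x"]
  model_se[i] <- summary(fit)$coefficients["x", "Std. Error"]
}

sd(slopes)       # empirical ("true") standard error of the slope
mean(model_se)   # average model-based SE; close to sd(slopes) in this well-behaved case
```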
An illustration showing different flavors of robust standard errors. Load the library and the dataset, and recode. Dummy coding is not strictly necessary, but it can make constructing the X matrix easier. This uses the High School & Beyond (hsb) dataset.
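A minimal sketch of a few HC flavors using the sandwich and lmtest packages; since the hsb data file is not included here, the built-in mtcars data stands in:

```r
library(sandwich)  # vcovHC() for HC0-HC3 variance estimators
library(lmtest)    # coeftest()

# stand-in model; with the hsb data this might be something like math ~ female + ses
fit <- lm(mpg ~ wt + hp, data = mtcars)

# conventional (homoskedasticity-assuming) standard errors
coeftest(fit)

# several 'flavors' of heteroskedasticity-consistent standard errors
coeftest(fit, vcov = vcovHC(fit, type = "HC0"))  # White's original estimator
coeftest(fit, vcov = vcovHC(fit, type = "HC1"))  # HC0 with a df correction (Stata's robust)
coeftest(fit, vcov = vcovHC(fit, type = "HC3"))  # leverage-adjusted; better in small samples
```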
A while back, I wrote a note about how to conduct a multilevel confirmatory factor analysis (MLCFA) in R. Part of the note shows how to set up lavaan to run the MLCFA model. NOTE: one of the important aspects of an MLCFA is that the factor structure at the two levels may not be the same; that is, the factor structures are not necessarily invariant across levels. The setup process is/was cumbersome, but putting the note together was informative. Testing a 2-1 factor model (i.e., two factors at the first level and one factor at the second level) required the following code (see the original note for a detailed explanation of the setup and what the variables represent). This is a measure of school engagement; n = 3,894 students in 254 schools.
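A minimal sketch of what a 2-1 model can look like with lavaan's newer level: syntax (the item names y1-y6, the cluster variable schoolid, and the data frame dat are hypothetical placeholders; the original note used the actual engagement items and a more manual setup):

```r
library(lavaan)

# hypothetical 2-1 model: two within-level factors, one between-level factor
model_21 <- '
  level: 1
    fw1 =~ y1 + y2 + y3
    fw2 =~ y4 + y5 + y6
  level: 2
    fb  =~ y1 + y2 + y3 + y4 + y5 + y6
'

# dat stands in for a data frame with the six items and a school identifier
fit_21 <- cfa(model_21, data = dat, cluster = "schoolid", std.lv = TRUE)
summary(fit_21, fit.measures = TRUE, standardized = TRUE)
```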