In class this week, I mentioned that among researchers there is an implicit prestige hierarchy of analytical techniques, in this order:
1. Random assignment of treatment
2. Regression discontinuity
3. Difference-in-differences (with a change in treatment that is clearly exogenous)
4. Fixed effects panel models and instrumental variables
5. Matching
6. OLS, HLM, logistic regression, etc. (any variant of the generalized linear model)
Why this order? As I keep emphasizing to my students, any conclusions we draw from a statistical analysis depend on whether all of the assumptions for that particular application have been met. Moving from the top of the list to the bottom, we go from techniques with fewer assumptions, which are more likely than not to hold in applied work and can be checked empirically, to techniques that require strong, unverifiable assumptions.
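To make the point about unverifiable assumptions concrete, here is a minimal simulation sketch in Python. Everything in it is made up for illustration: the function name `simulate`, the true effect of 2, and the confounder `u` that affects both treatment and outcome. Under random assignment, a simple difference in means (which equals the OLS slope from regressing Y on T alone) recovers the true effect. When units with high `u` select into treatment, the same estimator is badly biased, and nothing in the observed treatment and outcome data reveals the problem.

```python
import random
import statistics

TRUE_EFFECT = 2.0  # hypothetical treatment effect built into the simulation


def simulate(n, randomized, seed=0):
    """Difference in mean outcomes, treated minus control.

    This equals the OLS slope from regressing Y on T alone. U is a confounder
    that the analyst never observes.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)  # unobserved confounder
        if randomized:
            t = rng.random() < 0.5  # coin-flip assignment, independent of U
        else:
            t = u + rng.gauss(0, 1) > 0  # units with high U select into treatment
        y = TRUE_EFFECT * t + 3.0 * u + rng.gauss(0, 1)
        (treated if t else control).append(y)
    return statistics.mean(treated) - statistics.mean(control)


rct = simulate(100_000, randomized=True)
obs = simulate(100_000, randomized=False)
print(f"randomized: {rct:.2f}  self-selected: {obs:.2f}  truth: {TRUE_EFFECT}")
```

With these invented parameters, the self-selected estimate is inflated by roughly 3.4 in expectation. The only cure is to condition on `u`, which is exactly the kind of assumption (that we have measured every confounder) that cannot be verified from the data.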
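RD is #2 because it most closely approximates an experimental design: units just above and just below the cutoff should be comparable, so assignment near the threshold is as good as random. DiD is ranked above fixed effects and IV because, when the change in treatment is clearly exogenous, its key identifying assumption (that the treated and comparison groups would have followed parallel trends in the absence of treatment) is plausible. Once we reach #4, we start moving back into the OLS world of model dependence, where results depend heavily on model specification. Matching is #5 because it does not even attempt to deal with unobservables, and because we can never be sure that $(Y(1), Y(0)) \perp T \mid X$ holds, that is, that treatment is as good as randomly assigned once we condition on the observed covariates $X$.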
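The DiD logic can be sketched with a small two-group, two-period simulation in Python. The function name `did_simulation`, the group levels, the common shock, and the effect of 1.5 are all hypothetical choices for illustration. The treated group starts at a higher level, so a naive post-period comparison mixes that level gap with the treatment effect; differencing each group over time and then differencing across groups removes both the fixed gap and the common shock.

```python
import random
import statistics


def did_simulation(n_per_cell=20_000, effect=1.5, seed=1):
    """Return (post-period gap, DiD estimate) from simulated two-group, two-period data."""
    rng = random.Random(seed)
    group_level = {"treated": 4.0, "control": 1.0}  # fixed gap between groups
    period_shock = {"pre": 0.0, "post": 2.0}  # common shock: parallel trends hold by construction
    means = {}
    for group, level in group_level.items():
        for period, shock in period_shock.items():
            exposed = group == "treated" and period == "post"  # only treated units in the post period
            ys = [level + shock + effect * exposed + rng.gauss(0, 1) for _ in range(n_per_cell)]
            means[(group, period)] = statistics.mean(ys)
    post_gap = means[("treated", "post")] - means[("control", "post")]
    did = (means[("treated", "post")] - means[("treated", "pre")]) - (
        means[("control", "post")] - means[("control", "pre")]
    )
    return post_gap, did


post_gap, did = did_simulation()
print(f"post-period gap: {post_gap:.2f}  DiD: {did:.2f}")
```

Here the estimate works only because both groups share the same shock. If the post-period shock differed by group (a violation of parallel trends), the DiD estimate would absorb that difference too, and two periods of data alone could not reveal it. This is why a clearly exogenous change in treatment matters so much for DiD.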