Updated 1/15
Probably the best overall discussion of RD and its assumptions from a conceptual point of view is Dunning (2012), Natural Experiments in the Social Sciences: A Design-Based Approach. I highly recommend this book; he also provides thoughtful coverage of instrumental variables.
I. General background
Jacob et al. (2012). A practical guide to regression discontinuity. This guide is the best starting place for anyone interested in RD. It also gives applied researchers several useful checklists to follow when carrying out an RD study.
Regression discontinuity (RD) analysis is a rigorous nonexperimental approach that can be used to estimate program impacts in situations in which candidates are selected for treatment based on whether their value for a numeric rating exceeds a designated threshold or cut-point. Over the last two decades, the regression discontinuity approach has been used to evaluate the impact of a wide variety of social programs (DiNardo and Lee, 2004; Hahn, Todd, and van der Klaauw, 1999; Lemieux and Milligan, 2004; van der Klaauw, 2002; Angrist and Lavy, 1999; Jacob and Lefgren, 2006; McEwan and Shapiro, 2008; Black, Galdo, and Smith, 2007; Gamse, Bloom, Kemple, and Jacob, 2008). Yet, despite the growing popularity of the approach, there is only a limited amount of accessible information to guide researchers in the implementation of an RD design. While the approach is intuitively appealing, the statistical details regarding the implementation of an RD design are more complicated than they might first appear. Most of the guidance that currently exists appears in technical journals that require a high degree of technical sophistication to read. Furthermore, the terminology that is used is not well defined and is often used inconsistently. Finally, while a number of different approaches to the implementation of an RD design are proposed in the literature, they each differ slightly in their details. As such, even researchers with a fairly sophisticated statistical background can find it difficult to access practical guidance for the implementation of an RD design.
To help fill this void, the present paper is intended to serve as a practitioners’ guide to implementing RD designs. It seeks to explain things in easy-to-understand language and to offer best practices and general guidance to those attempting an RD analysis. In addition, the guide illustrates the various techniques available to researchers and explores their strengths and weaknesses using a simulated dataset, which can be accessed here.
The guide provides a general overview of the RD approach and then covers the following topics in detail: (1) graphical presentation in RD analysis, (2) estimation (both parametric and nonparametric), (3) establishing the internal validity of RD impacts, (4) the precision of RD estimates, (5) the generalizability of RD findings, and (6) estimation and precision in the context of a fuzzy RD analysis. Readers will find both a glossary of widely used terms and a checklist of steps to follow when implementing an RD design in the Appendixes.
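To make the basic setup concrete, here is a minimal sketch of a sharp RD estimate on simulated data. It is not taken from the guide: the function name, the simulated data, the cutoff of 50, and the bandwidth of 20 are all assumptions chosen for illustration. It fits a linear model with separate slopes on each side of the cutoff and reads off the jump at the cutoff.

```python
import numpy as np

def sharp_rd_estimate(rating, outcome, cutoff, bandwidth):
    """Estimate the jump in the outcome at the cutoff.

    Uses ordinary least squares on observations within `bandwidth` of the
    cutoff, with an intercept, a treatment indicator, the centered rating,
    and their interaction (so each side gets its own slope).
    """
    centered = np.asarray(rating, dtype=float) - cutoff
    outcome = np.asarray(outcome, dtype=float)
    keep = np.abs(centered) <= bandwidth
    x = centered[keep]
    y = outcome[keep]
    treated = (x >= 0).astype(float)  # assumption: treated at or above the cutoff
    design = np.column_stack([np.ones_like(x), treated, x, treated * x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]  # coefficient on the treatment indicator = jump at the cutoff

# Simulated example (all values hypothetical): true effect of 2.0 at a cutoff of 50.
rng = np.random.default_rng(0)
n = 5000
rating = rng.uniform(0, 100, n)
true_effect = 2.0
outcome = 10 + 0.05 * rating + true_effect * (rating >= 50) + rng.normal(0, 1, n)

estimate = sharp_rd_estimate(rating, outcome, cutoff=50, bandwidth=20)
print(round(estimate, 2))
```

In real data the functional form and the bandwidth are exactly the choices the guide spends most of its time on, so treat the linear specification here as a starting point only.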
Shadish et al. (2001). Experimental and Quasi-Experimental Designs for Generalized Causal Inference, chapter 7. Their chapter on RD is an excellent introduction to this topic; also a good all-around reference book on causal inference.
Bloom (2009). Modern regression discontinuity analysis. Another good introduction to RD. Besides explaining the basics of RD, he discusses the various treatment effects that can be estimated with an RD design; these can be overlooked when focusing on the technical aspects of estimation.
This paper provides a detailed discussion of the theory and practice of modern regression discontinuity (RD) analysis for estimating the effects of interventions or treatments. Part 1 briefly chronicles the history of regression discontinuity analysis and summarizes its past applications. Part 2 explains how in theory a regression discontinuity analysis can identify an average effect of treatment for a population and how different types of regression discontinuity analyses (“sharp” versus “fuzzy”) can identify average treatment effects for different conceptual subpopulations. Part 3 of the paper introduces graphical methods, parametric statistical methods, and nonparametric statistical methods for estimating treatment effects in practice from regression discontinuity data plus validation tests and robustness tests for assessing these estimates. Part 4 considers generalizing regression discontinuity findings and presents several different views on and approaches to the issue. Part 5 notes some important issues to pursue in future research about or applications of regression discontinuity analysis.
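The sharp versus fuzzy distinction can be illustrated with a short sketch. In a fuzzy design, crossing the cutoff only changes the probability of receiving treatment, and a common estimator divides the jump in the outcome by the jump in treatment receipt. The code below is an illustration under assumptions of my own (simulated data, linear fits on each side, a hand-picked bandwidth, hypothetical function names); it is not Bloom's code.

```python
import numpy as np

def jump_at_cutoff(x, y, bandwidth):
    """Right-side minus left-side linear fit, both evaluated at x = 0."""
    keep = np.abs(x) <= bandwidth
    x, y = x[keep], y[keep]
    right = x >= 0
    # np.polyfit with degree 1 returns [slope, intercept]
    intercept_right = np.polyfit(x[right], y[right], 1)[1]
    intercept_left = np.polyfit(x[~right], y[~right], 1)[1]
    return intercept_right - intercept_left

def fuzzy_rd_estimate(x, received, y, bandwidth):
    """Ratio of the outcome jump to the jump in treatment receipt."""
    return jump_at_cutoff(x, y, bandwidth) / jump_at_cutoff(x, received, bandwidth)

# Simulated example (hypothetical values): the rating is centered at the cutoff,
# treatment receipt jumps from 20% to 80%, and treatment raises the outcome by 3.
rng = np.random.default_rng(1)
n = 20000
x = rng.uniform(-1, 1, n)
p_receive = np.where(x >= 0, 0.8, 0.2)
received = (rng.uniform(size=n) < p_receive).astype(float)
y = 1.0 + 0.5 * x + 3.0 * received + rng.normal(0, 1, n)

fuzzy_effect = fuzzy_rd_estimate(x, received, y, bandwidth=0.5)
print(round(fuzzy_effect, 2))
```

The ratio applies to units whose treatment status is actually changed by crossing the cutoff, which is one of the "conceptual subpopulations" the abstract refers to.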
Schochet et al. (2010). Standards for regression discontinuity designs. From the What Works Clearinghouse, a short discussion of what criteria a strong RD study should meet.
II. More advanced treatments of RD
Imbens & Lemieux (2008). Regression discontinuity designs: A guide to practice. J of Econometrics. A more advanced treatment, with a checklist on how to proceed with any RD analysis.
In regression discontinuity (RD) designs for evaluating causal effects of interventions, assignment to a treatment is determined at least partly by the value of an observed covariate lying on either side of a fixed threshold. These designs were first introduced in the evaluation literature by Thistlethwaite and Campbell (1960). With the exception of a few unpublished theoretical papers, these methods did not attract much attention in the economics literature until recently. Starting in the late 1990s, there has been a large number of studies in economics applying and extending RD methods. In this paper we review some of the practical and theoretical issues in implementation of RD methods.
Lee & Lemieux (2010). Regression discontinuity designs in economics. J of Economic Literature. Another recommended review of RD in practice.
This paper provides an introduction and “user guide” to Regression Discontinuity (RD) designs for empirical researchers. It presents the basic theory behind the research design, details when RD is likely to be valid or invalid given economic incentives, explains why it is considered a “quasi-experimental” design, and summarizes different ways (with their advantages and disadvantages) of estimating RD designs and the limitations of interpreting these estimates. Concepts are discussed using examples drawn from the growing body of empirical research using RD.
III. Advanced topics – estimation and diagnostics
McCrary (2008). Manipulation of the running variable in the regression discontinuity design: A density test. J of Econometrics. A major threat to the internal validity of RD designs is whether units are able to manipulate their score on the assignment variable (e.g., retaking tests until the cutoff score is attained). This is another technical paper, but he has also provided his Stata code for others to use. The “McCrary test” is now standard for RD papers.
Standard sufficient conditions for identification in the regression discontinuity design are continuity of the conditional expectation of counterfactual outcomes in the running variable. These continuity assumptions may not be plausible if agents are able to manipulate the running variable. This paper develops a test of manipulation related to continuity of the running variable density function. The methodology is applied to popular elections to the House of Representatives, where sorting is neither expected nor found, and to roll call voting in the House, where sorting is both expected and found.
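The intuition behind a density check can be sketched very crudely. The code below is NOT McCrary's estimator (which smooths a finely binned histogram with local linear regression on each side of the cutoff); it is a much simpler stand-in that compares the number of observations in a narrow window just below and just above the cutoff. The function name, window width, and simulated manipulation are all assumptions for illustration.

```python
import math
import numpy as np

def window_count_check(running, cutoff, window):
    """Compare counts just below and just above the cutoff.

    If the density is smooth and the window is narrow, an observation in the
    window is roughly equally likely to fall on either side, so
    (above - below) / sqrt(above + below) is approximately standard normal.
    """
    running = np.asarray(running, dtype=float)
    below = int(np.sum((running >= cutoff - window) & (running < cutoff)))
    above = int(np.sum((running >= cutoff) & (running < cutoff + window)))
    z = (above - below) / math.sqrt(above + below)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return below, above, z, p_value

rng = np.random.default_rng(2)
smooth = rng.uniform(0, 100, 10000)

# Hypothetical manipulation: half of those scoring in [48, 50) end up just above 50.
manipulated = smooth.copy()
just_below = (manipulated >= 48) & (manipulated < 50)
bump = just_below & (rng.uniform(size=manipulated.size) < 0.5)
manipulated[bump] = 50 + rng.uniform(0, 2, int(bump.sum()))

below_s, above_s, z_smooth, p_smooth = window_count_check(smooth, 50, 2)
below_m, above_m, z_manip, p_manip = window_count_check(manipulated, 50, 2)
print(below_s, above_s, below_m, above_m)
```

A check like this ignores any slope in the density near the cutoff, which is one reason to use the real test for anything beyond a first look.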
Barreca et al. (2014). Heaping-induced bias in regression-discontinuity designs. (Originally an NBER working paper.) They discuss problems that arise when observations “heap” at particular values of the assignment variable on either side of the cutoff because of how the data are reported.
This study uses Monte Carlo simulations to demonstrate that regression-discontinuity designs arrive at biased estimates when attributes related to outcomes predict heaping in the running variable. After showing that our usual diagnostics are poorly suited to identifying this type of problem, we provide alternatives, and then discuss the usefulness of different approaches to addressing the bias. We then consider these issues in multiple non-simulated environments.
Imbens & Kalyanaraman (2009). Optimal bandwidth choice for the regression discontinuity estimator. NBER Working paper. Researchers often struggle to determine the best bandwidth around the discontinuity; I&K propose a method to do this. The paper is highly technical, but they have generously provided software to implement their bandwidth selection process (in both Matlab and Stata).
We investigate the problem of optimal choice of the smoothing parameter (bandwidth) for the regression discontinuity estimator. We focus on estimation by local linear regression, which was shown to be rate optimal (Porter, 2003). Investigation of an expected-squared-error-loss criterion reveals the need for regularization. We propose an optimal, data dependent, bandwidth choice rule. We illustrate the proposed bandwidth choice using data previously analyzed by Lee (2008), as well as in a simulation study based on this data set. The simulations suggest that the proposed rule performs well.
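For readers who want to see what "estimation by local linear regression" involves, here is a minimal sketch using a triangular kernel. It does not implement the I&K bandwidth rule; the bandwidths are simply supplied by hand, and the function names and simulated data are assumptions for illustration.

```python
import numpy as np

def local_linear_at_boundary(x, y, bandwidth):
    """Triangular-kernel weighted linear fit; returns the fitted value at x = 0.

    `x` should contain observations from one side of the cutoff only,
    already centered so that the cutoff is at zero.
    """
    weights = np.clip(1.0 - np.abs(x) / bandwidth, 0.0, None)
    keep = weights > 0
    root_w = np.sqrt(weights[keep])
    design = np.column_stack([np.ones(int(keep.sum())), x[keep]])
    # Weighted least squares via rescaling rows by the square root of the weights
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y[keep] * root_w, rcond=None)
    return coef[0]

def rd_local_linear(running, outcome, cutoff, bandwidth):
    """Difference of the two boundary fits at the cutoff."""
    x = np.asarray(running, dtype=float) - cutoff
    y = np.asarray(outcome, dtype=float)
    right = x >= 0
    return (local_linear_at_boundary(x[right], y[right], bandwidth)
            - local_linear_at_boundary(x[~right], y[~right], bandwidth))

# Simulated example (hypothetical): nonlinear outcome with a true jump of 1.0 at zero.
rng = np.random.default_rng(3)
n = 10000
running = rng.uniform(-1, 1, n)
outcome = np.sin(2 * running) + 1.0 * (running >= 0) + rng.normal(0, 0.3, n)

estimates = {h: rd_local_linear(running, outcome, 0.0, h) for h in (0.1, 0.2, 0.4)}
for h, est in estimates.items():
    print(h, round(est, 3))
```

Reporting the estimate at several bandwidths, as the loop does, is a common robustness check regardless of which selection rule picks the headline bandwidth.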
IV. Advanced topics – multiple assignment variables
This is a hot topic for RD designs, as many interventions use more than one assignment variable. Listed below are the only papers I have found on this topic; please email me if you know of any others.
Imbens & Zajonc (2011). Regression discontinuity design with multiple forcing variables. (Original paper is not available, but the dissertation is.)
Regression discontinuity designs identify causal effects by exploiting treatment assignment rules that are discontinuous functions of underlying covariates. In the standard regression discontinuity design setup, the probability of treatment changes discontinuously if a scalar covariate exceeds a cutoff. We consider more complex treatment assignment rules that generate a treatment boundary. Leading examples include education policies where treatment depends on multiple test scores and spatial treatment discontinuities arising from geographic borders. We give local linear estimators for both the conditional effect along the boundary and the average effect over the boundary, and a consistent estimate for the variance of the average effect based on the nonparametric delta method. For two-dimensional RD designs, we derive an optimal, data-dependent, bandwidth selection rule for the conditional effect. We demonstrate these methods using a summer school and grade retention example.
Reardon & Robinson (2010). Regression discontinuity designs with multiple rating-score variables.
In the absence of a randomized control trial, regression discontinuity (RD) designs can produce plausible estimates of the treatment effect on an outcome for individuals near a cutoff score. In the standard RD design, individuals with rating scores higher than some exogenously determined cutoff score are assigned to one treatment condition; those with rating scores below the cutoff score are assigned to an alternate treatment condition. Many education policies, however, assign treatment status on the basis of more than one rating-score dimension. We refer to this class of RD designs as “multiple rating score regression discontinuity” (MRSRD) designs. In this paper, we discuss five different approaches to estimating treatment effects using MRSRD designs (response surface RD; frontier RD; fuzzy frontier RD; distance-based RD; and binding-score RD). We discuss differences among them in terms of their estimands, applications, statistical power, and potential extensions for studying heterogeneity of treatment effects.
Papay et al. (in press). Extending the regression-discontinuity approach to multiple assignment variables. J of Econometrics.
The recent scholarly attention to the regression-discontinuity design has focused exclusively on the application of a single assignment variable. In many settings, however, exogenously imposed cutoffs on several assignment variables define a set of different treatments. In this paper, we show how to generalize the standard regression-discontinuity approach to include multiple assignment variables simultaneously. We demonstrate that fitting this general, flexible regression-discontinuity model enables us to estimate several treatment effects of interest.
Wong et al. (2012). Analyzing regression-discontinuity designs with multiple assignment variables: A comparative study of four estimation methods.
In a traditional regression-discontinuity design (RDD), units are assigned to treatment on the basis of a cutoff score and a continuous assignment variable. The treatment effect is measured at a single cutoff location along the assignment variable. A more flexible conceptualization of RDD, however, allows researchers to examine effects along a multidimensional frontier using multiple assignment variables and cutoffs. This paper introduces the multivariate regression-discontinuity design (MRDD). For a MRDD with two assignment variables, we show that the overall treatment effect at the cutoff frontier can be decomposed into a weighted average of two univariate RDD effects, and that the weights depend on the scaling of the assignment variables. The paper discusses four methods for estimating MRDD treatment effects—the frontier, centering, univariate, and instrumental variable approaches—and compares their relative performance in a Monte Carlo simulation study under different scenarios. We find that given correct model specifications, all four approaches estimate treatment effects without bias, but the instrumental variable approach has severe limitations in terms of more stringent required assumptions and reduced efficiency.
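One way to picture collapsing two assignment variables into one, in the spirit of the centering approach named above, is to center each score on its own cutoff and keep the smaller of the two. The sketch below assumes a rule where a unit is treated if it falls below the cutoff on either score (for example, summer school for students failing either test); that rule, the variable names, and the cutoffs are hypothetical, and the scores are not rescaled even though the abstract notes that scaling matters.

```python
import numpy as np

def binding_score(score_a, score_b, cutoff_a, cutoff_b):
    """Center each score on its cutoff and return the elementwise minimum."""
    centered_a = np.asarray(score_a, dtype=float) - cutoff_a
    centered_b = np.asarray(score_b, dtype=float) - cutoff_b
    return np.minimum(centered_a, centered_b)

rng = np.random.default_rng(4)
math_score = rng.normal(60, 10, 1000)
reading_score = rng.normal(65, 12, 1000)

# Assumed assignment rule: treated if below the cutoff on EITHER test.
treated_by_rule = (math_score < 55) | (reading_score < 60)

# The same assignment expressed through a single variable with a cutoff at zero.
binding = binding_score(math_score, reading_score, 55, 60)
treated_by_binding = binding < 0

print(np.array_equal(treated_by_rule, treated_by_binding))
```

Once the single score is built, any univariate RD estimator can be applied to it with the cutoff at zero; if treatment instead required failing both tests, the maximum would play the same role.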
V. Power
Lee & Munk (2008). Using regression discontinuity design for program evaluation. An introduction to power analysis for RD with a simple random sample.
Schochet (2008). Technical Methods Report: Statistical Power for Regression Discontinuity Designs in Education Evaluations. The definitive paper for estimating power for RD, especially in a multilevel context. A big chunk of this paper is his 2009 JASA article.
Technical Methods Report: Statistical Power for Regression Discontinuity Designs in Education Evaluations examines theoretical and empirical issues related to the statistical power of impact estimates under clustered regression discontinuity (RD) designs. The theory is grounded in the causal inference and HLM modeling literature, and the empirical work focuses on commonly-used designs in education research to test intervention effects on student test scores. The main conclusion is that three to four times larger samples are typically required under RD than experimental clustered designs to produce impacts with the same level of statistical precision. Thus, the viability of using RD designs for new impact evaluations of educational interventions may be limited, and will depend on the point of treatment assignment, the availability of pretests, and key research questions.
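A rough way to see where a sample-size penalty of this size comes from: in a simple linear RD model, the treatment indicator is strongly correlated with the rating, and controlling for the rating inflates the variance of the impact estimate by 1 / (1 - rho^2), where rho is that correlation. The sketch below computes this factor by simulation for two rating distributions. It is a simplified illustration under my own assumptions (no clustering, no other covariates, correct linear specification), not a reproduction of the report's calculations.

```python
import numpy as np

def rd_design_effect(rating, cutoff):
    """Variance inflation from controlling for the rating in a linear RD model.

    rho is the correlation between the treatment indicator and the rating;
    the factor 1 / (1 - rho^2) is relative to a randomized experiment with
    the same sample size and the same share of treated units.
    """
    rating = np.asarray(rating, dtype=float)
    treated = (rating >= cutoff).astype(float)
    rho = np.corrcoef(treated, rating)[0, 1]
    return 1.0 / (1.0 - rho ** 2)

rng = np.random.default_rng(5)
normal_rating = rng.normal(0, 1, 200_000)
uniform_rating = rng.uniform(0, 1, 200_000)

de_normal = rd_design_effect(normal_rating, 0.0)    # cutoff at the mean
de_uniform = rd_design_effect(uniform_rating, 0.5)  # cutoff at the median
print(round(de_normal, 2), round(de_uniform, 2))
```

Analytically the two factors are 1 / (1 - 2/pi), about 2.75, for a normal rating with the cutoff at its mean, and exactly 4 for a uniform rating with the cutoff at its median, so the simulated values should land close to those. Moving the cutoff away from the center of the distribution changes the factor, which fits the abstract's remark that feasibility depends on the point of treatment assignment.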