Regression Discontinuity & Weighting Designs

Controlling for Covariates in Observational Studies

rdrobust
modeling
linear regression
ggplot2
causal inference
Published

December 15, 2024

Introduction

This project was a presentation I had on Causal Inference. Specifically, I did a summary of the RDD and Weighting Design methods.

Overview of Methods

  • Regression Discontinuity Design (RDD):
    • Sharp nonparametric regression discontinuity.
  • Before-After (BA) Design:
    • Interrupted time-series for estimating treatment effects.
  • Treatment Effect Estimation with Weighting:
    • Addresses covariate imbalance using propensity scores.

Regression Discontinuity Design (RDD)

Sharp Nonparametric RDD

Setup:

  • Treatment determined by a threshold \(\tau\) on the running variable variable \(x\).
  • Example: Eligibility for aid based on income \(d = 1[x \geq \tau]\).

Model:

\[ y = \beta_d d + g(x) + u \]

  • \(\beta_d\): Treatment effect.
  • \(g(x)\): Smooth function of \(x\).

Key Assumptions:

  1. Continuity in Observables: \(g(x)\) is continuous at \(x = \tau\).
  2. Continuity in Unobservables: \(E(u|x)\) is the same just above and below \(\tau\).

Estimation:

\[ \beta_d = \lim_{x \to \tau^+} E(y|x) - \lim_{x \to \tau^-} E(y|x) \]

Illustration

Simulating Data and Visualization of Concept

Code
# Load necessary libraries
library(ggplot2)

# Step 1: Simulate Sharp RDD Data
set.seed(123)
n <- 500
threshold <- 50

# Generate data
x <- runif(n, 30, 70)  # Running variable
treatment <- as.numeric(x >= threshold)  # Treatment assignment
y <- ifelse(
  treatment == 1,
  30 + 0.5 * x + 5 + rnorm(n, mean = 0, sd = 2),  # Treated outcome
  30 + 0.5 * x + rnorm(n, mean = 0, sd = 2)       # Control outcome
)

# Create a data frame
data <- data.frame(X = x, Treatment = treatment, Outcome = y)

# Step 2: Create the Scatter Plot with Regression Lines
ggplot(data, aes(x = X, y = Outcome, color = factor(Treatment))) +
  geom_point(alpha = 0.6) +  # Scatter points
  geom_smooth(data = subset(data, Treatment == 0), 
              method = "loess", se = FALSE, color = "blue", size = 1.2) +
  geom_smooth(data = subset(data, Treatment == 1), 
              method = "loess", se = FALSE, color = "green", size = 1.2) +
  geom_vline(xintercept = threshold, linetype = "dashed", color = "red", size = 1.2) +
  labs(
    title = "Sharp Nonparametric RDD Visualization",
    x = "Running Variable (X)",
    y = "Outcome (Y)",
    color = "Treatment Status"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16),
    legend.position = "top"
  )

Model Fitting - In a nutshell!

Code
# Load necessary libraries
library(rdrobust)

fit = rdrobust::rdrobust(y = data$Outcome, x = data$X, c = threshold, all = T)

summary(fit)
Sharp RD estimates using local polynomial regression.

Number of Obs.                  500
BW type                       mserd
Kernel                   Triangular
VCE method                       NN

Number of Obs.                  265          235
Eff. Number of Obs.              71           52
Order est. (p)                    1            1
Order bias  (q)                   2            2
BW est. (h)                   5.085        5.085
BW bias (b)                   8.424        8.424
rho (h/b)                     0.604        0.604
Unique Obs.                     265          235

=======================================================================================
        Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
=======================================================================================
  Conventional     4.813     0.682     7.057     0.000     [3.476 , 6.150]     
Bias-Corrected     5.044     0.682     7.396     0.000     [3.707 , 6.380]     
        Robust     5.044     0.799     6.316     0.000     [3.479 , 6.609]     
=======================================================================================

Before-After (BA) Design

Overview

  • Concept: Compare outcomes before and after a treatment.
  • Example: Assessing the effect of a speed limit law on accident rates.

Model:

\[ y_t = \beta_d d_t + g(t) + u_t \]

  • \(\beta_d\): Treatment effect.
  • \(g(t)\): Time trend.

Key Assumptions:

  • Treatment effect occurs immediately after intervention.
  • Other time-varying factors are minimal near the intervention.

Challenges of BA Design

  • Simultaneous Changes: Other factors (e.g., economic trends) may confound results.
  • Gradual Effects: Treatment effects that develop slowly are hard to detect.
  • No Contemporary Control Group: Relies solely on pre-treatment data as the control.

Illustration

Simulating Data and Visualization of Concept

Code
# Load necessary libraries
library(ggplot2)
library(dplyr)

# Step 1: Simulate Before-After Dataset
set.seed(123)
n <- 200  # Number of observations
break_year <- 1995  # Year of intervention

# Generate data
year <- seq(1990, 2000, length.out = n)
pre_treatment <- year < break_year
treatment <- as.numeric(year >= break_year)

# Simulate outcomes
outcome <- ifelse(
  pre_treatment,
  50 - 2 * (year - 1990) + rnorm(n, mean = 0, sd = 2),  # Pre-treatment trend
  30 - 1.5 * (year - break_year) + rnorm(n, mean = 0, sd = 2)  # Post-treatment trend
)

# Create a data frame
data <- data.frame(Year = year, Treatment = treatment, Outcome = outcome)

# Step 2: Fit Separate Regression Models
pre_model <- lm(Outcome ~ Year, data = data[data$Treatment == 0, ])
post_model <- lm(Outcome ~ Year, data = data[data$Treatment == 1, ])

# Step 3: Visualize the Before-After Trend
ggplot(data, aes(x = Year, y = Outcome)) +
  geom_point(aes(color = factor(Treatment)), size = 2, alpha = 0.7) +
  geom_smooth(method = "lm", data = data[data$Treatment == 0, ], 
              aes(color = "Pre-Treatment"), se = FALSE) +
  geom_smooth(method = "lm", data = data[data$Treatment == 1, ], 
              aes(color = "Post-Treatment"), se = FALSE) +
  geom_vline(xintercept = break_year, linetype = "dashed", color = "red", size = 1) +
  labs(title = "Before-After Design: Interrupted Time Series",
       x = "Year", y = "Outcome",
       color = "Treatment Status") +
  theme_minimal()

Treatment Effect Estimation with Weighting

Motivation for Weighting

  • Observational studies often suffer from selection bias.
  • Treated and untreated groups may differ in observed covariates (“x”).
  • Weighting adjusts for differences in covariates to estimate:
    • Effect on the untreated
    • Effect on the treated
    • Effect on the population

Basic Idea

Weighting Definition

  • Derive weights using the propensity score: \(\pi(x) = P(d = 1|x)\)
  • Weights:
    • Treated: \(\frac{1}{\pi(x)}\)
    • Untreated: \(\frac{1}{1 - \pi(x)}\)

Objective:

  • Balance the covariate distributions of treated and untreated groups.

Estimating Treatment Effects

Effect on the Untreated

Formula:

\[ E(y_1|d = 0) = P(d = 0)^{-1} \cdot \left[ E\left( \frac{d \cdot y}{\pi(x)} \right) - E(d \cdot y) \right] \]

Treatment Effect:

\[ E(y_1|d = 0) - E(y|d = 0) \]

Effect on the Treated

Formula:

\[ E(y_0|d = 1) = P(d = 1)^{-1} \cdot \left[ E\left( \frac{(1 - d) \cdot y}{1 - \pi(x)} \right) - E((1 - d) \cdot y) \right] \]

Treatment Effect:

\[ E(y|d = 1) - E(y_0|d = 1) \]

Effect on the Population

Formula:

\[ E(y_1 - y_0) = E\left( \frac{d \cdot y}{\pi(x)} \right) - E\left( \frac{(1 - d) \cdot y}{1 - \pi(x)} \right) \]

Benefits of Weighting

  • Balances covariates between treated and untreated groups.
  • Avoids dimension problems associated with regression.
  • Improves efficiency of estimators.

Challenges of Weighting

  • Small Propensity Scores:
    • When \(\pi(x)\) is near 0 or 1, weights can become unstable.
  • Model Dependence:
    • Relies on correctly specifying \(\pi(x)\).
  • Unobserved Confounders:
    • Cannot address bias from unobserved factors.

Practical Implementation

  1. Estimate propensity scores (e.g., logistic regression).
  2. Compute weights:
    • Treated: \(\frac{1}{\pi(x)}\)
    • Untreated: \(\frac{1}{1 - \pi(x)}\)
  3. Estimate treatment effects using weighted averages or regression.

References

Lee, Myoung-jae. 2005. Micro-Econometrics for Policy, Program and Treatment Effects. Oxford University Press. https://doi.org/10.1093/0199267693.001.0001.
Back to top