library(ggplot2, quietly = TRUE)
# Visualize propensity score distributions
ggplot(data, aes(x = pscore, fill = factor(treat))) +
  geom_histogram(alpha = 0.5, position = "identity", bins = 30) +
  labs(x = "Propensity Score", y = "Count", fill = "Treatment") +
  theme_minimal()
Sensitivity Analysis
Matching estimates are only valid if the ignorability assumption holds—that all confounders are observed. Since this assumption is untestable, sensitivity analyses assess how robust the results are to potential violations.
Assessing Common Support
Definition
Common support (or overlap) requires that for every covariate value, there is a positive probability of receiving both treatment and control:
\[ 0 < P(T = 1 | X = x) < 1 \quad \forall x \]
Without overlap, comparisons require extrapolation beyond the empirical data, making causal inference unreliable.
Diagnosing Lack of Overlap
1. Propensity Score Histograms
Look for:
- Non-overlapping regions at extreme values
- Treatment group with no control counterparts (or vice versa)
2. Convex Hull Method
The convex hull identifies the multidimensional space where interpolation (not extrapolation) is possible (King and Zeng 2006).
- Discard units outside the convex hull of the opposite group
- More restrictive than propensity score trimming
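In two dimensions the convex hull check can be done with base R alone: `chull()` returns the hull vertices of the treated group, and a ray-casting test flags which control units fall inside. The sketch below uses simulated covariates; the variable names (`treated`, `control`) are placeholders for your own covariate matrices.

```r
# Sketch: drop control units outside the 2-D convex hull of the treated
# group, using base R only (chull + a ray-casting point-in-polygon test)
in_poly <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- logical(length(px))
  j <- n
  for (i in seq_len(n)) {
    # Toggle 'inside' each time a rightward ray crosses edge (j, i)
    cross <- (vy[i] > py) != (vy[j] > py) &
      px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i]
    inside <- xor(inside, cross)
    j <- i
  }
  inside
}

set.seed(1)
treated <- matrix(rnorm(60), ncol = 2)            # simulated covariates
control <- matrix(rnorm(80, sd = 1.5), ncol = 2)  # wider spread than treated
h <- chull(treated)  # indices of the hull vertices
keep <- in_poly(control[, 1], control[, 2],
                treated[h, 1], treated[h, 2])
control_kept <- control[keep, , drop = FALSE]  # interpolation region only
```

Units dropped here would require extrapolation beyond the treated group's covariate cloud; in higher dimensions a package such as `geometry` is needed.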
3. Discarding Units Outside Common Support
By propensity score:
# Define range
ps_min <- max(min(data$pscore[data$treat == 1]),
              min(data$pscore[data$treat == 0]))
ps_max <- min(max(data$pscore[data$treat == 1]),
              max(data$pscore[data$treat == 0]))
library(dplyr, quietly = TRUE)
# Keep only units in common support
data_trimmed <- data %>%
  filter(pscore >= ps_min & pscore <= ps_max)
Implications for Estimands
- ATT: It may be acceptable to discard control units outside the treated range
- ATE: Discarding any units changes the target population and reduces external validity
Balance Diagnostics
After matching, verify that covariates are balanced between treated and control groups.
Standardized Mean Differences
The standardized mean difference (SMD) for covariate \(X\) is:
\[ \text{SMD} = \frac{\bar{X}_T - \bar{X}_C}{\sqrt{\frac{s_T^2 + s_C^2}{2}}} \]
where \(\bar{X}_T, \bar{X}_C\) are means and \(s_T^2, s_C^2\) are variances in treated and control groups.
Benchmark: SMD \(< 0.1\) indicates good balance (Rubin 2001)
Variance Ratios
Compare variances of continuous covariates:
\[ \text{VR} = \frac{s_T^2}{s_C^2} \]
Benchmark: \(0.5 < \text{VR} < 2.0\) (Rubin 2001)
Propensity Score Balance
- Standardized difference of propensity score means: should be \(< 0.25\)
- Variance ratio of propensity scores: should be between 0.5 and 2.0
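These thresholds can be checked directly without a package. A minimal base-R sketch on simulated data (replace `d` with your matched sample; `treat` and `age` are placeholder column names):

```r
# Standardized mean difference: difference in means over the pooled SD
smd <- function(x, t) {
  (mean(x[t == 1]) - mean(x[t == 0])) /
    sqrt((var(x[t == 1]) + var(x[t == 0])) / 2)
}
# Variance ratio: treated variance over control variance
vr <- function(x, t) var(x[t == 1]) / var(x[t == 0])

set.seed(1)
d <- data.frame(treat = rep(c(1, 0), each = 100),
                age   = c(rnorm(100, 35, 8), rnorm(100, 34, 9)))
smd(d$age, d$treat)  # want |SMD| < 0.1
vr(d$age, d$treat)   # want 0.5 < VR < 2
```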
Graphical Diagnostics
Love Plots
Display standardized mean differences before and after matching:
library(cobalt, quietly = TRUE)
bal.plot(m.out, var.name = "distance", which = "both")
love.plot(m.out, threshold = 0.1,
          abs = TRUE,
          var.order = "unadjusted")
QQ Plots
Compare empirical distributions of covariates:
plot(m.out, type = "qq", which.xs = c("age", "educ", "re74"))
Empirical CDFs
plot(m.out, type = "ecdf")
Avoid Using P-values
Do not use t-tests or other hypothesis tests to assess balance:
- P-values conflate effect size with sample size
- Large samples will show “significant” differences even for trivial imbalances
- Balance is a design property, not a hypothesis test
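The first two points can be seen in a quick simulation: the same true imbalance (a mean shift of 0.05 SD) yields a stable SMD at any sample size, while the t-test p-value collapses as n grows.

```r
# Same true imbalance, two sample sizes: SMD is stable, p-value is not
set.seed(1)
res <- sapply(c(100, 10000), function(n) {
  x1 <- rnorm(n, mean = 0.05)  # "treated": tiny mean shift
  x0 <- rnorm(n, mean = 0)     # "control"
  c(smd = (mean(x1) - mean(x0)) / sqrt((var(x1) + var(x0)) / 2),
    p   = t.test(x1, x0)$p.value)
})
colnames(res) <- c("n = 100", "n = 10000")
round(res, 4)  # SMD ~0.05 in both columns; p shrinks with n
```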
Rosenbaum Sensitivity Analysis
Motivation
Even after matching, hidden bias from unobserved confounders \(U\) may remain. Rosenbaum bounds quantify how strong this bias must be to overturn the findings.
Setup
Let \(\pi_i = P(T_i = 1)\) be the true probability of treatment for unit \(i\). Define the odds of treatment:
\[ \text{Odds}_i = \frac{\pi_i}{1 - \pi_i} \]
For two matched units \(i\) and \(j\), the odds ratio is bounded by \(\Gamma\):
\[ \frac{1}{\Gamma} \leq \frac{\text{Odds}_i}{\text{Odds}_j} \leq \Gamma \]
Interpretation
- \(\Gamma = 1\): No hidden bias (treatment is “as good as randomly assigned” conditional on \(X\))
- \(\Gamma = 2\): One unit could be twice as likely to receive treatment due to an unobserved confounder
- \(\Gamma = 3\): Three times as likely
Procedure
- Choose a range of \(\Gamma\) values (e.g., 1.0, 1.1, 1.2, …, 3.0)
- For each \(\Gamma\), compute bounds on the p-value or treatment effect estimate
- Report the critical \(\Gamma\) at which the effect becomes insignificant
Estimation
library(rbounds, quietly = TRUE)
library(MatchIt, quietly = TRUE)
# Load the lalonde data (bundled with MatchIt)
df <- MatchIt::lalonde
# Perform matching
m.out <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = df,
                 method = "nearest")
matched_data <- match.data(m.out)
# Split into groups
treat_y <- matched_data$re78[matched_data$treat == 1]
control_y <- matched_data$re78[matched_data$treat == 0]
# P-value sensitivity
psens(treat_y, control_y, Gamma = 3, GammaInc = 0.1)
Rosenbaum Sensitivity Test for Wilcoxon Signed Rank P-Value
Unconfounded estimate .... 0.1928
Gamma Lower bound Upper bound
1.0 0.1928 0.1928
1.1 0.0796 0.3709
1.2 0.0285 0.5640
1.3 0.0091 0.7300
1.4 0.0026 0.8489
1.5 0.0007 0.9226
1.6 0.0002 0.9633
1.7 0.0000 0.9837
1.8 0.0000 0.9932
1.9 0.0000 0.9973
2.0 0.0000 0.9990
2.1 0.0000 0.9996
2.2 0.0000 0.9999
2.3 0.0000 1.0000
2.4 0.0000 1.0000
2.5 0.0000 1.0000
2.6 0.0000 1.0000
2.7 0.0000 1.0000
2.8 0.0000 1.0000
2.9 0.0000 1.0000
3.0 0.0000 1.0000
Note: Gamma is Odds of Differential Assignment To
Treatment Due to Unobserved Factors
# Hodges-Lehmann point estimate sensitivity
hlsens(treat_y, control_y, Gamma = 3, GammaInc = 0.1)
Rosenbaum Sensitivity Test for Hodges-Lehmann Point Estimate
Unconfounded estimate .... 686.1615
Gamma Lower bound Upper bound
1.0 686.160 686.16
1.1 165.060 932.06
1.2 -50.038 1308.00
1.3 -407.340 1581.90
1.4 -680.640 1853.70
1.5 -938.240 2039.30
1.6 -1219.100 2268.00
1.7 -1452.600 2514.10
1.8 -1716.700 2756.70
1.9 -1960.600 3070.00
2.0 -2185.800 3282.40
2.1 -2422.000 3508.80
2.2 -2630.700 3721.00
2.3 -2820.400 3892.10
2.4 -2943.000 4044.80
2.5 -3081.200 4208.40
2.6 -3223.600 4361.10
2.7 -3355.900 4531.70
2.8 -3472.100 4685.60
2.9 -3607.900 4816.70
3.0 -3720.700 4950.00
Note: Gamma is Odds of Differential Assignment To
Treatment Due to Unobserved Factors
Output
- Tables showing p-values and confidence intervals at different \(\Gamma\) levels
- Identify \(\Gamma\) where effect loses significance
Interpretation
If the critical \(\Gamma\) is:
- < 1.5: Results are very sensitive to hidden bias
- 1.5 - 2.5: Moderately robust
- > 2.5: Highly robust—unobserved confounder would need to be very strong
# No direct Rosenbaum bounds implementation in Python
# Use R via rpy2 (treat_y and control_y must already exist in the R session)
# or implement manually using the Wilcoxon signed-rank test
import rpy2.robjects as ro
ro.r('library(rbounds, quietly = TRUE)')
result = ro.r('psens(treat_y, control_y, Gamma = 2)')
* After matching, save matched sample
ssc install rbounds
* Sensitivity analysis
rbounds outcome, gamma(1(0.1)3)
Relative Correlation Restrictions
An alternative sensitivity approach bounds the treatment effect under assumptions about selection on unobservables relative to selection on observables (Altonji, Elder, and Taber 2005).
Framework
Assume:
\[ \text{Cov}(T, \epsilon) = \lambda \cdot \text{Cov}(T, X^\top \gamma) \]
where:
- \(\epsilon\) contains unobserved confounders
- \(\lambda\) is the relative selection ratio
Interpretation
- \(\lambda = 0\): No selection on unobservables (standard OLS)
- \(\lambda = 1\): Selection on unobservables is equal to selection on observables
- \(\lambda > 1\): Unobservables are more important than observables
Critical value \(\lambda^*\): The value at which the treatment effect reaches zero
Implementation
library(rcrbounds, quietly = TRUE)
# Estimate bounds
rcr_result <- rcr(
  formula = outcome ~ treat | age + educ + married,
  data = mydata,
  rc_range = c(0, 10)
)
summary(rcr_result)
plot(rcr_result)
Coefficient Stability (Oster’s delta)
Oster (2019) proposes testing robustness by examining how coefficients change as controls are added.
Key Metrics
- Bias-adjusted coefficient \(\beta^*(\delta, R^2_{\max})\)
- Breakdown point \(\delta^*\): value of \(\delta\) at which \(\beta^* = 0\)
Interpretation:
- \(\delta^* > 1\): Unobservables would need to be more selective than observables to eliminate the effect
- \(\delta^* < 1\): Results are fragile to hidden confounding
Implementation
library(robomit, quietly = TRUE)
# Compute delta*
delta_result <- o_delta(
  y = "outcome",
  x = "treat",
  con = "age + educ + married + re74 + re75",
  beta = 0,
  R2max = 1.3 * summary(lm_model)$r.squared,
  type = "lm",
  data = mydata
)
# Visualize
o_delta_rsq_viz(
  y = "outcome", x = "treat",
  con = "age + educ + married + re74 + re75",
  beta = 0,
  data = mydata
)
Falsification Tests
Placebo Outcomes
Test whether the treatment “affects” outcomes it should not affect (e.g., pre-treatment variables, irrelevant outcomes).
# Match as usual
m.out <- matchit(treat ~ covariates, data = mydata)
# Test on placebo outcome
placebo_model <- lm(pre_treatment_outcome ~ treat,
                    data = match.data(m.out),
                    weights = weights)
# Should find no effect
summary(placebo_model)
Permutation Tests
Randomly reassign treatment status and re-estimate the effect. The true effect should be an outlier in the permutation distribution.
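A minimal sketch of this idea, using simulated data in place of the matched sample (`treat` and `y` are placeholder column names):

```r
# Permutation test: reshuffle treatment labels and recompute the
# difference in means; the observed effect should be extreme relative
# to the shuffled distribution if the treatment matters
set.seed(42)
d <- data.frame(treat = rep(c(1, 0), each = 50),
                y     = c(rnorm(50, 1000, 500), rnorm(50, 800, 500)))
obs <- mean(d$y[d$treat == 1]) - mean(d$y[d$treat == 0])
perm <- replicate(2000, {
  t_star <- sample(d$treat)  # random reassignment of treatment status
  mean(d$y[t_star == 1]) - mean(d$y[t_star == 0])
})
# Two-sided permutation p-value: share of shuffles at least as extreme
p_perm <- mean(abs(perm) >= abs(obs))
```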
Untestable Assumptions
Some assumptions cannot be tested empirically:
- Unconfoundedness: We can never prove all confounders are observed
- SUTVA: Interference effects are hard to detect directly
- Functional form: Matching is somewhat robust, but outcome models still require assumptions
Strategies:
- Domain knowledge: Use theory to justify covariate selection
- Sensitivity analyses: Quantify how robust conclusions are
- Robustness checks: Try multiple matching methods and specifications
- Transparency: Report all decisions and diagnostics