Matching Methods

Entropy Balancing

Introduction

Entropy balancing is a preprocessing method that reweights control observations to achieve exact covariate balance on specified moments (means, variances, skewness) while staying as close as possible to uniform weights (Hainmueller 2012).

Unlike propensity score weighting, which estimates weights indirectly through a propensity score model, entropy balancing directly solves for weights that satisfy user-specified balance constraints. This approach:

  • Achieves exact balance on specified moments by construction
  • Requires no iterative specification search to achieve balance
  • Produces stable weights via maximum-entropy optimization
  • Combines advantages of matching and weighting methods

The Core Insight

Traditional matching methods involve two steps:

  1. Estimate propensity scores (or distances)
  2. Check if resulting matches achieve balance

Entropy balancing reverses this logic:

  1. Specify desired balance constraints directly
  2. Solve for weights that satisfy those constraints

This guarantees balance on chosen moments without iterative model tweaking.

Mathematical Formulation

The Optimization Problem

Entropy balancing finds weights \(w_i\) for control observations (\(T_i = 0\)) that minimize:

\[ H(w) = \sum_{i: T_i=0} h(w_i) \]

subject to balance and normalization constraints.

Objective Function

The objective uses a directed divergence based on Shannon entropy:

\[ h(w_i) = w_i \log(w_i / q_i) \]

where \(q_i\) is a base weight (typically \(q_i = 1\) for uniform base weights).

This is equivalent to minimizing the Kullback-Leibler divergence from the base weights, ensuring weights are as uniform as possible while satisfying constraints.
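As a quick sanity check on this equivalence: with the weight totals held fixed, the generalized KL divergence is zero exactly when the weights equal the base weights and positive for any tilt away from them. A tiny numeric illustration (the weight values are arbitrary):

```python
import numpy as np

def kl(w, q):
    # Kullback-Leibler divergence of weights w from base weights q
    return float(np.sum(w * np.log(w / q)))

q = np.ones(4)                                 # uniform base weights
print(kl(np.ones(4), q))                       # 0.0: uniform weights attain the minimum
print(kl(np.array([0.5, 0.5, 1.5, 1.5]), q))   # positive: any tilt is penalized
```

Both weight vectors sum to 4, matching the base weights, so the divergence is nonnegative and minimized at uniformity.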

Balance Constraints

The key constraints enforce exact balance on covariate moments:

Moment balance constraints: \[ \frac{\sum_{i: T_i=0} w_i \, c_r(\mathbf{X}_i)}{\sum_{i: T_i=0} w_i} = \frac{1}{N_1} \sum_{j: T_j=1} c_r(\mathbf{X}_j) \quad \text{for } r = 1, \ldots, R \]

where \(c_r(\mathbf{X})\) are covariate moment functions and \(N_1\) is the number of treated units. Writing the constraints as weighted means keeps them valid under any weight normalization.

Normalization Constraint

Weights are normalized to sum to a fixed constant:

\[ \sum_{i: T_i=0} w_i = c \]

Hainmueller (2012) normalizes to \(c = 1\); implementations vary — the ebal output below, for instance, scales the control weights to sum to the number of treated units.

Common Balance Targets

First Moment (Means)

For covariate \(X_k\): \[ \frac{\sum_{i: T_i=0} w_i X_{ik}}{\sum_{i: T_i=0} w_i} = \frac{1}{N_1} \sum_{j: T_j=1} X_{jk} \]

This sets the weighted control mean equal to the treated mean.

Second Moment (Variances)

\[ \frac{\sum_{i: T_i=0} w_i X_{ik}^2}{\sum_{i: T_i=0} w_i} = \frac{1}{N_1} \sum_{j: T_j=1} X_{jk}^2 \]

Combined with the first-moment constraint, this balances variances as well as means.

Higher Moments and Interactions

Can include:

  • Skewness: \(E[X^3]\)
  • Interactions: \(E[X_j \cdot X_k]\)
  • Polynomials: \(E[X^2], E[X^3]\)
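The balance constraints have a convenient structure: the minimum-entropy solution is an exponential tilt of the base weights, \(w_i \propto q_i \exp(\boldsymbol{\lambda}^\top c(\mathbf{X}_i))\), where the multipliers \(\boldsymbol{\lambda}\) solve a smooth convex dual problem whose gradient is exactly the balance gap (Hainmueller 2012). A minimal sketch with simulated data — the sample sizes, distributions, and choice of BFGS solver are all illustrative:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(0)
Xc = rng.normal(0.0, 1.0, size=(300, 2))   # simulated controls
Xt = rng.normal(0.4, 1.1, size=(100, 2))   # simulated treated (shifted, wider)

def moment_fns(X):
    # First and second moments of each covariate: c(X) = (X, X^2)
    return np.column_stack([X, X**2])

C = moment_fns(Xc)                 # control moment functions
m = moment_fns(Xt).mean(axis=0)    # treated moment targets

def dual(lam):
    # Convex dual of entropy balancing; its gradient is the balance gap
    return logsumexp(C @ lam) - lam @ m

lam = minimize(dual, np.zeros(C.shape[1]), method="BFGS").x

# Primal solution: exponentially tilted weights, normalized to sum to 1
w = np.exp(C @ lam)
w /= w.sum()

# Weighted control moments now match the treated targets (solver tolerance)
print(np.abs(C.T @ w - m).max())
```

Solving in the dual is why entropy balancing is fast in practice: the search is over one multiplier per constraint rather than one weight per control unit.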

ATT vs ATE

ATT weights (default): balance controls to match the treated group

  • Treated units get weight 1
  • Only control units are reweighted

ATE weights: balance both groups to the population

  • Requires additional constraints on treated units
  • Less commonly used in practice

Statistical Workflow

Step 1: Specify Covariates and Moments

Decide which variables and moments to balance:

# R example: variables to balance
balance_vars <- c("age", "educ", "race", "married", "nodegree", "re74", "re75")
# ebalance() balances the means of the columns supplied in X; to balance
# variances as well, append squared terms (e.g., age^2) to the matrix

Step 2: Fit Entropy Balancing

Solve the optimization problem:

library(ebal, quietly = TRUE)
eb_out <- ebalance(Treatment = df$treat,
                   X = df[, balance_vars])

Step 3: Extract Weights

# Initialize all weights to 1 (treated units keep weight 1 for the ATT)
df$weight <- 1

# Control weights from the fitted object
df$weight[df$treat == 0] <- eb_out$w

Step 4: Check Balance

Verify exact balance achieved:

library(cobalt, quietly = TRUE)
bal.tab(treat ~ age + educ + race + married + nodegree + re74 + re75,
        data = df,
        weights = "weight",
        method = "weighting")

Step 5: Estimate Treatment Effect

# Weighted regression
effect_model <- lm(re78 ~ treat, 
                   data = df, 
                   weights = weight)
summary(effect_model)

Estimation

library(tidyverse, quietly = TRUE)
library(ebal, quietly = TRUE)
library(cobalt, quietly = TRUE)
# Load the lalonde data
data <- MatchIt::lalonde

# Prepare covariate matrix (use model.matrix to handle factor variables)
X <- model.matrix(
    ~ age + educ + race + married + nodegree + re74 + re75,
    data = data
)[, -1]  # Remove intercept column

T_vec <- data$treat

# Fit entropy balancing
eb_fit <- ebalance(Treatment = T_vec, X = X)
Converged within tolerance 
# Examine balance improvement
bal.tab(eb_fit, treat = T_vec, covs = X)   # balance table
Balance Measures
              Type Diff.Adj
age        Contin.       -0
educ       Contin.       -0
racehispan  Binary        0
racewhite   Binary       -0
married     Binary       -0
nodegree    Binary        0
re74       Contin.       -0
re75       Contin.       -0

Effective sample sizes
           Control Treated
Unadjusted  429.       185
Adjusted     98.46     185
love.plot(eb_fit, treat = T_vec, covs = X) # Love plot of standardized mean differences
Warning: Standardized mean differences and raw mean differences are present in the same
plot. Use the `stars` argument to distinguish between them and appropriately
label the x-axis. See `love.plot()` for details.

# Extract weights
data$weight <- 1  # Initialize all weights to 1
data$weight[data$treat == 0] <- eb_fit$w

# Verify balance
bal_check <- bal.tab(
    treat ~ age + educ + race + married + nodegree + re74 + re75,
    data = data,
    weights = "weight",
    method = "weighting",
    stats = c("m", "v"),
    thresholds = c(m = 0.05, v = 2)
)

print(bal_check)
Balance Measures
               Type Diff.Adj     M.Threshold V.Ratio.Adj      V.Threshold
age         Contin.       -0 Balanced, <0.05      0.4096 Not Balanced, >2
educ        Contin.       -0 Balanced, <0.05      0.6635     Balanced, <2
race_black   Binary        0 Balanced, <0.05           .                 
race_hispan  Binary        0 Balanced, <0.05           .                 
race_white   Binary       -0 Balanced, <0.05           .                 
married      Binary       -0 Balanced, <0.05           .                 
nodegree     Binary        0 Balanced, <0.05           .                 
re74        Contin.       -0 Balanced, <0.05      1.3265     Balanced, <2
re75        Contin.       -0 Balanced, <0.05      1.3351     Balanced, <2

Balance tally for mean differences
                    count
Balanced, <0.05         9
Not Balanced, >0.05     0

Variable with the greatest mean difference
 Variable Diff.Adj     M.Threshold
     re74       -0 Balanced, <0.05

Balance tally for variance ratios
                 count
Balanced, <2         3
Not Balanced, >2     1

Variable with the greatest variance ratio
 Variable V.Ratio.Adj      V.Threshold
      age      0.4096 Not Balanced, >2

Effective sample sizes
           Control Treated
Unadjusted  429.       185
Adjusted     98.46     185
# Visualize balance
love.plot(bal_check, 
        threshold = 0.1,
        abs = TRUE,
        var.order = "unadjusted",
        colors = c("red", "blue"))
Warning: Unadjusted values are missing. This can occur when `un = FALSE` and `quick =
TRUE` in the original call to `bal.tab()`.
Warning: `var.order` was set to "unadjusted", but no unadjusted mean differences were
calculated. Ignoring `var.order`.
Warning: Standardized mean differences and raw mean differences are present in the same
plot. Use the `stars` argument to distinguish between them and appropriately
label the x-axis. See `love.plot()` for details.

# Check weight distribution
summary(data$weight[data$treat == 0])
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.008086 0.039651 0.078746 0.431236 0.228944 4.062430 
ggplot(data %>% filter(treat == 0), 
        aes(x = weight)) +
    geom_histogram(bins = 30, fill = "steelblue") +
    labs(title = "Distribution of Entropy Balancing Weights",
        x = "Weight", y = "Count") +
    theme_minimal()

# Estimate treatment effect
att_model <- lm(re78 ~ treat, 
                data = data, 
                weights = weight)

# Robust standard errors
library(sandwich, quietly = TRUE)
library(lmtest, quietly = TRUE)
coeftest(att_model, vcov = vcovHC(att_model, type = "HC3"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5075.88     596.35  8.5116   <2e-16 ***
treat        1273.26     831.88  1.5306   0.1264    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Alternative: Use WeightIt package
# library(WeightIt)
# W <- weightit(treat ~ age + educ + race + married + nodegree + re74 + re75,
#             data = data,
#             method = "ebal",
#             estimand = "ATT")

# summary(W)
# bal.tab(W, stats = c("m", "v"), thresholds = c(m = 0.05))

# Estimate with marginaleffects
# library(marginaleffects)
# fit <- lm(re78 ~ treat, 
#             data = data, 
#             weights = W$weights)

# avg_comparisons(fit, variables = "treat",
#                 vcov = "HC3")
import pandas as pd
import numpy as np
from scipy.optimize import minimize
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Load data
data = pd.read_csv('lalonde.csv')

def entropy_balancing(X_control, X_treated, max_iter=1000):
    """
    Compute entropy balancing weights for control group
    
    Parameters:
    -----------
    X_control : array-like, shape (n_control, p)
        Covariate matrix for control group
    X_treated : array-like, shape (n_treated, p)
        Covariate matrix for treated group
    
    Returns:
    --------
    weights : array, shape (n_control,)
        Entropy balancing weights for controls
    """
    n_control = X_control.shape[0]
    p = X_control.shape[1]
    
    # Target moments (treated group means)
    target_means = X_treated.mean(axis=0)
    
    # Objective: KL divergence from uniform base weights (q_i = 1),
    # which reduces to sum(w * log(w))
    def objective(w):
        return np.sum(w * np.log(w))
    
    # Constraint: moment balance
    def balance_constraint(w):
        weighted_means = (w[:, None] * X_control).sum(axis=0) / w.sum()
        return weighted_means - target_means
    
    # Constraint: weights sum to n_control
    def sum_constraint(w):
        return w.sum() - n_control
    
    # Initial weights (uniform)
    w0 = np.ones(n_control)
    
    # Constraints
    constraints = [
        {'type': 'eq', 'fun': sum_constraint},
        {'type': 'eq', 'fun': balance_constraint}
    ]
    
    # Bounds: weights must be positive
    bounds = [(0.001, None) for _ in range(n_control)]
    
    # Optimize
    result = minimize(objective, w0, 
                    method='SLSQP',
                    bounds=bounds,
                    constraints=constraints,
                    options={'maxiter': max_iter})
    
    return result.x


# Prepare covariates: encode the categorical race variable as dummies so
# that the covariate matrix is fully numeric
data = pd.get_dummies(data, columns=['race'], drop_first=True)
covariates = ['age', 'educ', 'married', 'nodegree', 're74', 're75',
              'race_hispan', 'race_white']
X_control = data.loc[data['treat'] == 0, covariates].values.astype(float)
X_treated = data.loc[data['treat'] == 1, covariates].values.astype(float)

# Compute entropy balancing weights
eb_weights = entropy_balancing(X_control, X_treated)

# Assign weights to data
data['weight'] = 1.0  # Treated units get weight 1
data.loc[data['treat'] == 0, 'weight'] = eb_weights

# Check balance
def check_balance(data, covariates, treatment_col, weight_col):
    """Calculate weighted standardized mean differences"""
    results = []
    for var in covariates:
        treated = data[data[treatment_col] == 1]
        control = data[data[treatment_col] == 0]
        
        # Weighted means
        mean_t = treated[var].mean()
        mean_c = np.average(control[var], weights=control[weight_col])
        
        # Pooled standard deviation (unweighted for reference)
        pooled_sd = np.sqrt((treated[var].var() + control[var].var()) / 2)
        
        smd = (mean_t - mean_c) / pooled_sd
        results.append({'Variable': var, 'SMD': smd})
    
    return pd.DataFrame(results)

balance_table = check_balance(data, covariates, 'treat', 'weight')
print("Balance after entropy balancing:")
print(balance_table)

# Estimate treatment effect
outcome_model = smf.wls('re78 ~ treat', 
                        data=data, 
                        weights=data['weight']).fit()

print("\nTreatment Effect Estimate:")
print(outcome_model.summary())

# Robust standard errors
print("\nRobust standard errors:")
print(outcome_model.get_robustcov_results(cov_type='HC3').summary())

# Weight diagnostics
print("\nWeight Distribution (Controls):")
print(data.loc[data['treat'] == 0, 'weight'].describe())

* Load data 
import delimited "lalonde.csv", clear

* Install ebalance package if needed
* ssc install ebalance

* Fit entropy balancing
* Balance on means (first moment) only
ebalance treat age educ i.race married i.nodegree re74 re75, ///
    targets(1) ///
    generate(eb_weight)

* Check balance: compare weighted control moments with treated moments
tabstat age educ re74 re75 if treat == 0 [aweight=eb_weight], stat(mean sd)
tabstat age educ re74 re75 if treat == 1, stat(mean sd)

* Alternative: Include second moments (variances)
ebalance treat age educ i.race married i.nodegree re74 re75, ///
    targets(1 2) ///
    generate(eb_weight2)

* Check balance under the second-moment specification
tabstat age educ re74 re75 if treat == 0 [aweight=eb_weight2], stat(mean sd)
tabstat age educ re74 re75 if treat == 1, stat(mean sd)

* Estimate treatment effect
reg re78 treat [aweight=eb_weight], robust

* Store results
estimates store eb_att

* Compare to unweighted
reg re78 treat, robust
estimates store unweighted

* Compare estimates
estimates table eb_att unweighted, ///
    b(%7.3f) se(%7.3f) ///
    stats(N r2)

* Check weight distribution
summarize eb_weight if treat == 0, detail

histogram eb_weight if treat == 0, ///
    title("Entropy Balancing Weights (Controls)") ///
    xtitle("Weight") ///
    ytitle("Frequency")

Advantages and Limitations

Advantages

  • Exact balance - Achieves perfect balance on specified moments by construction
  • No model searching - Eliminates iterative propensity score specification
  • Stable weights - Maximum entropy ensures smooth weight distribution
  • Transparent - Clear specification of balance requirements
  • Flexible - Can balance means, variances, skewness, interactions
  • Efficiency - Retains all observations (like IPW)
  • Intuitive - Direct optimization of balance constraints

Limitations

  • Balance target choice - User must specify which moments to balance
  • No overlap enforcement - Doesn’t automatically address positivity violations
  • Computational intensity - Optimization can be slow for large datasets
  • Perfect balance risk - May overfit to sample noise rather than population
  • Linear combinations only - Balances specified moments, not all possible functions
  • Extrapolation - Can produce large weights if groups are very different

Comparison to Other Methods

Entropy Balancing vs IPW

Feature            Entropy Balancing         IPW
-----------------  ------------------------  -------------------------
Balance            Exact on chosen moments   Approximate
Propensity model   Not required              Required
Iteration          One-step                  May need respecification
Moment control     User specifies            Indirect
Weights            Optimized for balance     Based on PS model
Transparency       High                      Moderate

Entropy Balancing vs Matching

Feature          Entropy Balancing   Matching
---------------  ------------------  ------------------------
Sample size      Retains all data    May discard units
Balance          Exact on moments    Approximate
Implementation   Optimization        Matching algorithm
Weights          Continuous          Binary (1:1) or discrete
Estimand         ATT primarily       ATT primarily

When to Use Entropy Balancing

Ideal scenarios:

  • You know which moments need balancing (means, variances)
  • Sample size is moderate to large
  • You want guaranteed exact balance on key variables
  • Propensity score modeling is difficult or controversial

Use alternatives when:

  • Very small sample sizes (matching may be better)
  • Extreme covariate imbalance (may produce extreme weights)
  • Need to enforce common support explicitly (use trimming)
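For the last point, a minimal common-support trimming sketch. The propensity scores here are simulated stand-ins — in practice they would come from a fitted model:

```python
import numpy as np

rng = np.random.default_rng(1)
ps = rng.uniform(0.02, 0.9, size=500)   # stand-in propensity scores
treat = rng.binomial(1, ps)

# Keep all treated units, and only controls whose propensity score lies
# inside the range observed among the treated (a simple common-support rule)
lo, hi = ps[treat == 1].min(), ps[treat == 1].max()
keep = (treat == 1) | ((ps >= lo) & (ps <= hi))
print(f"{keep.sum()} of {len(ps)} units retained")
```

Entropy balancing would then be run on the trimmed sample, so the solver is never forced to assign extreme weights to controls far outside the treated covariate region.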

Theoretical Properties

Asymptotic Properties

Under standard regularity conditions (Hainmueller 2012):

  1. Consistency: \(\hat{\tau}^{ATT} \xrightarrow{p} \tau^{ATT}\)
  2. Asymptotic normality: \(\sqrt{n}(\hat{\tau}^{ATT} - \tau^{ATT}) \xrightarrow{d} N(0, V)\)

Double Robustness

Entropy balancing combined with covariate adjustment in the outcome model provides double robustness:

\[ E[Y_i|T_i, X_i] = \alpha + \tau T_i + X_i^\top \gamma \]

The estimate is consistent if either:

  • the weights correctly adjust for treatment assignment (the propensity model implicit in the balancing constraints is right), OR
  • the linear outcome model is correctly specified

This is the sense in which Zhao and Percival (2017) show entropy balancing is doubly robust.
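A quick simulation illustrates the outcome-model side of this robustness: even with no balancing at all (uniform weights), including the confounder in the regression recovers the true effect. The data-generating process below is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000
x = rng.normal(size=n)                       # confounder
p = 1 / (1 + np.exp(-x))                     # treatment probability depends on x
t = rng.binomial(1, p)
tau = 2.0
y = tau * t + 1.5 * x + rng.normal(size=n)   # linear outcome model

# Naive difference in means is confounded upward...
naive = y[t == 1].mean() - y[t == 0].mean()

# ...but OLS of y on (1, t, x) with uniform weights is consistent here,
# because the outcome model is correctly specified
A = np.column_stack([np.ones(n), t, x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print(round(naive, 2), round(coef[1], 2))  # naive is biased; coef[1] is near 2.0
```

The symmetric case — correct balancing weights with a misspecified outcome model — also recovers \(\tau\), which is the practical argument for combining entropy balancing with covariate adjustment.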

Practical Considerations

Choosing Moments to Balance

First moments (means) - always included (the default):

weightit(treat ~ x1 + x2, data = df, method = "ebal", estimand = "ATT")

Second moments (variances) - add for distributional similarity:

weightit(treat ~ x1 + x2, data = df, method = "ebal", estimand = "ATT", moments = 2)

Third moments - use sparingly to avoid overfitting:

weightit(treat ~ x1 + x2, data = df, method = "ebal", estimand = "ATT", moments = 3)

(These snippets use WeightIt's weightit(), whose moments argument automates higher-moment balancing; with the ebal package itself, append squared or cubed covariate columns to X.)

Handling Extreme Weights

If weights become very large:

  1. Check overlap: Visualize propensity score distributions
  2. Trim sample: Remove units with extreme covariate values
  3. Relax constraints: Balance fewer moments
  4. Use covariate adjustment: Include covariates in outcome model
# Check for extreme weights among controls
w0 <- data$weight[data$treat == 0]
quantile(w0, c(0.95, 0.99, 1.0))

# Flag the top 1% of weights
extreme <- w0 > quantile(w0, 0.99)
table(extreme)

Diagnostics

Balance Checks

# Standardized mean differences should be ~0
bal.tab(treat ~ age + educ + race + married + nodegree + re74 + re75, 
        data = df,
        weights = "weight",
        stats = "m")

# Variance ratios should be ~1
bal.tab(treat ~ age + educ + race + married + nodegree + re74 + re75,
        data = df, 
        weights = "weight",
        stats = "v")

Weight Distribution

# Summary statistics
summary(df$weight[df$treat == 0])

# Effective sample size
sum(df$weight[df$treat == 0])^2 / 
  sum(df$weight[df$treat == 0]^2)
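The same effective-sample-size (Kish) formula in Python, usable on any weight vector (the function name and example values are illustrative):

```python
import numpy as np

def ess(w):
    # Kish effective sample size: (sum w)^2 / sum(w^2); equals n for uniform weights
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

print(ess(np.ones(100)))           # 100.0: uniform weights lose nothing
print(ess([4.0, 1.0, 1.0, 1.0]))   # ~2.58: weight concentration shrinks ESS
```

This is the same quantity cobalt reports in its "Effective sample sizes" table — 98.46 for the 429 reweighted controls in the example above.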

Extensions

Multivariate Balancing

Balance on interactions and polynomials:

# Create interaction terms
df$age_edu <- df$age * df$education

# Include in balancing
ebalance(treatment, 
         X = cbind(age, education, age_edu))

Stable Balancing Weights

Zubizarreta (2015) minimizes the variance of the weights subject to approximate, rather than exact, balance constraints:

\[ \min_w \operatorname{Var}(w) \quad \text{s.t. } \left| \sum_{i: T_i=0} w_i X_{ik} - \bar{X}^{(1)}_k \right| \leq \delta_k \]

Allowing a small tolerance \(\delta_k\) trades a little imbalance for substantially more stable weights.

Kernel Balancing

Hazlett (2020) proposes a kernel-based approach that balances flexible nonlinear functions of the covariates.

Further Reading

Key papers on entropy balancing:

  • Hainmueller (2012) - Original entropy balancing method
  • Zubizarreta (2015) - Stable balancing weights
  • Zhao and Percival (2017) - Double robustness of entropy balancing
  • Hazlett (2020) - Kernel balancing for nonparametric balance
  • Vegetabile et al. (2021) - Entropy balancing weights for continuous exposures

References

Hainmueller, Jens. 2012. “Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies.” Political Analysis 20 (1): 25–46. https://doi.org/10.1093/pan/mpr025.
Hazlett, Chad. 2020. “Kernel Balancing: A Flexible Non-Parametric Weighting Procedure for Estimating Causal Effects.” Statistica Sinica 30 (3): 1155–89. https://doi.org/10.5705/ss.202018.0122.
Vegetabile, Brian G., Beth Ann Griffin, Donna L. Coffman, Matthew Cefalu, Michael W. Robbins, and Daniel F. McCaffrey. 2021. “Nonparametric Estimation of Population Average Dose-Response Curves Using Entropy Balancing Weights for Continuous Exposures.” Health Services and Outcomes Research Methodology 21 (1): 69–110. https://doi.org/10.1007/s10742-020-00236-2.
Zhao, Qingyuan, and Daniel Percival. 2017. “Entropy Balancing Is Doubly Robust.” Journal of Causal Inference 5 (1). https://doi.org/10.1515/jci-2016-0010.
Zubizarreta, José R. 2015. “Stable Weights That Balance Covariates for Estimation with Incomplete Outcome Data.” Journal of the American Statistical Association 110 (511): 910–22. https://doi.org/10.1080/01621459.2015.1023805.