Estimation Module (estimation)

The estimation module implements the core regression and inference procedures for the Lee and Wooldridge difference-in-differences methods.

Cross-sectional OLS estimation for difference-in-differences analysis.

This module implements OLS regression on transformed outcome variables for estimating average treatment effects on the treated (ATT) in common timing difference-in-differences designs. After unit-specific time-series transformations remove pre-treatment patterns, the estimation problem reduces to standard cross-sectional regression.

The module provides three main functions:

prepare_controls : Constructs centered controls and interaction terms for regression adjustment.
estimate_att : Estimates the ATT using cross-sectional OLS on the first post-treatment period.
estimate_period_effects : Estimates period-specific ATTs via independent cross-sectional regressions for each post-treatment period.

Supported variance-covariance estimators include homoskedastic OLS, heteroskedasticity-robust (HC1, HC3), and cluster-robust standard errors.

Notes

The transformation-based approach converts the panel data DiD problem into a cross-sectional treatment effects problem. Under no anticipation and parallel trends assumptions, the ATT is identified as the coefficient on the treatment indicator in OLS regression of the transformed outcome.

For small samples, homoskedastic standard errors with t-distribution critical values provide exact inference under normality. HC3 standard errors offer improved finite-sample performance over HC1 when sample sizes are moderate and asymptotic approximations may be unreliable.

Cluster-robust inference uses G-1 degrees of freedom (where G is the number of clusters) rather than the residual degrees of freedom, which provides more conservative inference when the number of clusters is small.

lwdid.estimation.prepare_controls(data, d, ivar, controls, N_treated, N_control, data_sample=None)[source]

Construct centered controls and interactions for regression adjustment.

Verifies that sample sizes satisfy N_treated > K+1 and N_control > K+1, where K is the number of control variables. If conditions are met, computes the mean of control variables for the treated group, centers the controls by subtracting these means, and creates interaction terms between the treatment indicator and the centered controls.

Parameters:

data (pd.DataFrame) – Transformed panel data containing control variables and treatment indicators.
d (str) – Name of the treatment indicator column.
ivar (str) – Name of the unit identifier column.
controls (list[str]) – List of control variable names. Must be numeric and time-invariant.
N_treated (int) – Number of treated units in the regression sample.
N_control (int) – Number of control units in the regression sample.
data_sample (pd.DataFrame | None, optional) – Specific sample to use for computing means (e.g., first-post-treatment cross-section). If None, uses the full data filtered by treatment status.

Returns:

A dictionary containing:

'include' (bool) – Whether controls can be included in the regression.
'X_centered' (pd.DataFrame or None) – Centered control variables (X - X_mean_treated).
'interactions' (pd.DataFrame or None) – Interaction terms (D * X_centered).
'X_mean_treated' (dict) – Means of controls for the treated group.
'RHS_varnames' (list of str) – Names of all right-hand side control variables.

Return type:

dict[str, Any]
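The centering and interaction steps described above can be sketched with NumPy arrays. This is an illustrative reconstruction, not the library's implementation: the function name center_controls_sketch and its array-based signature are hypothetical, whereas prepare_controls itself works with DataFrames and column names. The toy data are invented for the example.

```python
import numpy as np

def center_controls_sketch(D, X):
    """Hypothetical helper: center controls at the treated mean, build D * X_centered.

    D : (n,) array of 0/1 treatment indicators
    X : (n, K) array of time-invariant controls
    """
    D = np.asarray(D, dtype=float)
    X = np.asarray(X, dtype=float)
    n_treated = int(D.sum())
    n_control = len(D) - n_treated
    K = X.shape[1]

    # Sample-size condition from the text: N_treated > K+1 and N_control > K+1.
    if not (n_treated > K + 1 and n_control > K + 1):
        return {"include": False, "X_centered": None, "interactions": None}

    x_mean_treated = X[D == 1].mean(axis=0)   # treated-group means
    X_centered = X - x_mean_treated           # X - X_mean_treated
    interactions = D[:, None] * X_centered    # D * X_centered
    return {
        "include": True,
        "X_centered": X_centered,
        "interactions": interactions,
        "X_mean_treated": x_mean_treated,
    }

# Toy data: 3 treated and 4 control units, one control variable (K = 1).
D = np.array([1, 1, 1, 0, 0, 0, 0])
X = np.array([[2.0], [4.0], [6.0], [1.0], [3.0], [5.0], [7.0]])
out = center_controls_sketch(D, X)
```

Because the controls are centered at the treated-group mean, the interaction terms average to zero among treated units, which is what keeps the coefficient on the treatment indicator interpretable as the ATT.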

Overview

This module provides two main estimation functions:

estimate_att: Estimates the average treatment effect on the treated (ATT)
estimate_period_effects: Estimates period-specific treatment effects

Both functions run OLS regressions on transformed data and compute standard errors using various variance estimators.

Estimation Functions

estimate_att()

Purpose: Estimate the overall average treatment effect on the treated (ATT) from the cross-sectional representation of the Lee and Wooldridge estimator.

Regression specification (conceptual):

\[Y_i = \alpha + \tau D_i + Z_i'\beta + \varepsilon_i\]

where:

\(Y_i\): Transformed outcome for unit i (typically the post-treatment average of the residualized outcome constructed by the transformation module)
\(D_i\): Treatment indicator (1 = treated, 0 = control)
\(Z_i\): Optional time-invariant controls and their interactions constructed following Lee and Wooldridge (2026)
\(\varepsilon_i\): Regression error term

Estimand: \(\tau\) is the ATT.

Returns:

ATT estimate
Standard error
t-statistic
p-value
Confidence interval
Degrees of freedom
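As a minimal sketch of the regression above in the case without controls, the coefficient on the treatment indicator can be obtained by least squares on an intercept and D. The function name att_ols_sketch and the toy data are assumptions made for illustration; estimate_att itself also returns the inference quantities listed above.

```python
import numpy as np

def att_ols_sketch(y, D):
    """Hypothetical helper: OLS of the transformed outcome on [1, D]; returns tau-hat."""
    X = np.column_stack([np.ones_like(D, dtype=float), D.astype(float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]   # coefficient on the treatment indicator

# Toy transformed outcomes: three treated units followed by three controls.
y = np.array([3.0, 5.0, 4.0, 1.0, 2.0, 0.0])
D = np.array([1, 1, 1, 0, 0, 0])
tau_hat = att_ols_sketch(y, D)
```

With no controls, the estimate equals the difference between the treated and control means of the transformed outcome (here 4 - 1 = 3).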

estimate_period_effects()

Purpose: Estimate treatment effects separately for each post-treatment period using cross-sectional regressions.

Regression specification (for each post-treatment period t):

\[Y_{it} = \alpha_t + \tau_t D_i + Z_i'\beta_t + \varepsilon_{it}\]

where:

\(Y_{it}\): Transformed outcome for unit i in period t
\(D_i\): Treatment indicator (1 = treated, 0 = control)
\(Z_i\): Optional time-invariant controls (and their interactions) re-used from the main regression
\(\tau_t\): Treatment effect in period \(t\) (period-specific ATT)

Returns: DataFrame with period-specific estimates, standard errors, t-statistics, p-values, and confidence intervals.
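The period-by-period procedure amounts to repeating the same cross-sectional regression once per post-treatment period. The sketch below assumes the transformed outcomes are arranged as an (n, T_post) array with one column per period; that layout, the function name period_effects_sketch, and the period labels are hypothetical.

```python
import numpy as np

def period_effects_sketch(Y, D, periods):
    """Hypothetical helper: one independent OLS per post-treatment period.

    Y       : (n, T_post) transformed outcomes, one column per period
    D       : (n,) 0/1 treatment indicator
    periods : labels for the columns of Y
    """
    X = np.column_stack([np.ones(len(D)), D.astype(float)])
    effects = {}
    for j, t in enumerate(periods):
        beta, *_ = np.linalg.lstsq(X, Y[:, j], rcond=None)
        effects[t] = float(beta[1])   # tau_t for period t
    return effects

D = np.array([1, 1, 0, 0])
Y = np.array([[2.0, 4.0],
              [4.0, 6.0],
              [1.0, 1.0],
              [1.0, 3.0]])
effects = period_effects_sketch(Y, D, periods=[2001, 2002])
```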

Variance Estimators

The module supports multiple variance estimators for different assumptions about the error structure.

OLS (Homoskedastic)

Assumption: Errors are homoskedastic and normally distributed.

Formula:

\[\text{Var}(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1}\]

where \(\hat{\sigma}^2 = RSS / (n - k)\).

Degrees of freedom:

Non-clustered: \(df = n - k\)

When to use: When homoskedasticity and normality are plausible and exact t-based inference is desired.
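The homoskedastic formula can be checked by hand on a small design. The following sketch is written directly from the formula above rather than taken from the library; the function name and the toy data are assumptions.

```python
import numpy as np

def ols_homoskedastic_vcov(X, y):
    """Hypothetical helper: sigma^2 (X'X)^{-1} with sigma^2 = RSS / (n - k)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    return beta, sigma2 * XtX_inv, n - k

X = np.column_stack([np.ones(6), [1, 1, 1, 0, 0, 0]])
y = np.array([3.0, 5.0, 4.0, 1.0, 2.0, 0.0])
beta, vcov, df = ols_homoskedastic_vcov(X, y)
se_tau = np.sqrt(vcov[1, 1])   # standard error of the treatment coefficient
```

In this example RSS = 4 and n - k = 4, so the estimated error variance is 1 and the variance of the treatment coefficient is 1/3 + 1/3 = 2/3.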

HC0 (White’s Original Estimator)

Assumption: Errors may be heteroskedastic.

Formula:

\[\text{Var}(\hat{\beta}) = (X'X)^{-1} \left(\sum_i x_i x_i' \hat{\varepsilon}_i^2\right) (X'X)^{-1}\]

Degrees of freedom: Same as OLS.

When to use: Large samples with suspected heteroskedasticity. This is the original heteroskedasticity-consistent estimator without finite-sample corrections.

HC1 (Heteroskedasticity-Robust)

Assumption: Errors may be heteroskedastic.

Formula:

\[\text{Var}(\hat{\beta}) = (X'X)^{-1} \left(\sum_i x_i x_i' \hat{\varepsilon}_i^2\right) (X'X)^{-1} \times \frac{n}{n-k}\]

Degrees of freedom: Same as OLS.

When to use: Medium to large samples with suspected heteroskedasticity. HC1 applies a degrees-of-freedom correction to HC0.
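HC0 and HC1 differ only by the n/(n-k) factor, which a short NumPy sketch makes explicit. The code follows the two formulas above; the function name and toy data are illustrative assumptions and not part of the module.

```python
import numpy as np

def hc0_hc1_vcov(X, y):
    """Hypothetical helper: White's sandwich (HC0) and its df-corrected version (HC1)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ X.T @ y)
    # "Meat" of the sandwich: sum_i x_i x_i' e_i^2
    meat = (X * resid[:, None] ** 2).T @ X
    hc0 = XtX_inv @ meat @ XtX_inv
    hc1 = hc0 * n / (n - k)
    return hc0, hc1

X = np.column_stack([np.ones(6), [1, 1, 1, 0, 0, 0]])
y = np.array([3.0, 5.0, 4.0, 1.0, 2.0, 0.0])
hc0, hc1 = hc0_hc1_vcov(X, y)
```

For the treatment coefficient in this toy design, HC0 gives 4/9 and HC1 scales it by 6/4 to 2/3, illustrating that HC0 is the smaller of the two in finite samples.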

HC2 (Leverage-Adjusted)

Assumption: Errors may be heteroskedastic.

Formula:

\[\text{Var}(\hat{\beta}) = (X'X)^{-1} \left(\sum_i \frac{x_i x_i' \hat{\varepsilon}_i^2}{1-h_{ii}}\right) (X'X)^{-1}\]

where \(h_{ii}\) is the i-th diagonal element of the hat matrix \(H = X(X'X)^{-1}X'\).

Degrees of freedom: Same as OLS.

When to use: Small to moderate samples with suspected heteroskedasticity and varying leverage across observations.

HC3 (Small-Sample Adjusted)

Assumption: Errors may be heteroskedastic.

Formula:

\[\text{Var}(\hat{\beta}) = (X'X)^{-1} \left(\sum_i \frac{x_i x_i' \hat{\varepsilon}_i^2}{(1-h_{ii})^2}\right) (X'X)^{-1}\]

where \(h_{ii}\) is the i-th diagonal element of the hat matrix.

Degrees of freedom: Same as OLS.

When to use: Small or moderate samples with suspected heteroskedasticity. Simulation evidence suggests HC3 can perform reasonably well in some small-sample designs, but results can still be sensitive when the number of treated or control units is very small. See Methodological Notes for further discussion.
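HC2 and HC3 both rescale the squared residuals by a function of the leverage values. The sketch below computes the leverages and both estimators from the formulas above; the function name and toy data are assumptions for illustration only.

```python
import numpy as np

def hc2_hc3_vcov(X, y):
    """Hypothetical helper: leverage-adjusted sandwich estimators HC2 and HC3."""
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ X.T @ y)
    # h_ii = x_i' (X'X)^{-1} x_i, the diagonal of the hat matrix
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

    def sandwich(weights):
        meat = (X * weights[:, None]).T @ X
        return XtX_inv @ meat @ XtX_inv

    hc2 = sandwich(resid**2 / (1.0 - h))
    hc3 = sandwich(resid**2 / (1.0 - h) ** 2)
    return hc2, hc3, h

X = np.column_stack([np.ones(6), [1, 1, 1, 0, 0, 0]])
y = np.array([3.0, 5.0, 4.0, 1.0, 2.0, 0.0])
hc2, hc3, h = hc2_hc3_vcov(X, y)
```

With three units per group every leverage equals 1/3, so HC2 inflates the HC0 variance of the treatment coefficient by 1.5 (to 2/3) and HC3 inflates it by 2.25 (to 1).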

HC4 (High-Leverage Adjusted)

Assumption: Errors may be heteroskedastic.

Formula:

\[\text{Var}(\hat{\beta}) = (X'X)^{-1} \left(\sum_i \frac{x_i x_i' \hat{\varepsilon}_i^2}{(1-h_{ii})^{\delta_i}}\right) (X'X)^{-1}\]

where \(\delta_i = \min(4, n h_{ii}/\sum_j h_{jj})\) is an adaptive exponent based on leverage.

Degrees of freedom: Same as OLS.

When to use: When data contains high-leverage observations. HC4 provides adaptive adjustment that increases for observations with high leverage.
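The adaptive exponent is easiest to see in an unbalanced design, where the smaller group carries more leverage. The sketch below implements the HC4 formula as stated above; the function name and the toy data (2 treated, 6 control units) are assumptions chosen for illustration.

```python
import numpy as np

def hc4_vcov(X, y):
    """Hypothetical helper: HC4 with delta_i = min(4, n * h_ii / sum_j h_jj)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ X.T @ y)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # leverage values
    delta = np.minimum(4.0, n * h / h.sum())       # adaptive exponent
    weights = resid**2 / (1.0 - h) ** delta
    meat = (X * weights[:, None]).T @ X
    return XtX_inv @ meat @ XtX_inv, delta

D = np.array([1, 1, 0, 0, 0, 0, 0, 0], dtype=float)
X = np.column_stack([np.ones(8), D])
y = np.array([4.0, 6.0, 1.0, 2.0, 0.0, 1.0, 3.0, -1.0])
hc4, delta = hc4_vcov(X, y)
```

Here the treated units have leverage 1/2 and receive an exponent of 2, while the controls have leverage 1/6 and receive an exponent of 2/3, so the residuals of the high-leverage group are inflated more strongly.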

Variance Estimator Selection Guide

OLS: Sample size: Any. Assumes homoskedasticity and normality. Use for exact t-inference under CLM assumptions.
HC0: Sample size: Large (N > 100). Assumes heteroskedasticity. Use for large samples only; understates SE in small samples.
HC1: Sample size: Moderate-Large (N > 50). Assumes heteroskedasticity. General robust SE; df-corrected HC0.
HC2: Sample size: Small-Moderate (N = 20-100). Assumes heteroskedasticity with varying leverage. Use when leverage varies across observations.
HC3: Sample size: Small-Moderate (N = 10-50). Assumes heteroskedasticity. Default for small samples with suspected heteroskedasticity.
HC4: Sample size: Any. Assumes heteroskedasticity with influential points. Use when high-leverage points are present in data.
Cluster: Sample size: G > 30 clusters. Assumes within-cluster correlation. Use for clustered data (states, schools).

Selection guidelines:

CLM assumptions plausible + exact inference needed: Use OLS (vce=None)
Heteroskedasticity suspected + small sample: Use HC3 (vce='hc3')
Heteroskedasticity suspected + moderate/large sample: Use HC1 (vce='hc1')
High-leverage observations present: Use HC4 (vce='hc4')
Clustered errors: Use cluster-robust (vce='cluster')

Cluster-Robust

Assumption: Errors are correlated within clusters but independent across clusters.

Formula:

\[\text{Var}(\hat{\beta}) = (X'X)^{-1} \left(\sum_g X_g' \hat{\varepsilon}_g \hat{\varepsilon}_g' X_g\right) (X'X)^{-1}\]

where \(g\) indexes clusters.

Degrees of freedom: G - 1 (number of clusters minus 1).

When to use: Errors are clustered (e.g., students within schools).
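The cluster-robust "meat" sums outer products of cluster-level score vectors rather than observation-level ones. The sketch below follows the formula above exactly as written, without any additional finite-sample scaling; the function name, cluster labels, and data are hypothetical.

```python
import numpy as np

def cluster_robust_vcov(X, y, clusters):
    """Hypothetical helper: cluster-robust sandwich and its G - 1 degrees of freedom."""
    clusters = np.asarray(clusters)
    k = X.shape[1]
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ X.T @ y)
    meat = np.zeros((k, k))
    labels = np.unique(clusters)
    for g in labels:
        idx = clusters == g
        score_g = X[idx].T @ resid[idx]      # X_g' e_g, a k-vector
        meat += np.outer(score_g, score_g)   # X_g' e_g e_g' X_g
    return XtX_inv @ meat @ XtX_inv, len(labels) - 1

X = np.column_stack([np.ones(6), [1, 1, 1, 0, 0, 0]])
y = np.array([3.0, 5.0, 4.0, 1.0, 2.0, 0.0])
vcov_cl, df_cl = cluster_robust_vcov(X, y, ["a", "a", "b", "c", "c", "d"])

# With every observation in its own cluster the estimator reduces to HC0.
vcov_single, df_single = cluster_robust_vcov(X, y, np.arange(6))
```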

Implementation Details

Regression Procedure

Prepare design matrix: For the main regression, construct a cross-sectional design matrix with an intercept, the treatment indicator, and (when applicable) time-invariant controls and their interactions with treatment as in Lee and Wooldridge (2026). The same control specification is reused for period-by-period regressions.
Run OLS: Compute \(\hat{\beta} = (X'X)^{-1} X'y\)
Compute residuals: \(\hat{\varepsilon} = y - X\hat{\beta}\)
Compute variance: Use appropriate variance estimator
Inference: Compute t-statistics and p-values using t-distribution

Degrees of Freedom

Non-clustered:

df = n - k

where:

n: Number of observations
k: Number of parameters (intercept + regressors)

Clustered:

df = G - 1

where G is the number of clusters.

Rationale: With cluster-robust SEs, the effective sample size is the number of clusters, not observations.

Confidence Intervals

95% confidence intervals (\(\alpha = 0.05\)) are computed as:

\[CI = \hat{\beta} \pm t_{\alpha/2, df} \times SE(\hat{\beta})\]

where \(t_{\alpha/2, df}\) is the critical value from the t-distribution with \(df\) degrees of freedom.
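The degrees-of-freedom rule and the interval formula combine as in the following sketch, which uses scipy.stats.t.ppf for the critical value. The helper names inference_df and t_confidence_interval are hypothetical, and the numeric inputs are made up for the example.

```python
from scipy import stats

def inference_df(n, k, n_clusters=None):
    """Hypothetical helper: n - k without clustering, G - 1 with clustering."""
    return n - k if n_clusters is None else n_clusters - 1

def t_confidence_interval(estimate, se, df, level=0.95):
    """Hypothetical helper: estimate +/- t_{alpha/2, df} * se."""
    alpha = 1.0 - level
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df)
    return estimate - t_crit * se, estimate + t_crit * se

df_plain = inference_df(n=50, k=3)                  # residual df
df_clustered = inference_df(n=50, k=3, n_clusters=8)  # G - 1
low, high = t_confidence_interval(3.0, 1.0, df=10)
```

With few clusters, G - 1 is much smaller than n - k, so the critical value is larger and the interval wider for the same standard error.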

Technical Notes

Numerical Stability

The module relies on the numerically stable OLS implementation in statsmodels:

OLS estimation and variance–covariance computation are delegated to statsmodels
Heteroskedasticity-robust variance estimators (HC0-HC4) are supported, including variants with small-sample adjustments
Singular or ill-conditioned designs raise errors or warnings rather than failing silently

Missing Data

Observations with missing values in required variables (outcome, treatment indicator, unit identifier, time variables, post indicator) are dropped during validation.
For control variables, missing values are handled at the estimation stage: if dropping observations with missing controls still leaves enough treated and control units to satisfy the \(N_1 > K+1\) and \(N_0 > K+1\) conditions, those observations are removed and controls are included; otherwise, controls are omitted and the full regression sample is retained. In both cases, informative warnings are issued.
The effective sample size (n) reported in the results corresponds to the cross-sectional regression sample used for ATT estimation (the first post-treatment, or firstpost, cross-section).
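The rule for missing control values can be written as a small decision function. This sketch encodes only the logic described above; the function name, return structure, and toy data are hypothetical, and the warnings that the module issues are omitted for brevity.

```python
import numpy as np

def handle_missing_controls(D, X):
    """Hypothetical helper: decide whether controls survive listwise deletion.

    Returns a flag and a boolean mask of the rows kept in the regression sample.
    """
    D = np.asarray(D)
    X = np.asarray(X, dtype=float)
    K = X.shape[1]
    complete = ~np.isnan(X).any(axis=1)          # rows with no missing controls
    n1 = int(((D == 1) & complete).sum())        # treated units remaining
    n0 = int(((D == 0) & complete).sum())        # control units remaining
    if n1 > K + 1 and n0 > K + 1:
        # Enough units remain: drop incomplete rows and keep the controls.
        return {"include_controls": True, "keep": complete}
    # Otherwise omit the controls and retain the full regression sample.
    return {"include_controls": False, "keep": np.ones(len(D), dtype=bool)}

D_big = np.array([1, 1, 1, 1, 0, 0, 0, 0])
X_big = np.array([[1.0], [np.nan], [2.0], [3.0], [1.0], [2.0], [3.0], [4.0]])
kept = handle_missing_controls(D_big, X_big)

D_small = np.array([1, 1, 1, 0, 0, 0])
X_small = np.array([[1.0], [np.nan], [2.0], [1.0], [2.0], [3.0]])
dropped = handle_missing_controls(D_small, X_small)
```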

Perfect Collinearity

Regressors must not be perfectly collinear for the OLS problem to be identified. Common sources of exact collinearity include:

Including both a variable and its exact linear transformation
Including dummy variables for all categories (no omitted category)
Including controls that are exact linear combinations of other regressors
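A quick way to detect exact collinearity before estimation is to compare the rank of the design matrix with its number of columns. The check below is a generic NumPy sketch and not necessarily how the module or statsmodels reports the problem; the function name and example matrices are assumptions.

```python
import numpy as np

def has_full_column_rank(X):
    """Hypothetical helper: True when no column is an exact linear combination of others."""
    X = np.asarray(X, dtype=float)
    return np.linalg.matrix_rank(X) == X.shape[1]

intercept = np.ones(4)
d_a = np.array([1.0, 1.0, 0.0, 0.0])
d_b = 1.0 - d_a   # dummy for the only other category

# Intercept plus dummies for all categories: d_a + d_b equals the intercept.
X_bad = np.column_stack([intercept, d_a, d_b])
# Omitting one category restores identification.
X_ok = np.column_stack([intercept, d_a])
```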

Example Usage

These functions are used internally by lwdid.lwdid() after the transformation step has constructed the transformed outcomes and main regression sample. They are not part of the typical user-facing API. For most applications, lwdid.lwdid() should be called directly, relying on its high-level interface. Advanced users who need low-level access can consult the docstrings and source code in lwdid.estimation to see the exact function signatures and required inputs.

Large-Sample Inference

For large cross-sectional samples, asymptotic inference using robust standard errors is appropriate:

HC0-HC4 Standard Errors

HC1 provides heteroskedasticity-consistent standard errors with a degrees-of-freedom correction (n/(n-k))
HC3 adds leverage-based adjustments suitable for smaller samples

Cluster-Robust Standard Errors

When errors are correlated within clusters (e.g., states, regions), cluster-robust standard errors account for this correlation. Inference relies on asymptotic approximations that improve with the number of clusters G.

Variance Estimator Recommendation

Small samples with CLM assumptions: Use vce=None for exact t-based inference
Small to moderate samples with heteroskedasticity: Use vce='hc3'
Moderate to large samples: Use vce='hc1' or vce='robust'
Clustered data with many clusters: Use vce='cluster'

Estimator Selection for Large Samples

Lee and Wooldridge (2025) provide theoretical and simulation evidence for choosing among estimators when sample sizes are large enough for asymptotic inference. Their Monte Carlo results inform the following recommendations.

Available Estimators

RA (Regression Adjustment): Default estimator. Under correct specification of the outcome model, RA is both best linear unbiased (BLUE) and asymptotically efficient. Equivalent to POLS/ETWFE in the common timing case.
IPW (Inverse Probability Weighting): Consistent when the propensity score model is correctly specified. May be less efficient than RA under correct outcome model specification.
IPWRA (Doubly Robust): Combines regression adjustment with propensity score weighting. Consistent if either the outcome model or propensity score model is correctly specified (double robustness property).
PSM (Propensity Score Matching): Matches treated to control units based on estimated propensity scores. Generally less efficient than RA and IPWRA.

Doubly Robust ATT Estimator (Mathematical Formulation)

The influence function representation of the doubly robust ATT estimator is:

\[\psi_i^{DR} = \frac{D_i}{\pi} \left[ Y_i - m_0(X_i) \right] - \frac{(1-D_i) p(X_i)}{\pi (1-p(X_i))} \left[ Y_i - m_0(X_i) \right] - \tau_{ATT}\]

where \(\pi = P(D=1)\) is the unconditional treatment probability, \(p(X_i) = P(D=1|X_i)\) is the propensity score, and \(m_0(X_i) = E[Y|D=0, X_i]\) is the conditional mean for controls.

Double robustness property:

If \(m_0(X)\) is correctly specified: ATT is identified via regression adjustment regardless of propensity score specification
If \(p(X)\) is correctly specified: IPW component corrects for outcome model misspecification
If both are correctly specified: the estimator achieves the semiparametric efficiency bound
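Setting the sample average of the doubly robust influence function to zero yields a plug-in estimate of the ATT. The sketch below evaluates that expression for given values of the propensity score and the control-outcome mean; in practice both would be estimated, and the function name, inputs, and toy data are assumptions for illustration.

```python
import numpy as np

def dr_att_sketch(y, D, p, m0):
    """Hypothetical helper: sample analogue of the doubly robust ATT moment.

    p  : propensity scores P(D=1 | X_i), supplied as known here
    m0 : control-outcome predictions E[Y | D=0, X_i], supplied as known here
    """
    y, D = np.asarray(y, dtype=float), np.asarray(D, dtype=float)
    p, m0 = np.asarray(p, dtype=float), np.asarray(m0, dtype=float)
    pi = D.mean()                                    # share of treated units
    resid0 = y - m0                                  # Y_i - m_0(X_i)
    treated_term = D / pi * resid0
    control_term = (1.0 - D) * p / (pi * (1.0 - p)) * resid0
    return float(np.mean(treated_term - control_term))

y = np.array([3.0, 5.0, 4.0, 1.0, 2.0, 0.0])
D = np.array([1, 1, 1, 0, 0, 0])
# Constant propensity score and a constant control mean: no covariate variation.
tau_dr = dr_att_sketch(y, D, p=np.full(6, 0.5), m0=np.full(6, 1.0))
```

In this covariate-free example the weights on the controls are constant, so the estimate collapses to the simple difference in means (3).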

Efficiency Comparison (from Lee and Wooldridge 2025)

Under correct specification of all models:

RA/POLS has the smallest standard deviation and RMSE
IPWRA performs close to RA (within 3-5% higher SD in most scenarios)
PSM and long differencing methods have notably larger standard deviations (25-40% higher)

Under model misspecification:

When outcome model is misspecified but propensity score is correct: IPWRA has smallest RMSE due to its double robustness
When both models are misspecified: IPWRA still tends to have smaller bias than RA while maintaining reasonable precision

Estimator Selection Guidelines

Start with RA when N is moderate and functional form assumptions are plausible. RA is efficient and computationally simple.
Use IPWRA as primary estimator when:
- Functional form assumptions are uncertain
- Robustness to model misspecification is desired
- Controls are available for both outcome and propensity score models
Use IPW/PSM when:
- Propensity score weighting or matching is preferred for substantive reasons
- Comparison with other treatment effects literature is desired
Report multiple estimators for robustness: If RA and IPWRA give similar results, this provides evidence that findings are not sensitive to functional form assumptions.

Relationship to Long Differencing Approaches

Long differencing approaches use only the period just prior to intervention, discarding information from earlier pre-treatment periods. Lee and Wooldridge (2025) show that this can result in substantial efficiency loss. The rolling transformation approach uses all suitable pre-treatment periods, achieving efficiency close to POLS while permitting application of doubly robust estimators.
