Estimation Module (estimation)
The estimation module implements the core regression and inference procedures for the Lee and Wooldridge difference-in-differences methods.
Cross-sectional OLS estimation for difference-in-differences analysis.
This module implements OLS regression on transformed outcome variables for estimating average treatment effects on the treated (ATT) in common timing difference-in-differences designs. After unit-specific time-series transformations remove pre-treatment patterns, the estimation problem reduces to standard cross-sectional regression.
The module provides three main functions:
prepare_controls: Constructs centered controls and interaction terms for regression adjustment.
estimate_att: Estimates the ATT using cross-sectional OLS on the first post-treatment period.
estimate_period_effects: Estimates period-specific ATTs via independent cross-sectional regressions for each post-treatment period.
Supported variance-covariance estimators include homoskedastic OLS, heteroskedasticity-robust (HC1, HC3), and cluster-robust standard errors.
Notes
The transformation-based approach converts the panel data DiD problem into a cross-sectional treatment effects problem. Under no anticipation and parallel trends assumptions, the ATT is identified as the coefficient on the treatment indicator in OLS regression of the transformed outcome.
For small samples, homoskedastic standard errors with t-distribution critical values provide exact inference under normality. HC3 standard errors offer improved finite-sample performance over HC1 when sample sizes are moderate but asymptotic theory may not apply.
Cluster-robust inference uses G-1 degrees of freedom (where G is the number of clusters) rather than the residual degrees of freedom, which provides more conservative inference when the number of clusters is small.
- lwdid.estimation.prepare_controls(data, d, ivar, controls, N_treated, N_control, data_sample=None)
Construct centered controls and interactions for regression adjustment.
Verifies that sample sizes satisfy N_treated > K+1 and N_control > K+1, where K is the number of control variables. If conditions are met, computes the mean of control variables for the treated group, centers the controls by subtracting these means, and creates interaction terms between the treatment indicator and the centered controls.
- Parameters:
data (pd.DataFrame) – Transformed panel data containing control variables and treatment indicators.
d (str) – Name of the treatment indicator column.
ivar (str) – Name of the unit identifier column.
controls (list[str]) – List of control variable names. Must be numeric and time-invariant.
N_treated (int) – Number of treated units in the regression sample.
N_control (int) – Number of control units in the regression sample.
data_sample (pd.DataFrame | None, optional) – Specific sample to use for computing means (e.g., first-post-treatment cross-section). If None, uses the full data filtered by treatment status.
- Returns:
A dictionary containing:
- 'include' (bool)
Whether controls can be included in regression.
- 'X_centered' (pd.DataFrame or None)
Centered control variables (X - X_mean_treated).
- 'interactions' (pd.DataFrame or None)
Interaction terms (D * X_centered).
- 'X_mean_treated' (dict)
Means of controls for the treated group.
- 'RHS_varnames' (list of str)
Names of all right-hand side control variables.
- Return type:
dict
See also
estimate_att: Uses prepared controls for ATT estimation.
estimate_period_effects: Uses prepared controls for period-specific ATTs.
Notes
Centering controls at the treated-group mean ensures that the coefficient on the treatment indicator directly estimates the ATT at the mean covariate values of the treated group. The regression model is:
\[\hat{Y}_i = \alpha + \tau D_i + X_i \beta + D_i (X_i - \bar{X}_1) \delta + U_i\]
where \(\bar{X}_1 = N_1^{-1} \sum_{i: D_i=1} X_i\) is the treated-group mean of covariates.
The sample size conditions (N_treated > K+1 and N_control > K+1) ensure that both treated and control subsamples have sufficient degrees of freedom for separate slope estimation.
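The centering step can be illustrated with a small pandas sketch. This is not the library's implementation: the helper name `center_controls`, the toy data, and the `d_x_` interaction prefix are assumptions made for illustration only.

```python
import pandas as pd

def center_controls(df, d, controls):
    """Center controls at the treated-group mean and build D * X_centered terms."""
    n_treated = int((df[d] == 1).sum())
    n_control = int((df[d] == 0).sum())
    k = len(controls)
    # Sample-size conditions from the text: N_treated > K+1 and N_control > K+1.
    if not (n_treated > k + 1 and n_control > k + 1):
        return {"include": False, "X_centered": None, "interactions": None}
    # Treated-group means, subtracted from every unit's controls.
    x_mean_treated = df.loc[df[d] == 1, controls].mean()
    x_centered = df[controls] - x_mean_treated
    # Interactions are zero for control units because D = 0 there.
    interactions = x_centered.mul(df[d], axis=0).add_prefix(f"{d}_x_")
    return {
        "include": True,
        "X_centered": x_centered,
        "interactions": interactions,
        "X_mean_treated": x_mean_treated.to_dict(),
    }

# Hypothetical cross-section: three treated and three control units.
toy = pd.DataFrame({"d": [1, 1, 1, 0, 0, 0],
                    "x": [2.0, 4.0, 6.0, 1.0, 3.0, 5.0]})
spec = center_controls(toy, "d", ["x"])
```

Because the centered controls average to zero among treated units, the interaction terms drop out at the treated-group mean, which is why the coefficient on the treatment indicator can be read as the ATT.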
- lwdid.estimation.estimate_att(data, y_transformed, d, ivar, controls, vce, cluster_var, sample_filter, alpha=0.05)
Estimate Average Treatment Effect on the Treated (ATT) via cross-sectional OLS.
Performs an OLS regression on the specified cross-section (typically the first post-treatment period) using the transformed outcome variable. Supports homoskedastic, heteroskedasticity-robust (HC1, HC3), and cluster-robust standard errors.
- Parameters:
data (pd.DataFrame) – Transformed panel data containing the dependent variable and regressors.
y_transformed (str) – Name of the transformed dependent variable (e.g., ‘ydot_postavg’).
d (str) – Name of the treatment indicator column (typically d_).
ivar (str) – Name of the unit identifier column.
controls (list[str] | None) – List of time-invariant control variables. Controls are included only if sample size conditions (N_1 > K+1 and N_0 > K+1) are met.
vce (str | None) – Type of variance-covariance estimator:
None: Homoskedastic standard errors (OLS).
'robust' or 'hc1': Heteroskedasticity-robust (HC1).
'hc3': Heteroskedasticity-robust (HC3), better for small samples.
'cluster': Cluster-robust. Requires cluster_var.
cluster_var (str | None) – Name of the variable to use for clustering. Required if vce='cluster'.
sample_filter (pd.Series) – Boolean mask indicating the regression sample (e.g., first post-treatment period).
alpha (float, default=0.05) – Significance level for confidence intervals (0.05 gives 95% CI).
- Returns:
A dictionary containing estimation results:
- 'att' (float)
Estimated ATT.
- 'se_att' (float)
Standard error of the ATT.
- 't_stat' (float)
t-statistic for H0: ATT = 0.
- 'pvalue' (float)
Two-sided p-value.
- 'ci_lower', 'ci_upper' (float)
Confidence interval bounds.
- 'params' (pd.Series)
All regression coefficients.
- 'bse' (pd.Series)
Standard errors of all coefficients.
- 'vcov' (pd.DataFrame)
Variance-covariance matrix.
- 'resid' (np.ndarray)
Residuals.
- 'nobs' (int)
Number of observations.
- 'df_resid' (int)
Residual degrees of freedom.
- 'df_inference' (int)
Degrees of freedom used for inference.
- 'vce_type' (str)
Type of VCE used.
- 'cluster_var' (str or None)
Clustering variable if used.
- 'n_clusters' (int or None)
Number of clusters if clustered.
- 'controls_used' (bool)
Whether controls were included.
- 'controls' (list of str)
Controls used in regression.
- 'controls_spec' (dict)
Details of control preparation.
- 'n_treated_sample', 'n_control_sample' (int)
Sample sizes by treatment status.
- Return type:
dict
- Raises:
InsufficientDataError – If sample size is less than 3, or if no treated or control units exist in the regression sample.
InvalidParameterError – If cluster_var is missing when vce='cluster', or if cluster variable is not nested within units.
InvalidVCETypeError – If an unrecognized vce type is specified.
See also
prepare_controls: Prepares centered controls for regression adjustment.
estimate_period_effects: Estimates period-specific ATTs.
Notes
The ATT is estimated as the coefficient on the treatment indicator D in the OLS regression:
\[\hat{Y}_i = \alpha + \tau D_i + X_i \beta + D_i (X_i - \bar{X}_1) \delta + U_i\]
Confidence intervals use t-distribution critical values:
\[CI = \hat{\tau} \pm t_{df, 1-\alpha/2} \cdot SE(\hat{\tau})\]
Degrees of freedom selection:
For non-clustered standard errors: df = n - k (residual df).
For cluster-robust standard errors: df = G - 1, where G is the number of clusters. This provides more conservative inference when the number of clusters is small.
Under homoskedasticity and normality, the t-statistic has an exact t-distribution, enabling valid inference even with small samples. HC3 is recommended over HC1 for moderate sample sizes as it provides better finite-sample performance.
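The degrees-of-freedom rule and the confidence interval formula above can be sketched as follows. The helper `t_inference` and its arguments are hypothetical names chosen for illustration. It takes an already-estimated ATT and standard error and assumes SciPy is available for the t-distribution.

```python
from scipy import stats

def t_inference(att, se, nobs, k, n_clusters=None, alpha=0.05):
    """t-based inference with df = G - 1 if clustered, else n - k."""
    df = (n_clusters - 1) if n_clusters is not None else (nobs - k)
    t_stat = att / se
    # Two-sided p-value from the t-distribution with the chosen df.
    pvalue = 2.0 * stats.t.sf(abs(t_stat), df)
    crit = stats.t.ppf(1.0 - alpha / 2.0, df)
    return {
        "t_stat": t_stat,
        "pvalue": pvalue,
        "ci_lower": att - crit * se,
        "ci_upper": att + crit * se,
        "df_inference": df,
    }

# Same point estimate and standard error, with and without clustering.
plain = t_inference(att=1.0, se=0.5, nobs=50, k=2)
clustered = t_inference(att=1.0, se=0.5, nobs=50, k=2, n_clusters=10)
```

With 10 clusters the critical value comes from a t-distribution with 9 degrees of freedom instead of 48, so the clustered interval is wider even though the standard error is held fixed in this toy comparison.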
- lwdid.estimation.estimate_period_effects(data, ydot, d, tindex, tpost1, Tmax, controls_spec, vce, cluster_var, period_labels, alpha=0.05)
Estimate period-specific ATTs via independent cross-sectional regressions.
Iterates through post-treatment periods and estimates the treatment effect for each period using the residualized outcome variable. Each period is estimated via a separate OLS regression, enabling examination of treatment effect dynamics over time.
- Parameters:
data (pd.DataFrame) – Transformed panel data containing the outcome variable.
ydot (str) – Name of the residualized outcome variable.
d (str) – Name of the treatment indicator column.
tindex (str) – Name of the time index column.
tpost1 (int) – First post-treatment period index.
Tmax (int) – Last period index.
controls_spec (dict[str, Any] | None) – Control variable specification dictionary returned by prepare_controls. If None or if 'include' is False, the regression includes only the treatment indicator.
vce (str | None) – Type of variance-covariance estimator. Same options as estimate_att: None (homoskedastic), 'robust'/'hc1', 'hc3', or 'cluster'.
cluster_var (str | None) – Name of the clustering variable. Required if vce='cluster'.
period_labels (dict[int, str]) – Mapping from time index to human-readable period labels for output.
alpha (float, default=0.05) – Significance level for confidence intervals.
- Returns:
A DataFrame with one row per post-treatment period. Columns:
- 'period' (str)
Human-readable period label.
- 'tindex' (int)
Time index value.
- 'beta' (float)
Estimated ATT for the period.
- 'se' (float)
Standard error.
- 'ci_lower', 'ci_upper' (float)
Confidence interval bounds.
- 'tstat' (float)
t-statistic for H0: ATT = 0.
- 'pval' (float)
Two-sided p-value.
- 'N' (int)
Number of observations in the period.
- Return type:
pd.DataFrame
See also
estimate_att: Estimates average ATT across post-treatment periods.
prepare_controls: Prepares control specification for regression.
Notes
Each period is estimated independently via cross-sectional OLS:
\[\dot{Y}_{it} = \alpha_t + \tau_t D_i + X_i \beta_t + D_i (X_i - \bar{X}_1) \delta_t + U_{it}\]
where \(\dot{Y}_{it}\) is the transformed outcome and \(\tau_t\) is the period-specific ATT.
The estimators are consistent under no anticipation and parallel trends. Period-specific estimates allow examination of treatment effect dynamics, which is useful for detecting delayed effects or effect decay over time.
Periods with insufficient variation in the treatment indicator (all treated or all control) produce NaN estimates with a warning. This can occur due to panel attrition or unbalanced designs.
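A minimal sketch of the period-by-period loop, without controls or standard errors, might look like the following. The function name `period_effects` and the toy panel are assumptions; the real function also issues a warning and reports inference statistics, which are omitted here.

```python
import numpy as np
import pandas as pd

def period_effects(df, ydot, d, tindex, tpost1, tmax):
    """Run one treatment-only cross-sectional OLS per post-treatment period."""
    rows = []
    for t in range(tpost1, tmax + 1):
        cross = df[df[tindex] == t]
        # No variation in treatment status: the effect is not identified.
        if cross[d].nunique() < 2:
            rows.append({"tindex": t, "beta": np.nan, "N": len(cross)})
            continue
        X = np.column_stack([np.ones(len(cross)),
                             cross[d].to_numpy(dtype=float)])
        y = cross[ydot].to_numpy(dtype=float)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rows.append({"tindex": t, "beta": float(coef[1]), "N": len(cross)})
    return pd.DataFrame(rows)

# Hypothetical transformed panel; period 5 has lost all control units.
panel = pd.DataFrame({
    "t":    [3, 3, 3, 3, 4, 4, 4, 4, 5, 5],
    "d":    [1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
    "ydot": [2.0, 4.0, 1.0, 1.0, 5.0, 5.0, 2.0, 0.0, 6.0, 7.0],
})
effects = period_effects(panel, "ydot", "d", "t", tpost1=3, tmax=5)
```

Without controls, each period's coefficient is simply the treated-minus-control difference in mean transformed outcomes for that period.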
Overview
This module provides two main estimation functions:
estimate_att: Estimates the average treatment effect on the treated (ATT)
estimate_period_effects: Estimates period-specific treatment effects
Both functions run OLS regressions on transformed data and compute standard errors using various variance estimators.
Estimation Functions
estimate_att()
Purpose: Estimate the overall average treatment effect on the treated (ATT) from the cross-sectional representation of the Lee and Wooldridge estimator.
Regression specification (conceptual):
\[y_i = \alpha + \tau D_i + Z_i \gamma + \varepsilon_i\]
where:
y_i: Transformed outcome for unit i (typically the post-treatment average of the residualized outcome constructed by the transformation module)
D_i: Treatment indicator (1 = treated, 0 = control)
Z_i: Optional time-invariant controls and their interactions constructed following Lee and Wooldridge (2026)
ε_i: Regression error term
Estimand: \(\tau\) is the ATT.
Returns:
ATT estimate
Standard error
t-statistic
p-value
Confidence interval
Degrees of freedom
estimate_period_effects()
Purpose: Estimate treatment effects separately for each post-treatment period using cross-sectional regressions.
Regression specification (for each post-treatment period t):
\[y_{it} = \alpha_t + \tau_t D_i + Z_i \gamma_t + \varepsilon_{it}\]
where:
y_it: Transformed outcome for unit i in period t
D_i: Treatment indicator (1 = treated, 0 = control)
Z_i: Optional time-invariant controls (and their interactions) re-used from the main regression
\(\tau_t\): Treatment effect in period \(t\) (period-specific ATT)
Returns: DataFrame with period-specific estimates, standard errors, t-statistics, p-values, and confidence intervals.
Variance Estimators
The module supports multiple variance estimators for different assumptions about the error structure.
OLS (Homoskedastic)
Assumption: Errors are homoskedastic and normally distributed.
Formula:
\[\widehat{\mathrm{Var}}(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1}\]
where \(\hat{\sigma}^2 = RSS / (n - k)\).
Degrees of freedom:
Non-clustered: \(df = n - k\)
When to use: When homoskedasticity and normality are plausible and exact t-based inference is desired.
HC0 (White’s Original Estimator)
Assumption: Errors may be heteroskedastic.
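Formula:
\[\widehat{\mathrm{Var}}_{HC0}(\hat{\beta}) = (X'X)^{-1} X' \operatorname{diag}(\hat{\varepsilon}_i^2) X (X'X)^{-1}\]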
Degrees of freedom: Same as OLS.
When to use: Large samples with suspected heteroskedasticity. This is the original heteroskedasticity-consistent estimator without finite-sample corrections.
HC1 (Heteroskedasticity-Robust)
Assumption: Errors may be heteroskedastic.
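Formula:
\[\widehat{\mathrm{Var}}_{HC1}(\hat{\beta}) = \frac{n}{n - k} (X'X)^{-1} X' \operatorname{diag}(\hat{\varepsilon}_i^2) X (X'X)^{-1}\]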
Degrees of freedom: Same as OLS.
When to use: Medium to large samples with suspected heteroskedasticity. HC1 applies a degrees-of-freedom correction to HC0.
HC2 (Leverage-Adjusted)
Assumption: Errors may be heteroskedastic.
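Formula:
\[\widehat{\mathrm{Var}}_{HC2}(\hat{\beta}) = (X'X)^{-1} X' \operatorname{diag}\left(\frac{\hat{\varepsilon}_i^2}{1 - h_{ii}}\right) X (X'X)^{-1}\]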
where \(h_{ii}\) is the i-th diagonal element of the hat matrix \(H = X(X'X)^{-1}X'\).
Degrees of freedom: Same as OLS.
When to use: Small to moderate samples with suspected heteroskedasticity and varying leverage across observations.
HC3 (Small-Sample Adjusted)
Assumption: Errors may be heteroskedastic.
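Formula:
\[\widehat{\mathrm{Var}}_{HC3}(\hat{\beta}) = (X'X)^{-1} X' \operatorname{diag}\left(\frac{\hat{\varepsilon}_i^2}{(1 - h_{ii})^2}\right) X (X'X)^{-1}\]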
where \(h_{ii}\) is the i-th diagonal element of the hat matrix.
Degrees of freedom: Same as OLS.
When to use: Small or moderate samples with suspected heteroskedasticity. Simulation evidence suggests HC3 can perform reasonably well in some small-sample designs, but results can still be sensitive when the number of treated or control units is very small. See Methodological Notes for further discussion.
HC4 (High-Leverage Adjusted)
Assumption: Errors may be heteroskedastic.
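Formula:
\[\widehat{\mathrm{Var}}_{HC4}(\hat{\beta}) = (X'X)^{-1} X' \operatorname{diag}\left(\frac{\hat{\varepsilon}_i^2}{(1 - h_{ii})^{\delta_i}}\right) X (X'X)^{-1}\]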
where \(\delta_i = \min(4, n h_{ii}/\sum_j h_{jj})\) is an adaptive exponent based on leverage.
Degrees of freedom: Same as OLS.
When to use: When data contains high-leverage observations. HC4 provides adaptive adjustment that increases for observations with high leverage.
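The HC0 to HC4 estimators differ only in how each squared residual is rescaled before entering the sandwich. The NumPy sketch below computes all five by hand on simulated data. It is an illustration under the standard textbook definitions, not the code path used by the module (which delegates to statsmodels), and `hc_covariances` is a hypothetical helper name.

```python
import numpy as np

def hc_covariances(X, y):
    """Return HC0-HC4 covariance matrices for OLS of y on X."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # Leverages h_ii: diagonal of the hat matrix X (X'X)^{-1} X'.
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    e2 = resid ** 2
    # HC4 exponent: min(4, n * h_ii / sum_j h_jj).
    delta = np.minimum(4.0, n * h / h.sum())

    def sandwich(omega):
        meat = (X * omega[:, None]).T @ X
        return xtx_inv @ meat @ xtx_inv

    return {
        "hc0": sandwich(e2),
        "hc1": sandwich(e2) * n / (n - k),
        "hc2": sandwich(e2 / (1.0 - h)),
        "hc3": sandwich(e2 / (1.0 - h) ** 2),
        "hc4": sandwich(e2 / (1.0 - h) ** delta),
    }

# Simulated cross-section: intercept, treatment dummy, one control.
rng = np.random.default_rng(0)
n_obs = 30
d_sim = np.tile([0.0, 1.0], n_obs // 2)
x_sim = rng.normal(size=n_obs)
X_sim = np.column_stack([np.ones(n_obs), d_sim, x_sim])
y_sim = 1.0 + 2.0 * d_sim + 0.5 * x_sim + rng.normal(size=n_obs)
covs = hc_covariances(X_sim, y_sim)
se_att = {name: float(np.sqrt(v[1, 1])) for name, v in covs.items()}
```

Since every leverage lies strictly between 0 and 1 here, the HC2 and HC3 weights inflate the squared residuals, so their standard errors are never smaller than HC0's.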
Variance Estimator Selection Guide
OLS: Sample size: Any. Assumes homoskedasticity and normality. Use for exact t-inference under CLM assumptions.
HC0: Sample size: Large (N > 100). Assumes heteroskedasticity. Use for large samples only; understates SE in small samples.
HC1: Sample size: Moderate-Large (N > 50). Assumes heteroskedasticity. General robust SE; df-corrected HC0.
HC2: Sample size: Small-Moderate (N = 20-100). Assumes heteroskedasticity with varying leverage. Use when leverage varies across observations.
HC3: Sample size: Small-Moderate (N = 10-50). Assumes heteroskedasticity. Default for small samples with suspected heteroskedasticity.
HC4: Sample size: Any. Assumes heteroskedasticity with influential points. Use when high-leverage points are present in data.
Cluster: Sample size: G > 30 clusters. Assumes within-cluster correlation. Use for clustered data (states, schools).
Selection guidelines:
CLM assumptions plausible + exact inference needed: Use OLS (vce=None)
Heteroskedasticity suspected + small sample: Use HC3 (vce='hc3')
Heteroskedasticity suspected + moderate/large sample: Use HC1 (vce='hc1')
High-leverage observations present: Use HC4 (vce='hc4')
Clustered errors: Use cluster-robust (vce='cluster')
Cluster-Robust
Assumption: Errors are correlated within clusters but independent across clusters.
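Formula:
\[\widehat{\mathrm{Var}}_{CL}(\hat{\beta}) = (X'X)^{-1} \left( \sum_{g=1}^{G} X_g' \hat{\varepsilon}_g \hat{\varepsilon}_g' X_g \right) (X'X)^{-1}\]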
where \(g\) indexes clusters.
Degrees of freedom: G - 1 (number of clusters minus 1).
When to use: Errors are clustered (e.g., students within schools).
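As a rough illustration, the cluster-robust sandwich sums the score vectors within each cluster before forming outer products. The sketch below omits any finite-sample correction factor, which libraries such as statsmodels typically apply on top of this core formula. The helper name `cluster_covariance` and the simulated data are assumptions.

```python
import numpy as np

def cluster_covariance(X, resid, groups):
    """Uncorrected cluster-robust covariance for OLS coefficients."""
    xtx_inv = np.linalg.inv(X.T @ X)
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        mask = groups == g
        # Cluster score: X_g' e_g, a k-vector summed over the cluster.
        score = X[mask].T @ resid[mask]
        meat += np.outer(score, score)
    return xtx_inv @ meat @ xtx_inv

# Simulated data: 12 observations in 4 clusters of 3.
rng = np.random.default_rng(1)
n = 12
X = np.column_stack([np.ones(n), np.tile([0.0, 1.0], n // 2)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
groups = np.repeat(np.arange(4), 3)

V_cluster = cluster_covariance(X, resid, groups)
# Treating every observation as its own cluster reproduces HC0.
V_singletons = cluster_covariance(X, resid, np.arange(n))
```

The singleton-cluster case is a useful sanity check: with one observation per cluster, the formula collapses to White's HC0 estimator.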
Implementation Details
Regression Procedure
Prepare design matrix: For the main regression, construct a cross-sectional design matrix with an intercept, the treatment indicator, and (when applicable) time-invariant controls and their interactions with treatment as in Lee and Wooldridge (2026). The same control specification is reused for period-by-period regressions.
Run OLS: Compute \(\hat{\beta} = (X'X)^{-1} X'y\)
Compute residuals: \(\hat{\varepsilon} = y - X\hat{\beta}\)
Compute variance: Use appropriate variance estimator
Inference: Compute t-statistics and p-values using t-distribution
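The steps above can be sketched with plain NumPy for the homoskedastic case. This is a teaching sketch, not the module's code (which delegates to statsmodels). `ols_fit` is a hypothetical name, and SciPy is assumed to be available for the p-values.

```python
import numpy as np
from scipy import stats

def ols_fit(X, y):
    """OLS coefficients, classical standard errors, t-stats, p-values."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y                 # (X'X)^{-1} X'y
    resid = y - X @ beta                     # residuals
    sigma2 = resid @ resid / (n - k)         # RSS / (n - k)
    vcov = sigma2 * xtx_inv                  # homoskedastic variance
    se = np.sqrt(np.diag(vcov))
    t_stat = beta / se
    pvalue = 2.0 * stats.t.sf(np.abs(t_stat), n - k)
    return beta, se, t_stat, pvalue

# Design matrix: intercept plus treatment indicator (no controls).
d = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
y = np.array([3.0, 4.0, 5.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(d), d])
beta, se, t_stat, pvalue = ols_fit(X, y)
```

In this toy sample the coefficient on the treatment indicator equals the difference in group means (4 minus 2), and the residual degrees of freedom are 6 - 2 = 4.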
Degrees of Freedom
Non-clustered:
df = n - k
where:
n: Number of observations
k: Number of parameters (intercept + regressors)
Clustered:
df = G - 1
where G is the number of clusters.
Rationale: With cluster-robust SEs, the effective sample size is the number of clusters, not observations.
Confidence Intervals
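Confidence intervals at level \(1 - \alpha\) (95% by default, \(\alpha = 0.05\)) are computed as:
\[CI = \hat{\tau} \pm t_{\alpha/2, df} \cdot SE(\hat{\tau})\]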
where \(t_{\alpha/2, df}\) is the critical value from the t-distribution with \(df\) degrees of freedom.
Technical Notes
Numerical Stability
The module relies on the numerically stable OLS implementation in statsmodels:
OLS estimation and variance–covariance computation are delegated to statsmodels
Robust variance estimators with small-sample adjustments (HC0-HC4)
Singular or ill-conditioned designs raise errors or warnings rather than failing silently
Missing Data
Observations with missing values in required variables (outcome, treatment indicator, unit identifier, time variables, post indicator) are dropped during validation.
For control variables, missing values are handled at the estimation stage: if dropping observations with missing controls still leaves enough treated and control units to satisfy the \(N_1 > K+1\) and \(N_0 > K+1\) conditions, those observations are removed and controls are included; otherwise, controls are omitted and the full regression sample is retained. In both cases, informative warnings are issued.
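The decision rule for missing control values can be sketched as below. The function name `resolve_controls` and the toy frames are hypothetical, and the warnings mentioned in the text are left out for brevity.

```python
import pandas as pd

def resolve_controls(cross, d, controls):
    """Return (regression sample, whether controls are included)."""
    k = len(controls)
    # Candidate sample after dropping rows with missing controls.
    complete = cross.dropna(subset=controls)
    n1 = int((complete[d] == 1).sum())
    n0 = int((complete[d] == 0).sum())
    if n1 > k + 1 and n0 > k + 1:
        return complete, True    # drop incomplete rows, keep controls
    return cross, False          # keep the full sample, omit controls

nan = float("nan")
enough = pd.DataFrame({"d": [1, 1, 1, 1, 0, 0, 0, 0],
                       "x": [1.0, 2.0, 3.0, 4.0, 1.0, nan, 3.0, 4.0]})
too_few = pd.DataFrame({"d": [1, 1, 1, 0, 0, 0],
                        "x": [1.0, 2.0, 3.0, 1.0, nan, 3.0]})

sample_a, use_a = resolve_controls(enough, "d", ["x"])
sample_b, use_b = resolve_controls(too_few, "d", ["x"])
```

In the second frame, dropping the incomplete row would leave only two control units, which fails the strict inequality N_0 > K+1 with K = 1, so the controls are omitted and all six rows are retained.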
The effective sample size (n) reported in the results corresponds to the cross-sectional regression sample used for ATT estimation (the firstpost cross-section).
Perfect Collinearity
Regressors must not be perfectly collinear for the OLS problem to be identified. Common sources of exact collinearity include:
Including both a variable and its exact linear transformation
Including dummy variables for all categories (no omitted category)
Including controls that are exact linear combinations of other regressors
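One way to screen a design matrix for exact collinearity before estimation is a rank check. This is a generic NumPy sketch rather than a documented feature of the module, and `has_full_column_rank` is a hypothetical helper.

```python
import numpy as np

def has_full_column_rank(X):
    """True if no column is an exact linear combination of the others."""
    return bool(np.linalg.matrix_rank(X) == X.shape[1])

intercept = np.ones(6)
group_a = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
group_b = 1.0 - group_a

# Dummy-variable trap: intercept plus dummies for every category.
trap = np.column_stack([intercept, group_a, group_b])
# Dropping one category restores identification.
ok = np.column_stack([intercept, group_a])

trap_is_identified = has_full_column_rank(trap)
ok_is_identified = has_full_column_rank(ok)
```

Here `group_a + group_b` equals the intercept column, so the first design has rank 2 with 3 columns.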
Example Usage
These functions are used internally by lwdid.lwdid() after the
transformation step has constructed the transformed outcomes and main
regression sample. They are not part of the typical user-facing API. For
most applications, lwdid.lwdid() should be called directly, relying on
its high-level interface. Advanced users who need low-level access can
consult the docstrings and source code in lwdid.estimation to see the
exact function signatures and required inputs.
Large-Sample Inference
For large cross-sectional samples, asymptotic inference using robust standard errors is appropriate:
HC0-HC4 Standard Errors
HC1 provides heteroskedasticity-consistent standard errors with a degrees-of-freedom correction (n/(n-k))
HC3 adds leverage-based adjustments suitable for smaller samples
Cluster-Robust Standard Errors
When errors are correlated within clusters (e.g., states, regions), cluster-robust standard errors account for this correlation. Inference relies on asymptotic approximations that improve with the number of clusters G.
Variance Estimator Recommendation
Small samples with CLM assumptions: Use vce=None for exact t-based inference
Small to moderate samples with heteroskedasticity: Use vce='hc3'
Moderate to large samples: Use vce='hc1' or vce='robust'
Clustered data with many clusters: Use vce='cluster'
Estimator Selection for Large Samples
Lee and Wooldridge (2025) establishes theoretical and simulation evidence for choosing among estimators when sample sizes are large enough for asymptotic inference. The key Monte Carlo simulation results inform the following recommendations.
Available Estimators
RA (Regression Adjustment): Default estimator. Under correct specification of the outcome model, RA is both best linear unbiased (BLUE) and asymptotically efficient. Equivalent to POLS/ETWFE in the common timing case.
IPW (Inverse Probability Weighting): Consistent when the propensity score model is correctly specified. May be less efficient than RA under correct outcome model specification.
IPWRA (Doubly Robust): Combines regression adjustment with propensity score weighting. Consistent if either the outcome model or propensity score model is correctly specified (double robustness property).
Doubly Robust ATT Estimator (Mathematical Formulation)
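The influence function representation of the doubly robust ATT estimator is:
\[\hat{\tau}_{DR} = \frac{1}{N} \sum_{i=1}^{N} \left[ \frac{D_i}{\pi} - \frac{(1 - D_i)\, p(X_i)}{\pi \, (1 - p(X_i))} \right] \left( Y_i - m_0(X_i) \right)\]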
where \(\pi = P(D=1)\) is the unconditional treatment probability, \(p(X_i) = P(D=1|X_i)\) is the propensity score, and \(m_0(X_i) = E[Y|D=0, X_i]\) is the conditional mean for controls.
Double robustness property:
If \(m_0(X)\) is correctly specified: ATT is identified via regression adjustment regardless of propensity score specification
If \(p(X)\) is correctly specified: IPW component corrects for outcome model misspecification
Both correct: achieves semiparametric efficiency bound
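To make the double robustness idea concrete, the simulation below evaluates one common form of the doubly robust ATT estimator, the sample mean of [D/π − (1−D) p(X) / (π (1−p(X)))] (Y − m_0(X)). Everything here is an assumption for illustration: the data-generating process, the use of the true propensity score in place of an estimated one, and the linear outcome model fitted on controls. It does not reproduce the package's IPWRA implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
# True propensity score, treated as known in this sketch.
p = 1.0 / (1.0 + np.exp(-0.5 * x))
d = (rng.uniform(size=n) < p).astype(float)
true_att = 1.5
y = 1.0 + 2.0 * x + true_att * d + rng.normal(size=n)

# Outcome model m0(X): OLS of Y on (1, X) using control units only.
design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design[d == 0], y[d == 0], rcond=None)
m0 = design @ coef

# Unconditional treatment probability and doubly robust weights.
pi = d.mean()
weights = d / pi - (1.0 - d) * p / (pi * (1.0 - p))
att_dr = float(np.mean(weights * (y - m0)))
```

Treated units contribute their outcome minus the predicted untreated outcome, while reweighted control residuals correct for any error in the outcome model. With both models correct, as here, the estimate should land close to the simulated effect of 1.5.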
PSM (Propensity Score Matching): Matches treated to control units based on estimated propensity scores. Generally less efficient than RA and IPWRA.
Efficiency Comparison (from Lee and Wooldridge 2025)
Under correct specification of all models:
RA/POLS has the smallest standard deviation and RMSE
IPWRA performs close to RA (within 3-5% higher SD in most scenarios)
PSM and long differencing methods have notably larger standard deviations (25-40% higher)
Under model misspecification:
When outcome model is misspecified but propensity score is correct: IPWRA has smallest RMSE due to its double robustness
When both models are misspecified: IPWRA still tends to have smaller bias than RA while maintaining reasonable precision
Estimator Selection Guidelines
Start with RA when N is moderate and functional form assumptions are plausible. RA is efficient and computationally simple.
Use IPWRA as primary estimator when:
Functional form assumptions are uncertain
Robustness to model misspecification is desired
Controls are available for both outcome and propensity score models
Use IPW/PSM when:
Propensity score weighting or matching is preferred for substantive reasons
Comparison with other treatment effects literature is desired
Report multiple estimators for robustness: If RA and IPWRA give similar results, this provides evidence that findings are not sensitive to functional form assumptions.
Relationship to Long Differencing Approaches
Long differencing approaches use only the period just prior to intervention, discarding information from earlier pre-treatment periods. Lee and Wooldridge (2025) shows that this can result in substantial efficiency loss. The rolling transformation approach uses all suitable pre-treatment periods, achieving efficiency close to POLS while permitting application of doubly robust estimators.
See Also
lwdid.lwdid() - Main function that calls these estimation functions
Transformations Module - Transformation functions applied before estimation
Methodological Notes - Theoretical background on inference
User Guide - Comprehensive usage guide