Methodological Notes

This document provides the theoretical foundation and methodological details for the Lee and Wooldridge difference-in-differences methods implemented in lwdid.

Overview

The lwdid package implements the Lee and Wooldridge methods for difference-in- differences estimation with panel data, covering three scenarios:

  1. Small-sample common timing (Lee and Wooldridge 2026): Exact t-based inference under classical linear model assumptions when the number of cross-sectional units is small.

  2. Large-sample common timing (Lee and Wooldridge 2025): Rolling transformation approach with asymptotic inference using robust standard errors.

  3. Staggered adoption (Lee and Wooldridge 2025): Extension to settings where units are treated at different times, with cohort-time specific effect estimation.

The core method transforms panel data into cross-sectional form via unit-specific time-series operations:

  1. Transform panel data into cross-sectional form via unit-specific time-series operations

  2. Estimate treatment effects from a cross-sectional regression on transformed data

  3. Under classical linear model assumptions (including normality and homoskedasticity), conduct exact t-based inference from the cross-sectional regression

The method exploits an algebraic equivalence: by removing unit-specific pre-treatment patterns (means or trends), the panel DiD estimator can be obtained from a cross-sectional regression where exact finite-sample inference is available under the classical linear model assumptions, particularly normality and homoskedasticity of the error term, when homoskedastic OLS standard errors are used.

The Panel-to-Cross-Section Transformation

Core Principle

The method removes unit-specific time-series patterns using only pre-treatment data, then pools the transformed outcomes across units. This proceeds in two steps:

  1. Transformation step: Apply a unit-specific time-series transformation using pre-treatment periods only

  2. Regression step: Estimate treatment effects from a cross-sectional regression on the transformed outcomes

The transformation uses only pre-treatment information, preserving the treatment variation for estimation in the second step.

Demean Transformation

Setup

Conceptually, unit \(i\) is observed over \(T\) periods (\(t = 1, \ldots, T\)). Treatment begins at period \(S\), where \(S \in \{2, \ldots, T\}\). Pre-treatment periods are \(t = 1, \ldots, S-1\) (\(T_0 = S-1\) periods). Post-treatment periods are \(t = S, \ldots, T\) (\(T_1 = T-S+1\) periods).

In the notation of Lee and Wooldridge (2026), this description uses a balanced-panel setup where each unit is observed in all T periods. The lwdid implementation also accommodates unbalanced panels: units need not appear in every period. However, each unit included in the data must have at least one pre-treatment observation so that its pre-treatment mean can be computed. Units without any post-treatment observations remain in the panel but do not contribute to the main ATT regression because their post-treatment average is undefined.

Procedure

  1. Compute the pre-treatment mean for each unit i:

    \[\bar{Y}_{i,pre} = \frac{1}{S-1} \sum_{t=1}^{S-1} Y_{it}\]
  2. Compute the post-treatment mean for each unit i:

    \[\bar{Y}_{i,post} = \frac{1}{T-S+1} \sum_{t=S}^{T} Y_{it}\]
  3. Construct the transformed outcome:

    \[\Delta\bar{Y}_i = \bar{Y}_{i,post} - \bar{Y}_{i,pre}\]
  4. Estimate the cross-sectional regression:

    \[\Delta\bar{Y}_i = \alpha + \tau D_i + U_i, \quad i = 1, \ldots, N\]

    where \(D_i = 1\) for treated units and \(D_i = 0\) for control units.

Estimand: \(\tau\) is the average treatment effect on the treated (ATT).

Equivalence: This transformation yields the same point estimate as the two-way fixed effects (TWFE) DiD estimator. The cross-sectional representation enables exact t-based inference under classical linear model assumptions.

Detrend Transformation

Motivation: When units exhibit heterogeneous linear trends in pre-treatment periods, the standard parallel trends assumption (constant trends across units) may be too restrictive. Detrending allows for unit-specific linear trends by removing them before estimating treatment effects.

Procedure

  1. For each unit i, estimate a linear trend using pre-treatment data:

    \[Y_{it} = A_i + B_i t + \varepsilon_{it} \quad \text{for } t = 1, \ldots, S-1\]

    This requires at least two pre-treatment periods (\(T_0 \geq 2\)).

  2. Compute predicted values for all periods using the estimated trend:

    \[\hat{Y}_{it} = \hat{A}_i + \hat{B}_i t \quad \text{for } t = 1, \ldots, T\]
  3. For post-treatment periods, compute out-of-sample residuals:

    \[\ddot{Y}_{it} = Y_{it} - \hat{Y}_{it} \quad \text{for } t = S, \ldots, T\]
  4. Average the residuals over post-treatment periods:

    \[\bar{\ddot{Y}}_i = \frac{1}{T-S+1} \sum_{t=S}^{T} \ddot{Y}_{it}\]
  5. Estimate the cross-sectional regression:

    \[\bar{\ddot{Y}}_i = \alpha + \tau_{DT} D_i + U_i, \quad i = 1, \ldots, N\]

Estimand: \(\tau_{DT}\) is the ATT after removing unit-specific linear trends.

Note: The trend is estimated using only pre-treatment data, so the treatment variation remains available for estimation in the cross-sectional regression. As in the demean case, the implementation allows unbalanced panels: units need not appear in every period, but each unit included in the data must have at least two pre-treatment observations so that its trend can be estimated. Units without any post-treatment observations remain in the panel but do not contribute to the main ATT regression because their post-treatment average is undefined.

Seasonal Transformations (demeanq/detrendq)

demeanq: Extends the demean transformation to include seasonal fixed effects, removing seasonal patterns in periodic data.

detrendq: Extends the detrend transformation to include both linear trends and seasonal fixed effects.

Both methods include seasonal dummies in the pre-treatment regression to remove seasonal variation before computing post-treatment residuals.

Generalized Q Parameter

The seasonal transformations support arbitrary seasonal periods through the Q parameter:

  • Q=4 (default): Quarterly data with 4 seasons per year

  • Q=12: Monthly data with 12 seasons per year

  • Q=52: Weekly data with 52 seasons per year

The mathematical formulation generalizes to Q seasons. For demeanq, the pre-treatment regression for each unit i is:

\[Y_{it} = \alpha_i + \sum_{q=2}^{Q} \gamma_q S_{itq} + \varepsilon_{it}\]

where \(S_{itq}\) is a dummy variable equal to 1 if observation t falls in season q (with season 1 as the reference category).

For detrendq, the pre-treatment regression includes both trend and seasonal terms:

\[Y_{it} = \alpha_i + \beta_i t + \sum_{q=2}^{Q} \gamma_q S_{itq} + \varepsilon_{it}\]

Minimum Pre-Treatment Requirements

The minimum number of pre-treatment observations per unit depends on Q:

  • demeanq: \(n_{pre} \geq Q + 1\) (to estimate intercept + Q-1 seasonal dummies)

  • detrendq: \(n_{pre} \geq Q + 2\) (to estimate intercept + trend + Q-1 seasonal dummies)

For monthly data (Q=12), this means at least 13 pre-treatment observations for demeanq and 14 for detrendq. For weekly data (Q=52), at least 53 and 54 observations respectively.

Season Coverage Requirement

For each unit, every season that appears in the post-treatment period must also appear in the pre-treatment period. This ensures that seasonal effects can be properly removed from post-treatment observations.

Numerical Stability

For high-dimensional seasonal adjustments (especially Q=52), the implementation includes numerical stability checks:

  • Condition number monitoring for the design matrix

  • Warnings when the design matrix approaches singularity

  • Robust OLS estimation using QR decomposition

Inference Under CLM Assumptions

Classical Linear Model Assumptions

When homoskedastic OLS standard errors are used (vce=None in lwdid), exact finite-sample inference is available under the classical linear model (CLM) assumptions for the cross-sectional regression. For the demean transformation, the model is \(\Delta\bar{Y}_i = \alpha + \tau D_i + U_i\) (for detrending, replace \(\Delta\bar{Y}_i\) with \(\bar{\ddot{Y}}_i\)). The CLM assumptions are:

  1. Linearity: \(E[U_i | D_i] = 0\) (zero conditional mean)

  2. Random sampling: Units are independently sampled

  3. No perfect collinearity: Treatment indicator varies across units

  4. Homoskedasticity: \(\text{Var}(U_i | D_i) = \sigma^2\) (constant variance)

  5. Normality: \(U_i | D_i \sim N(0, \sigma^2)\) (conditional normality)

Under these assumptions and with homoskedastic OLS standard errors, the t-statistic \((\hat{\tau} - \tau)/\text{se}(\hat{\tau})\) follows an exact t-distribution with residual degrees of freedom equal to \(N - k\), where \(k\) is the number of estimated parameters in the cross-sectional regression (\(k = 2\) without controls: intercept and treatment indicator). The normality and homoskedasticity assumptions are critical for exact inference.

Degrees of Freedom

Homoskedastic standard errors:

\[df = N - k\]

where \(N\) is the number of cross-sectional units and \(k\) is the number of parameters (\(k = 2\) without controls: intercept and treatment indicator).

Cluster-robust standard errors:

\[df = G - 1\]

where \(G\) is the number of clusters.

The cross-sectional regression has \(N\) observations (one per unit), not \(N \times T\). With cluster-robust standard errors, degrees of freedom equal the number of clusters minus one.

Period-Specific Effects

In addition to an overall post-treatment average effect, the Lee and Wooldridge framework allows estimation of period-specific treatment effects by running separate cross-sectional regressions for each post-treatment period, using the transformed outcome in that period as the dependent variable. In lwdid, these regressions use the same variance estimator (vce) and control-variable specification as the main ATT regression.

Robust Inference

Heteroskedasticity-Robust Standard Errors

When homoskedasticity is violated, heteroskedasticity-robust standard errors provide asymptotically valid inference. The lwdid package supports the following HC estimators:

  • HC0: The original heteroskedasticity-consistent estimator. Tends to underestimate standard errors in small samples.

  • HC1: Degrees-of-freedom adjusted version of HC0 (applies \(n/(n-k)\) correction). Equivalent to vce='robust'.

  • HC2: Leverage-adjusted estimator that divides squared residuals by \((1 - h_{ii})\) where \(h_{ii}\) is the diagonal of the hat matrix.

  • HC3: Small-sample adjusted version that divides squared residuals by \((1 - h_{ii})^2\). Recommended for small to moderate samples.

  • HC4: High-leverage adjusted estimator with adaptive exponent based on leverage. Use when data contains influential observations.

Robust standard errors rely on asymptotic approximations and may be less accurate in very small samples. Simulation evidence suggests HC3 can perform reasonably well even with small sample sizes, though caution is warranted with very small samples. Exact inference is not available under heteroskedasticity.

Cluster-Robust Standard Errors

When errors are correlated within clusters, cluster-robust standard errors account for within-cluster correlation.

These standard errors rely on large-sample approximations in the number of clusters. Inference is more reliable when the number of clusters is not too small; lwdid issues a warning when the cluster count is very low. Degrees of freedom for cluster-robust inference are taken as G - 1. The cluster variable must be nested within the unit identifier (each unit belongs to exactly one cluster across all time periods), reflecting the usual assumption that clusters are independent.

Randomization Inference

Randomization inference constructs a reference distribution for the test statistic under the sharp null hypothesis without relying on normality or homoskedasticity assumptions.

Procedure:

  1. Compute the observed test statistic. In lwdid, this is the ATT estimate \(\hat{\tau}\) from the cross-sectional regression.

  2. Randomly reassign treatment status across units

  3. Re-estimate the model with reassigned treatment

  4. Repeat steps 2-3 \(R\) times (e.g., \(R = 1000\))

  5. Compute the p-value as the proportion of replications with the test statistic at least as extreme (in absolute value) as the observed value

Methods:

  • Bootstrap (with replacement): Treatment group size may vary across replications

  • Permutation (without replacement): Fisher randomization inference; treatment group size is fixed across permutations

In lwdid, randomization inference always re-estimates the cross-sectional regression by OLS (ignoring the vce option used for the main estimate). The default ri_method is 'bootstrap' for backward compatibility, while 'permutation' corresponds to the classical Fisher randomization approach and is generally recommended for design-based randomization inference.

Advantages:

  • Does not rely on normality or homoskedasticity assumptions

  • Naturally accommodates heteroskedasticity and non-normality under the maintained randomization scheme

Limitations:

  • Computationally intensive for large numbers of replications

  • Tests only the sharp null hypothesis (zero treatment effect for all units)

Clustering at Higher Levels

When the policy or treatment varies at a level higher than the unit of observation, cluster standard errors at the policy variation level (Lee and Wooldridge 2026). This section provides guidance on choosing the appropriate clustering level and tools for diagnosing clustering structure.

When to Cluster at Higher Levels

Principle: Cluster at the level where treatment is assigned or varies.

Example: If studying a state-level policy using county-level data, cluster at the state level because the policy varies across states, not counties:

result = lwdid(
    data, y='outcome', ivar='county', tvar='year',
    gvar='first_treat',
    vce='cluster', cluster_var='state'
)

Rationale: When treatment is assigned at a higher level (e.g., state), errors are likely correlated within that level. Clustering at the unit level (county) would understate standard errors because it ignores this correlation.

Minimum Cluster Requirements

The reliability of cluster-robust inference depends on the number of clusters:

  • Recommended: ≥ 20-30 clusters for reliable asymptotic inference

  • Warning threshold: < 10 clusters may produce unreliable inference

  • Alternative: Use wild cluster bootstrap when clusters are few

lwdid automatically warns when the number of clusters is small:

  • < 10 clusters: Warning with recommendation to use wild cluster bootstrap

  • 10-19 clusters: Informational message suggesting wild cluster bootstrap

  • Cluster size imbalance (CV > 1.0): Warning about potential reliability issues

Degrees of Freedom

With cluster-robust standard errors, degrees of freedom are:

\[df = G - 1\]

where \(G\) is the number of clusters. This is more conservative than the usual \(N - k\) degrees of freedom, reflecting the effective sample size being the number of clusters rather than the number of observations.

Diagnosing Clustering Structure

Use diagnose_clustering() to analyze potential clustering options and get recommendations:

from lwdid import diagnose_clustering

diag = diagnose_clustering(
    data, ivar='county',
    potential_cluster_vars=['state', 'region', 'county'],
    gvar='first_treat'
)
print(diag.summary())

The diagnostics include:

  • Cluster counts: Total, treated, and control clusters for each variable

  • Cluster sizes: Min, max, mean, median, and coefficient of variation

  • Level detection: Whether each variable is at a higher/same/lower level than unit

  • Treatment variation: Whether treatment varies within clusters

  • Recommendation: Suggested clustering variable with explanation

Getting a Clustering Recommendation

For a detailed recommendation with confidence scores and alternatives:

from lwdid import recommend_clustering_level

rec = recommend_clustering_level(
    data, ivar='county', tvar='year',
    potential_cluster_vars=['state', 'region', 'county'],
    gvar='first_treat',
    min_clusters=20
)
print(rec.summary())

if rec.use_wild_bootstrap:
    print("Consider using wild_cluster_bootstrap() for inference")

The recommendation considers:

  • Treatment variation level (cluster at this level when possible)

  • Number of clusters (more is better, ≥20 recommended)

  • Balance between treated and control clusters

  • Cluster size variation (lower CV is better)

Checking Clustering Consistency

Verify that your chosen clustering level is consistent with the treatment assignment mechanism:

from lwdid import check_clustering_consistency

result = check_clustering_consistency(
    data, ivar='county', cluster_var='state',
    gvar='first_treat'
)

if not result.is_consistent:
    print(f"Warning: {result.recommendation}")

A clustering choice is consistent when:

  • Treatment does not vary within clusters (or varies minimally, < 5%)

  • Cluster level is at or above the treatment variation level

If treatment varies within clusters, standard errors may be conservative (too large), leading to under-rejection of the null hypothesis.

Wild Cluster Bootstrap

When the number of clusters is small (< 20), the wild cluster bootstrap provides more reliable inference than asymptotic cluster-robust standard errors. The procedure constructs a bootstrap distribution by:

  1. Estimating the original model and obtaining residuals

  2. Generating cluster-level random weights

  3. Creating bootstrap residuals by multiplying original residuals by weights

  4. Re-estimating the model with bootstrap outcomes

  5. Computing the bootstrap distribution of t-statistics

Basic usage:

from lwdid.inference import wild_cluster_bootstrap

result = wild_cluster_bootstrap(
    data, y_transformed='ydot', d='d_',
    cluster_var='state', n_bootstrap=999
)
print(f"ATT: {result.att:.4f}")
print(f"Bootstrap SE: {result.se_bootstrap:.4f}")
print(f"Bootstrap p-value: {result.pvalue:.4f}")
print(f"95% CI: [{result.ci_lower:.4f}, {result.ci_upper:.4f}]")

Weight types:

  • rademacher (default): P(w=1) = P(w=-1) = 0.5. Simplest and most common.

  • mammen: Two-point distribution matching first three moments. Better for asymmetric error distributions.

  • webb: Six-point distribution. Recommended for very few clusters (G < 10).

Example with Webb weights for few clusters:

result = wild_cluster_bootstrap(
    data, y_transformed='ydot', d='d_',
    cluster_var='state',
    weight_type='webb',  # Better for very few clusters
    n_bootstrap=999,
    seed=42  # For reproducibility
)

Null hypothesis imposition:

By default (impose_null=True), the bootstrap constructs samples under the null hypothesis that the treatment effect is zero. This provides better size control for hypothesis testing. Set impose_null=False for confidence interval construction when the null may not hold.

Clustering Workflow

A recommended workflow for choosing and validating clustering:

  1. Diagnose: Use diagnose_clustering() to understand the data structure

  2. Recommend: Use recommend_clustering_level() to get a recommendation

  3. Check: Use check_clustering_consistency() to validate the choice

  4. Estimate: Run lwdid() with the chosen cluster_var

  5. Bootstrap: If clusters < 20, use wild_cluster_bootstrap() for inference

Complete example:

import pandas as pd
from lwdid import (
    lwdid, diagnose_clustering, recommend_clustering_level,
    check_clustering_consistency
)
from lwdid.inference import wild_cluster_bootstrap

# Step 1-2: Diagnose and get recommendation
rec = recommend_clustering_level(
    data, ivar='county', tvar='year',
    potential_cluster_vars=['state', 'region'],
    gvar='first_treat'
)

# Step 3: Check consistency
consistency = check_clustering_consistency(
    data, ivar='county', cluster_var=rec.recommended_var,
    gvar='first_treat'
)

# Step 4: Estimate with cluster-robust SE
result = lwdid(
    data, y='outcome', ivar='county', tvar='year',
    gvar='first_treat',
    vce='cluster', cluster_var=rec.recommended_var
)

# Step 5: Wild bootstrap if few clusters
if rec.use_wild_bootstrap:
    boot_result = wild_cluster_bootstrap(
        result.data_transformed, y_transformed='ydot', d='d_',
        cluster_var=rec.recommended_var, n_bootstrap=999
    )
    print(f"Bootstrap p-value: {boot_result.pvalue:.4f}")

Large-Sample Asymptotic Inference

When the cross-sectional sample size N is moderately large, asymptotic inference based on robust standard errors becomes appropriate. Lee and Wooldridge (2025) develops the theoretical foundation for large-sample inference in the rolling transformation framework.

Asymptotic Theory

Under standard regularity conditions (independent sampling across units, finite moments, and either correct specification of the outcome model or propensity score model for doubly robust estimators), the ATT estimator is asymptotically normal:

\[\sqrt{N} (\hat{\tau} - \tau) \xrightarrow{d} N(0, V)\]

where \(V\) is the asymptotic variance. This justifies the use of heteroskedasticity-robust standard errors (HC0-HC4) or cluster-robust standard errors for constructing confidence intervals and hypothesis tests.

The key insight from Lee and Wooldridge (2025) is that the rolling transformation converts the panel DiD problem into a cross-sectional treatment effects problem, enabling application of standard large-sample theory from the treatment effects literature.

Doubly Robust Estimation

Lee and Wooldridge (2025) shows that the rolling transformation approach enables application of doubly robust estimators, which provide consistent estimates when either the outcome model or the propensity score model is correctly specified.

IPWRA (Inverse Probability Weighted Regression Adjustment):

The doubly robust IPWRA estimator combines regression adjustment and inverse probability weighting. For cohort \(g\) in period \(r\), the estimator takes the form:

\[\hat{\tau}_{IPWRA} = N_1^{-1} \sum_i D_i [\hat{Y}_{ir} - \hat{m}_0(X_i)] - N_1^{-1} \sum_i (1-D_i) \frac{\hat{p}(X_i)}{1-\hat{p}(X_i)} [\hat{Y}_{ir} - \hat{m}_0(X_i)]\]

where \(N_1\) is the number of treated units, \(\hat{m}_0(\cdot)\) is the estimated conditional mean for control units, and \(\hat{p}(\cdot)\) is the estimated propensity score.

Double robustness property: The IPWRA estimator is consistent if:

  1. The outcome model \(E[Y|D=0, X]\) is correctly specified, OR

  2. The propensity score \(P(D=1|X)\) is correctly specified

This property makes IPWRA particularly attractive when functional form assumptions are uncertain.

IPW (Inverse Probability Weighting):

The IPW estimator weights observations by the inverse of their propensity scores:

\[\hat{\tau}_{IPW} = N_1^{-1} \sum_i D_i \hat{Y}_{ir} - \frac{\sum_i (1-D_i)\hat{w}_i\hat{Y}_{ir}}{\sum_i (1-D_i)\hat{w}_i}\]

where \(\hat{w}_i = \hat{p}(X_i)/(1-\hat{p}(X_i))\). IPW is consistent when the propensity score is correctly specified.

PSM (Propensity Score Matching):

Propensity score matching estimates treatment effects by matching treated units to control units with similar propensity scores. Nearest-neighbor matching finds the closest control unit(s) for each treated unit based on the estimated propensity score.

Efficiency Considerations

Under correct specification of all models, Lee and Wooldridge (2025) shows that:

  1. Regression adjustment (RA) is both best linear unbiased (BLUE) and asymptotically efficient under standard assumptions

  2. IPWRA achieves efficiency close to RA while providing robustness to model misspecification

  3. Long differencing methods can be considerably less efficient because they use only the period just prior to intervention

Monte Carlo Simulation Evidence

The Monte Carlo simulations in Lee and Wooldridge (2025) provide quantitative evidence for these theoretical results. Key findings include:

Efficiency comparison under correct specification:

  • RA/POLS: Relative SD = 1.00 (baseline), RMSE ratio to RA = 1.00. Best linear unbiased estimator.

  • IPWRA: Relative SD = 1.03-1.05, RMSE ratio to RA ≈ 1.03. Doubly robust property.

  • IPW: Relative SD = 1.08-1.15, RMSE ratio to RA ≈ 1.10. Propensity score-based weighting.

  • PSM: Relative SD = 1.25-1.40, RMSE ratio to RA ≈ 1.30. Matching-based estimator.

Rolling vs. long differencing efficiency:

The rolling transformation uses all pre-treatment periods to estimate unit-specific means or trends, whereas long differencing methods use only the period immediately prior to treatment. This difference has substantial efficiency implications:

  • Rolling transformation: Uses \(T_0\) pre-treatment periods per unit

  • Long differencing: Uses only 1 pre-treatment period per unit

  • Efficiency gain: Rolling achieves standard deviation approximately \(\sqrt{T_0}\) times smaller than long differencing when \(T_0 > 1\), under homoskedastic errors

In the simulation designs with \(T_0 = 4\), the rolling approach achieves approximately 25-40% smaller standard deviations than long differencing methods, translating to substantially more precise estimates.

Robustness to misspecification:

When outcome models are misspecified but propensity scores are correct:

  • IPWRA maintains small bias due to double robustness property

  • RA may exhibit larger bias under outcome model misspecification

  • IPWRA RMSE becomes comparable to or smaller than RA RMSE

These findings support using IPWRA as the primary estimator when functional form assumptions are uncertain.

When to Use Large-Sample Methods

Large-sample asymptotic inference is appropriate when:

  • The cross-sectional sample size \(N\) is moderately large (e.g., \(N \geq 50\))

  • Homoskedasticity and normality assumptions may not hold

  • Doubly robust estimation is desired for robustness to model misspecification

For small samples where CLM assumptions (normality and homoskedasticity) are plausible, exact t-based inference with vce=None remains appropriate.

Inference Distribution by Estimator

The lwdid package uses different reference distributions for statistical inference depending on the estimator and sample size considerations:

Regression Adjustment (RA):

  • Uses the t-distribution with degrees of freedom \(df = N - k\), where \(N\) is the cross-sectional sample size and \(k\) is the number of regression parameters

  • Under CLM assumptions (homoskedasticity and normality), this provides exact finite-sample inference

  • With robust standard errors (HC0-HC4), inference is asymptotically valid

Inverse Probability Weighting (IPW):

  • Uses the normal distribution for asymptotic inference

  • This follows the standard inverse probability weighting framework

  • Valid for moderate to large samples where asymptotic approximations hold

Inverse Probability Weighted Regression Adjustment (IPWRA):

  • Uses the normal distribution for asymptotic inference

  • This is consistent with Lee and Wooldridge (2025), which develops the asymptotic theory for doubly robust estimators in the rolling transformation framework

  • For small samples, consider using RA with vce=None for exact t-based inference instead

Propensity Score Matching (PSM):

  • Uses the normal distribution for asymptotic inference

  • Standard errors are computed analytically

  • PSM inference is generally valid for moderate to large samples

Practical Guidance:

  • Small samples (\(N < 50\)): Use RA (estimator='ra') with vce=None for exact t-based inference under CLM assumptions, or use randomization inference (ri=True) for assumption-free testing

  • Moderate samples (\(50 \leq N < 200\)): Use RA or IPWRA with HC3 standard errors (vce='hc3')

  • Large samples (\(N \geq 200\)): All estimators with asymptotic inference are appropriate; IPWRA provides robustness to model misspecification

Identification Assumptions

No Anticipation

Units do not change behavior in anticipation of future treatment. Pre-treatment outcomes are unaffected by future treatment status.

Example violation: Firms may alter behavior before a regulation takes effect.

Common Treatment Timing (Common Timing Mode)

In common timing mode, all treated units begin treatment in the same period. This is specified via the post parameter, which indicates post-treatment periods.

For staggered adoption settings where units are treated at different times, use the gvar parameter instead of post. See the Staggered Adoption section below.

Treatment Persistence

Once treated, units remain treated. At the implementation level, the post-treatment indicator must be monotone non-decreasing in the time index (once post switches from 0 to 1, it cannot revert to 0). Treatment reversals or temporary treatments are therefore not permitted.

Unbalanced Panels and Selection Mechanism

Lee and Wooldridge (2025) addresses the treatment of unbalanced panels, where not all units are observed in all time periods. This section describes the selection mechanism assumption and guidance for working with incomplete panel data.

Selection Mechanism Assumption

The key assumption for unbalanced panels is that selection into the sample may depend on unobserved time-invariant heterogeneity (which is removed by the transformation), but cannot systematically depend on the shocks to \(Y_{it}(\infty)\).

Formally, if \(S_{it}\) indicates whether unit \(i\) is observed at time \(t\):

\[E[Y_{it}(\infty) | S_{it} = 1, D_i, X_i] = E[Y_{it}(\infty) | D_i, X_i]\]

This means that, conditional on treatment status and covariates, being observed does not systematically predict the untreated potential outcome.

Analogy to Fixed Effects: This assumption is analogous to the standard fixed effects assumption where the error term may be correlated with time-invariant characteristics but not with idiosyncratic shocks. The rolling transformation removes the time-invariant component, so selection based on permanent characteristics does not cause bias.

Acceptable Missing Data Patterns

Missing data is acceptable when it depends on:

  1. Time-invariant characteristics: Units that are always high or low performers may be more or less likely to remain in the sample

  2. Observable covariates: Missingness related to baseline characteristics that are controlled for in the analysis

  3. Random factors: Missing completely at random (MCAR)

  4. Deterministic patterns: Units observed only in certain calendar time periods due to sample design

Examples of acceptable patterns:

  • A panel of firms where smaller firms are less likely to survive (selection on time-invariant firm quality)

  • A survey where certain regions are only surveyed in specific years

  • Administrative data where units enter and exit based on observable eligibility

Problematic Missing Data Patterns

Missing data is problematic when it depends on:

  1. Outcome shocks: Units experiencing negative shocks being more likely to leave the sample

  2. Treatment anticipation: Units selecting out based on expected treatment effects

  3. Unobserved time-varying factors: Missingness correlated with transitory components of the outcome

Examples of problematic patterns:

  • Firms exiting the sample after experiencing losses (selection on outcome shocks)

  • Workers leaving a survey after job loss, where job loss is correlated with

    outcomes of interest

  • Treatment group units dropping out more frequently than control units

    in a pattern correlated with outcomes

Detrending for Robustness

Lee and Wooldridge (2025) notes that detrending provides additional robustness to unbalanced panels. Removing unit-specific trends allows for two sources of heterogeneity — level and trend — to be correlated with selection into the sample.

When selection may depend on both the level and growth rate of the outcome, detrending removes both sources of heterogeneity, making the estimates more robust to selection bias.

Recommendation: When panel imbalance is substantial or selection mechanisms are uncertain, consider using rolling='detrend' for additional robustness.

Diagnosing Selection Risk

The lwdid package provides diagnostic tools for assessing selection risk:

from lwdid import diagnose_selection_mechanism

diagnostics = diagnose_selection_mechanism(
    data, ivar='unit', tvar='year', gvar='first_treat'
)

print(diagnostics.summary())

The diagnostics include:

  • Balance statistics: Panel balance ratio, observation counts

  • Attrition analysis: Dropout rates by cohort and period

  • Missing pattern classification: MCAR, MAR, or MNAR assessment

  • Risk level: LOW, MEDIUM, or HIGH selection risk

  • Recommendations: Suggested actions based on diagnostics

Interpreting risk levels:

  • LOW: Proceed with estimation; selection mechanism likely satisfied

  • MEDIUM: Consider using detrending; perform sensitivity analysis

  • HIGH: Results should be interpreted with caution; consider balanced subsample

Controlling Panel Balance

The balanced_panel parameter in lwdid() controls behavior when unbalanced panels are detected:

  • balanced_panel='warn' (default): Issue warning and continue estimation

  • balanced_panel='error': Raise exception; require balanced panel

  • balanced_panel='ignore': Silently proceed with unbalanced panel

# Strict balanced panel requirement
result = lwdid(
    data, y='outcome', ivar='unit', tvar='year',
    gvar='first_treat',
    balanced_panel='error'  # Raises exception if unbalanced
)

See Selection Diagnostics Module (selection_diagnostics) for detailed documentation of the diagnostic functions.

Pre-treatment Period Dynamics

Lee and Wooldridge (2025) develops a methodology for estimating treatment effects in pre-treatment periods to assess the validity of the parallel trends assumption. This section describes the theoretical foundation and implementation.

Motivation

The parallel trends assumption is fundamental to difference-in-differences identification but cannot be directly tested because it concerns counterfactual outcomes. However, examining pre-treatment dynamics provides indirect evidence:

  • Under parallel trends, the transformed outcome difference between treated and control groups should be zero in all pre-treatment periods

  • Systematic non-zero pre-treatment effects suggest potential violations

Rolling Transformation for Pre-treatment Periods

For pre-treatment periods, the transformation uses future pre-treatment periods rather than past periods. This ensures the transformation is well-defined for all pre-treatment periods.

Pre-treatment Demeaning

For cohort \(g\) and pre-treatment period \(t < g\), the demeaned outcome is:

\[\dot{Y}_{itg} = Y_{it} - \frac{1}{g-1-t} \sum_{q=t+1}^{g-1} Y_{iq}\]

where the sum is over future pre-treatment periods \(\{t+1, \ldots, g-1\}\).

Key properties:

  • Uses only pre-treatment data (periods before \(g\))

  • Rolling window looks forward, not backward

  • Window size decreases as \(t\) approaches \(g-1\)

Pre-treatment Detrending

For cohort \(g\) and pre-treatment period \(t < g\), fit an OLS regression:

\[Y_{iq} = A + B \cdot q + \varepsilon_{iq} \quad \text{for } q \in \{t+1, \ldots, g-1\}\]

Then compute the detrended outcome:

\[\ddot{Y}_{itg} = Y_{it} - (\hat{A} + \hat{B} \cdot t)\]

This requires at least 2 future pre-treatment periods for OLS estimation.

Anchor Point Convention

The period immediately before treatment (\(t = g-1\), event time \(e = -1\)) serves as the anchor point:

\[\dot{Y}_{i,g-1,g} = 0 \quad \text{(by construction)}\]

This occurs because the rolling window \(\{g, \ldots, g-1\}\) is empty, and by convention the transformation returns zero.

The anchor point provides:

  1. A reference point for interpreting other pre-treatment effects

  2. Normalization that facilitates comparison across cohorts

  3. Consistency with event study visualization conventions

Pre-treatment ATT Estimation

For each cohort \(g\) and pre-treatment period \(t < g\), the pre-treatment ATT is estimated by regressing the transformed outcome on treatment status:

\[\dot{Y}_{itg} = \alpha + \tau_{gt}^{pre} D_i + U_i\]

where \(D_i = 1\) for units in cohort \(g\) and \(D_i = 0\) for control units (never-treated or not-yet-treated at time \(t\)).

Under parallel trends, \(E[\tau_{gt}^{pre}] = 0\) for all \(t < g\).

Control Group for Pre-treatment Periods

For pre-treatment period \(t < g\), valid control units are:

  • Never-treated: Units with \(D_\infty = 1\)

  • Not-yet-treated at t: Units with first treatment after \(t\) (cohorts \(h > t\))

This differs from post-treatment control groups because units must be untreated at time \(t\), not just at time \(r \geq g\).

Comparison with Other DiD Methods

Two-Way Fixed Effects (TWFE)

The demean transformation is algebraically equivalent to TWFE with unit and time fixed effects. The Lee and Wooldridge method makes the cross-sectional structure explicit, enabling exact t-based inference in small samples under the CLM assumptions when homoskedastic OLS standard errors are used. In contrast, TWFE inference is usually based on large-sample approximations with cluster-robust standard errors.

Long Differencing Approaches

Long differencing approaches to staggered DiD use only the single period just prior to the intervention as a reference. Lee and Wooldridge (2025) extends the rolling transformation approach to staggered adoption settings and develops large-sample asymptotic inference using doubly robust estimators (IPWRA). The rolling approach uses all pre-treatment periods, achieving better efficiency while permitting application of various treatment effect estimators. Lee and Wooldridge (2026) focuses on small cross-sectional sample sizes in common timing settings, showing that exact t-based inference is available under classical linear model assumptions (normality and homoskedasticity).

When to Use This Method

Small-sample exact inference (common timing with vce=None):

  • Small numbers of treated and/or control units

  • Willingness to assume normality and homoskedasticity

  • Alternative: use randomization inference (ri=True)

Large-sample asymptotic inference (common timing or staggered):

  • Moderately large cross-sectional samples

  • Use robust standard errors (vce='robust' or vce='hc3')

  • For staggered designs, use gvar to specify first treatment period

Staggered adoption:

  • Units treated at different times

  • Supports cohort-time specific effect estimation

  • Multiple aggregation options (none, cohort, overall)

  • Flexible control group strategies (never-treated or not-yet-treated)

This implementation does not cover:

  • Treatment reversals or temporary treatments (violates persistence assumption)

  • Continuous treatment intensity (treatment must be binary)

Practical Considerations

Sample Size Requirements

Minimum (panel-level checks): lwdid requires at least \(N \geq 3\) units in the panel, with at least one treated unit (\(N_1 \geq 1\)) and at least one control unit (\(N_0 \geq 1\)).

In addition, the firstpost cross-sectional regression used for ATT estimation must contain at least three units in total and at least one treated and one control unit. If some units drop out of the panel before the post-treatment period, the effective cross-sectional sample entering the main regression can be smaller than the panel \(N\); in such cases lwdid raises an error rather than reporting an ATT based on an under-identified regression.

Practical considerations:

  • Exact t-based inference under CLM assumptions is technically available at this minimum, but in practice inference is more stable when the cross-sectional sample is meaningfully larger than \(N = 3\).

  • Cluster-robust standard errors rely on having a non-trivial number of clusters. A commonly used rule of thumb in applied work is to have around \(G \geq 10\) clusters for more reliable cluster-robust inference, although there is no universal threshold.

Time Index and Panel Structure

  • The time index used internally by lwdid must form a contiguous sequence without gaps (e.g., years 2000, 2001, 2002, …). If the original time variables have gaps, the function raises an error rather than silently interpolating.

  • The post-treatment indicator post must be a pure function of time (common timing): at any given time period, all units are either pre-treatment or post-treatment. This aligns with the common timing assumption described above.

Pre-Treatment Periods

Minimum requirements in this implementation:

  • demean: At least one pre-treatment period overall (\(T_0 \geq 1\)), and each unit must have at least one pre-treatment observation so that its pre-treatment mean can be computed.

  • detrend: At least two pre-treatment periods overall (\(T_0 \geq 2\)), and each unit must have at least two pre-treatment observations so that a unit-specific linear trend can be estimated.

  • demeanq and detrendq: The minimum number of pre-treatment observations depends on the seasonal period Q:

    • demeanq: \(n_{pre} \geq Q + 1\) per unit (e.g., 5 for quarterly, 13 for monthly, 53 for weekly)

    • detrendq: \(n_{pre} \geq Q + 2\) per unit (e.g., 6 for quarterly, 14 for monthly, 54 for weekly)

    Additionally, for each unit, every season that appears in the post-treatment period must also appear in the pre-treatment period; otherwise seasonal effects for that season cannot be removed.

Practical recommendations:

  • \(T_0 \geq 3\) for detrend/detrendq (more stable trend estimation)

  • More pre-treatment periods improve statistical power and facilitate visual assessment of parallel trends

Control Variables

Time-invariant controls (e.g., baseline characteristics) are permitted.

Effects:

  • Reduces residual variance, increasing power

  • Reduces degrees of freedom

  • Does not affect the transformation step

Including many controls relative to sample size can lead to overfitting and unstable estimates. In this implementation, time-invariant controls are included only when both groups are sufficiently large relative to the number of controls: controls enter the regression if \(N_1 > K+1\) and \(N_0 > K+1\), where \(K\) is the number of control variables. Otherwise, lwdid issues a warning and estimates the model without controls to avoid under-identified or extremely fragile regressions. When controls are included, observations with missing values in any included control variable are dropped from the main regression sample; lwdid reports how many observations were removed. If dropping those observations would cause either group to violate the \(N_1 > K+1\) or \(N_0 > K+1\) conditions, controls are omitted instead and the full sample is retained.

Computational Considerations

Speed: The transformation and cross-sectional regression steps are computationally efficient for typical panel datasets. Randomization inference requires \(R\) replications and is more computationally intensive (computation time scales linearly with \(R\)).

Memory: Memory requirements are modest for typical panel datasets. Randomization inference stores \(R\) test statistics for computing the empirical p-value.

Limitations

This implementation has the following limitations:

  1. Binary treatment: Treatment is either on or off (no continuous treatment intensity)

  2. Time-invariant controls: Controls must not vary over time

  3. Treatment persistence: Once treated, units must remain treated

  4. Seasonal methods data requirements: The demeanq and detrendq transformations require sufficient pre-treatment observations for seasonal parameter estimation. In staggered designs, early cohorts with limited pre-treatment data may be excluded if they lack the required observations

Staggered Adoption

When units are treated at different times, the staggered adoption framework applies. This is activated by specifying the gvar parameter (first treatment period for each unit) instead of post.

Identification

For treatment cohort \(g\) (units first treated in period \(g\)) and calendar time \(r \geq g\), the ATT is identified under:

  1. No anticipation: \(E[Y_t(g) - Y_t(\infty) | D_g = 1] = 0\) for \(t < g\)

  2. Conditional parallel trends: \(E[Y_t(\infty) - Y_1(\infty) | D, X] = E[Y_t(\infty) - Y_1(\infty) | X]\)

These assumptions ensure that never-treated and not-yet-treated units provide valid counterfactuals for estimating treatment effects.

Transformation

For each cohort \(g\) and period \(r \geq g\), the transformed outcome is:

Demean:

\[\dot{Y}_{irg} = Y_{ir} - \frac{1}{g-1} \sum_{s=1}^{g-1} Y_{is}\]

Detrend:

\[\ddot{Y}_{irg} = Y_{ir} - \hat{A}_i - \hat{B}_i r\]

where the trend coefficients are estimated from pre-treatment periods \(\{1, \ldots, g-1\}\).

Control Group Strategies

Lee and Wooldridge (2025) establishes that, under the conditional parallel trends assumption (CPTS), the cohort treatment indicators are unconfounded with respect to the transformed potential outcome. This implies:

  • For estimating \(\tau_{gr}\) (effect for cohort \(g\) at time \(r\)), cohorts \(h > r\) (not-yet-treated at time \(r\)) can serve as valid controls in addition to never-treated units

  • Under CPTS, the conditional expectation \(E[\dot{Y}_{rg}(\infty)|D_\infty=1, X] = E[\dot{Y}_{rg}(\infty)|D_h=1, X]\) is the same across all cohorts

  • By the no-anticipation assumption, \(E[\dot{Y}_{rg}(\infty)|D_h=1, X] = E[\dot{Y}_{rg}(h)|D_h=1, X]\) for \(h > r\), so not-yet-treated units can substitute for never-treated units

Strategies:

  • never_treated: Only units with \(D_\infty = 1\) (never treated during observation period)

  • not_yet_treated: Never treated plus units with first treatment after period \(r\) (i.e., cohorts \(h \in \{r+1, \ldots, T, \infty\}\))

The not-yet-treated strategy uses more control observations, potentially improving efficiency, while the never-treated strategy may be more robust to violations of no anticipation.

All Units Eventually Treated

Lee and Wooldridge (2025) addresses the case where no units remain untreated through period \(T\) (no never-treated group). In this setting:

  1. Treatment effects are defined relative to \(Y_t(T)\), the state of being treated only in the final period, rather than \(Y_t(\infty)\)

  2. Effects can only be estimated for cohorts \(g \in \{S, \ldots, T-1\}\); no effect can be estimated for the final cohort (\(g = T\)) because there are no control units

  3. By no anticipation, for \(r < T\): \(E[Y_r(g) - Y_r(T)|D_g = 1] = E[Y_r(g) - Y_r(\infty)|D_g = 1]\), so except for the final period the ATTs have the same interpretation as when a never-treated group exists

  4. When \(r = T\), the only available control is cohort \(g = T\) (units first treated in the final period)

The implementation handles this automatically: when aggregating to cohort or overall effects, only cohorts with valid control groups are included.

Estimation Methods

In staggered mode, multiple estimators are available:

  • ra (Regression Adjustment): OLS on transformed outcome

  • ipw (Inverse Probability Weighting): Propensity score weighting

  • ipwra (Doubly Robust): Combines regression and IPW

  • psm (Propensity Score Matching): Nearest neighbor matching

The doubly robust IPWRA estimator is consistent if either the outcome model or propensity score model is correctly specified, making it particularly attractive when functional form assumptions are uncertain.

Inference Distribution by Estimator

Different estimators use different reference distributions for inference:

Estimator

Distribution

Rationale

RA (OLS)

t-distribution

Exact inference under CLM assumptions; df = N - k

IPW

Normal

Asymptotic inference based on influence functions

IPWRA

Normal

Asymptotic inference based on influence functions

PSM

Normal

Asymptotic inference; SE based on matching-based variance estimator

The RA estimator uses the t-distribution because Lee and Wooldridge (2026) provides exact finite-sample inference under classical linear model assumptions. The IPW, IPWRA, and PSM estimators use the normal distribution because these estimators rely on asymptotic theory (influence functions or matching-based variance estimators) where t-distribution adjustments are not directly applicable. For large samples (N > 50), the practical difference between t and normal distributions is negligible.

Aggregation

Cohort-time specific effects \(\tau_{gr}\) can be aggregated:

  • none: Report \((g,r)\)-specific effects only

  • cohort: Average effects within each cohort:

    \[\tau_g = \frac{1}{T-g+1} \sum_{r=g}^{T} \tau_{gr}\]
  • overall: Cohort-share weighted average:

    \[\tau_\omega = \sum_g \omega_g \tau_g \quad \text{where } \omega_g = N_g/N_{treat}\]

Event Time Aggregation (WATT)

The weighted average treatment effect on the treated (WATT) by event time provides an aggregation of cohort-time specific effects \(\tau_{gr}\) across cohorts sharing the same relative time since treatment onset, as developed in Lee and Wooldridge (2025).

For event time \(e = r - g\) (relative time since treatment), the WATT is:

\[\text{WATT}(e) = \sum_{g \in G_e} w(g,e) \cdot \tau_{g, g+e}\]

where \(G_e\) is the set of cohorts with a valid ATT estimate at event time \(e\), and the weights are cohort-size proportions:

\[w(g,e) = \frac{N_g}{\sum_{g' \in G_e} N_{g'}}\]

Standard Error Computation

The standard error for the WATT at event time \(e\) is computed as:

\[SE(\text{WATT}(e)) = \sqrt{\sum_{g \in G_e} [w(g,e)]^2 \cdot [SE(\tau_{g,g+e})]^2}\]

This formula assumes independence across cohorts and uses the normalized weights.

Degrees of Freedom

For t-distribution inference on event-time aggregated effects, the degrees of freedom are chosen conservatively as the minimum across contributing cohorts:

\[df(e) = \min_{g \in G_e} df_g\]

This provides conservative coverage that accounts for finite-sample uncertainty.

Event Study Visualization

Event time aggregation is particularly useful for event study plots, which display treatment effects as a function of relative time. The pre-treatment effects (negative event times) serve as placebo tests for the parallel trends assumption, while post-treatment effects reveal the dynamic treatment response.

The anchor point at event time \(e = -1\) is set to zero by convention, serving as the reference baseline for interpreting other effects.

Robustness to Pre-treatment Period Selection

Lee and Wooldridge (2026) recommends studying the robustness of DiD estimates by varying the number of pre-treatment periods used in the transformation. This section describes the sensitivity analysis tools implemented in lwdid.

Motivation

The rolling transformation approach uses pre-treatment periods to estimate unit-specific means or trends. The choice of how many pre-treatment periods to include can affect the estimates:

  • Too few periods: May not adequately capture unit-specific patterns

  • Too many periods: May include periods where parallel trends do not hold

Robustness of the estimates can be studied by adjusting the number of pre-treatment time periods used in the transformation, analogous to the variation of pre-treatment windows in synthetic control methods.

Sensitivity Ratio

The sensitivity ratio measures how much ATT estimates vary across specifications with different numbers of pre-treatment periods:

\[\text{Sensitivity Ratio} = \frac{\max_k \hat{\tau}_k - \min_k \hat{\tau}_k}{|\hat{\tau}_{\text{baseline}}|}\]

where \(\hat{\tau}_k\) is the ATT estimate using \(k\) pre-treatment periods, and \(\hat{\tau}_{\text{baseline}}\) is the estimate using all available pre-treatment periods.

Interpretation:

Sensitivity Ratio

Robustness Level

Interpretation

< 10%

Highly Robust

Estimates stable across specifications

10-25%

Moderately Robust

Some sensitivity, generally acceptable

25-50%

Sensitive

Interpret with caution

≥ 50%

Highly Sensitive

Results depend heavily on specification

Using robustness_pre_periods()

The robustness_pre_periods() function implements this sensitivity analysis:

from lwdid import robustness_pre_periods

result = robustness_pre_periods(
    data, y='outcome', ivar='unit', tvar='year',
    gvar='first_treat', rolling='detrend',
    pre_period_range=(3, 10)
)

print(result.summary())
result.plot()

The function:

  1. Estimates ATT using different numbers of pre-treatment periods

  2. Computes the sensitivity ratio

  3. Assesses robustness level

  4. Generates recommendations

No-Anticipation Sensitivity

When policy is announced before implementation, units may adjust behavior in anticipation. The sensitivity_no_anticipation() function tests robustness by excluding periods immediately before treatment:

from lwdid import sensitivity_no_anticipation

result = sensitivity_no_anticipation(
    data, y='outcome', ivar='unit', tvar='year',
    gvar='first_treat', max_anticipation=3
)

if result.anticipation_detected:
    print(f"Consider excluding {result.recommended_exclusion} periods")

The function:

  1. Estimates ATT excluding 0, 1, 2, … periods before treatment

  2. Detects significant changes in estimates

  3. Recommends how many periods to exclude

Using exclude_pre_periods Parameter

Based on sensitivity analysis results, periods can be excluded in the main estimation using the exclude_pre_periods parameter:

from lwdid import lwdid

# Exclude 2 periods immediately before treatment
result = lwdid(
    data, y='outcome', d='d', ivar='unit', tvar='year', post='post',
    rolling='demean', exclude_pre_periods=2
)

This parameter:

  • Excludes the specified number of pre-treatment periods from transformation

  • Addresses potential anticipation effects

  • Implements the robustness check from Lee and Wooldridge (2026)

Comprehensive Sensitivity Analysis

The sensitivity_analysis() function provides a comprehensive assessment:

from lwdid import sensitivity_analysis

result = sensitivity_analysis(
    data, y='outcome', ivar='unit', tvar='year',
    gvar='first_treat',
    analyses=['pre_periods', 'anticipation']
)

print(result.summary())
result.plot_all()

This function combines multiple sensitivity analyses and provides an overall assessment of estimate robustness.

References

Lee, S. J., and Wooldridge, J. M. (2026). Simple Approaches to Inference with Difference-in-Differences Estimators with Small Cross-Sectional Sample Sizes. Available at SSRN 5325686.

Lee, S. J., and Wooldridge, J. M. (2025). A Simple Transformation Approach to Difference-in-Differences Estimation for Panel Data. Available at SSRN 4516518.

Further Reading