Estimation Module (estimation)
===============================

The estimation module implements the core regression and inference procedures
for the Lee and Wooldridge difference-in-differences methods.

.. automodule:: lwdid.estimation
   :members:
   :undoc-members:
   :show-inheritance:

Overview
--------

This module provides two main estimation functions:

1. ``estimate_att``: Estimates the average treatment effect on the treated (ATT)
2. ``estimate_period_effects``: Estimates period-specific treatment effects

Both functions run OLS regressions on transformed data and compute standard
errors using one of the variance estimators described below.

Estimation Functions
--------------------

estimate_att()
~~~~~~~~~~~~~~

**Purpose:** Estimate the overall average treatment effect on the treated (ATT)
from the cross-sectional representation of the Lee and Wooldridge estimator.

**Regression specification (conceptual):**

.. math::

   Y_i = \alpha + \tau D_i + Z_i'\beta + \varepsilon_i

where:

- :math:`Y_i`: Transformed outcome for unit :math:`i` (typically the
  post-treatment average of the residualized outcome constructed by the
  transformation module)
- :math:`D_i`: Treatment indicator (1 = treated, 0 = control)
- :math:`Z_i`: Optional time-invariant controls and their interactions,
  constructed following Lee and Wooldridge (2026)
- :math:`\varepsilon_i`: Regression error term

**Estimand:** :math:`\tau` is the ATT.

**Returns:**

- ATT estimate
- Standard error
- t-statistic
- p-value
- Confidence interval
- Degrees of freedom

estimate_period_effects()
~~~~~~~~~~~~~~~~~~~~~~~~~

**Purpose:** Estimate treatment effects separately for each post-treatment
period using cross-sectional regressions.

**Regression specification (for each post-treatment period t):**

.. math::

   Y_{it} = \alpha_t + \tau_t D_i + Z_i'\beta_t + \varepsilon_{it}

where:

- :math:`Y_{it}`: Transformed outcome for unit :math:`i` in period :math:`t`
- :math:`D_i`: Treatment indicator (1 = treated, 0 = control)
- :math:`Z_i`: Optional time-invariant controls (and their interactions)
  re-used from the main regression
- :math:`\tau_t`: Treatment effect in period :math:`t` (period-specific ATT)

**Returns:** DataFrame with period-specific estimates, standard errors,
t-statistics, p-values, and confidence intervals.

Variance Estimators
-------------------

The module supports multiple variance estimators, each suited to different
assumptions about the error structure.

OLS (Homoskedastic)
~~~~~~~~~~~~~~~~~~~

**Assumption:** Errors are homoskedastic and normally distributed.

**Formula:**

.. math::

   \text{Var}(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1}

where :math:`\hat{\sigma}^2 = RSS / (n - k)`.

**Degrees of freedom:**

- Non-clustered: :math:`df = n - k`

**When to use:** When homoskedasticity and normality are plausible and exact
t-based inference is desired.

HC0 (White's Original Estimator)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Assumption:** Errors may be heteroskedastic.

**Formula:**

.. math::

   \text{Var}(\hat{\beta}) = (X'X)^{-1}
   \left(\sum_i x_i x_i' \hat{\varepsilon}_i^2\right) (X'X)^{-1}

**Degrees of freedom:** Same as OLS.

**When to use:** Large samples with suspected heteroskedasticity. This is the
original heteroskedasticity-consistent estimator without finite-sample
corrections.

HC1 (Heteroskedasticity-Robust)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Assumption:** Errors may be heteroskedastic.

**Formula:**

.. math::

   \text{Var}(\hat{\beta}) = (X'X)^{-1}
   \left(\sum_i x_i x_i' \hat{\varepsilon}_i^2\right) (X'X)^{-1}
   \times \frac{n}{n-k}

**Degrees of freedom:** Same as OLS.

**When to use:** Medium to large samples with suspected heteroskedasticity.
HC1 applies a degrees-of-freedom correction to HC0.

HC2 (Leverage-Adjusted)
~~~~~~~~~~~~~~~~~~~~~~~~

**Assumption:** Errors may be heteroskedastic.

**Formula:**

.. math::

   \text{Var}(\hat{\beta}) = (X'X)^{-1}
   \left(\sum_i \frac{x_i x_i' \hat{\varepsilon}_i^2}{1-h_{ii}}\right)
   (X'X)^{-1}

where :math:`h_{ii}` is the i-th diagonal element of the hat matrix
:math:`H = X(X'X)^{-1}X'`.

**Degrees of freedom:** Same as OLS.

**When to use:** Small to moderate samples with suspected heteroskedasticity
and varying leverage across observations.

HC3 (Small-Sample Adjusted)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Assumption:** Errors may be heteroskedastic.

**Formula:**

.. math::

   \text{Var}(\hat{\beta}) = (X'X)^{-1}
   \left(\sum_i \frac{x_i x_i' \hat{\varepsilon}_i^2}{(1-h_{ii})^2}\right)
   (X'X)^{-1}

where :math:`h_{ii}` is the i-th diagonal element of the hat matrix.

**Degrees of freedom:** Same as OLS.

**When to use:** Small or moderate samples with suspected heteroskedasticity.
Simulation evidence suggests HC3 can perform reasonably well in some
small-sample designs, but results can still be sensitive when the number of
treated or control units is very small. See :doc:`../methodological_notes` for
further discussion.

HC4 (High-Leverage Adjusted)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Assumption:** Errors may be heteroskedastic.

**Formula:**

.. math::

   \text{Var}(\hat{\beta}) = (X'X)^{-1}
   \left(\sum_i \frac{x_i x_i' \hat{\varepsilon}_i^2}
   {(1-h_{ii})^{\delta_i}}\right) (X'X)^{-1}

where :math:`\delta_i = \min(4, n h_{ii}/\sum_j h_{jj})` is an adaptive
exponent based on leverage.

**Degrees of freedom:** Same as OLS.

**When to use:** When the data contain high-leverage observations. HC4 applies
an adaptive adjustment that grows with an observation's leverage.

Variance Estimator Selection Guide
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- **OLS**: Sample size: Any. Assumes homoskedasticity and normality. Use for
  exact t-inference under CLM assumptions.
- **HC0**: Sample size: Large (N > 100). Allows heteroskedasticity. Use for
  large samples only; understates SE in small samples.
- **HC1**: Sample size: Moderate-Large (N > 50). Allows heteroskedasticity.
  General robust SE; df-corrected HC0.
- **HC2**: Sample size: Small-Moderate (N = 20-100). Allows heteroskedasticity
  with varying leverage. Use when leverage varies across observations.
- **HC3**: Sample size: Small-Moderate (N = 10-50). Allows heteroskedasticity.
  Default for small samples with suspected heteroskedasticity.
- **HC4**: Sample size: Any. Allows heteroskedasticity with influential
  points. Use when high-leverage points are present in the data.
- **Cluster**: Sample size: G > 30 clusters. Allows within-cluster
  correlation. Use for clustered data (states, schools).

**Selection guidelines:**

1. **CLM assumptions plausible + exact inference needed**: Use OLS
   (``vce=None``)
2. **Heteroskedasticity suspected + small sample**: Use HC3 (``vce='hc3'``)
3. **Heteroskedasticity suspected + moderate/large sample**: Use HC1
   (``vce='hc1'``)
4. **High-leverage observations present**: Use HC4 (``vce='hc4'``)
5. **Clustered errors**: Use cluster-robust (``vce='cluster'``)

Cluster-Robust
~~~~~~~~~~~~~~

**Assumption:** Errors are correlated within clusters but independent across
clusters.

**Formula:**

.. math::

   \text{Var}(\hat{\beta}) = (X'X)^{-1}
   \left(\sum_g X_g' \hat{\varepsilon}_g \hat{\varepsilon}_g' X_g\right)
   (X'X)^{-1}

where :math:`g` indexes clusters.

**Degrees of freedom:** :math:`G - 1` (number of clusters minus 1).

**When to use:** Errors are clustered (e.g., students within schools).

Implementation Details
----------------------

Regression Procedure
~~~~~~~~~~~~~~~~~~~~

1. **Prepare design matrix:** For the main regression, construct a
   cross-sectional design matrix with an intercept, the treatment indicator,
   and (when applicable) time-invariant controls and their interactions with
   treatment as in Lee and Wooldridge (2026). The same control specification
   is reused for period-by-period regressions.
2. **Run OLS:** Compute :math:`\hat{\beta} = (X'X)^{-1} X'y`
3. **Compute residuals:** :math:`\hat{\varepsilon} = y - X\hat{\beta}`
4. **Compute variance:** Apply the selected variance estimator
5. **Inference:** Compute t-statistics and p-values using the t-distribution

Degrees of Freedom
~~~~~~~~~~~~~~~~~~

**Non-clustered:** :math:`df = n - k`

where:

- :math:`n`: Number of observations
- :math:`k`: Number of parameters (intercept + regressors)

**Clustered:** :math:`df = G - 1`

where :math:`G` is the number of clusters.

**Rationale:** With cluster-robust SEs, the effective sample size is the number
of clusters, not observations.

Confidence Intervals
~~~~~~~~~~~~~~~~~~~~

95% confidence intervals (:math:`\alpha = 0.05`) are computed as:

.. math::

   CI = \hat{\beta} \pm t_{\alpha/2, df} \times SE(\hat{\beta})

where :math:`t_{\alpha/2, df}` is the critical value from the t-distribution
with :math:`df` degrees of freedom.

Technical Notes
---------------

Numerical Stability
~~~~~~~~~~~~~~~~~~~

The module relies on the numerically stable OLS implementation in
``statsmodels``:

- OLS estimation and variance–covariance computation are delegated to
  ``statsmodels``
- Heteroskedasticity-robust variance estimators (HC0-HC4) are available,
  including variants with small-sample adjustments
- Singular or ill-conditioned designs raise errors or warnings rather than
  failing silently

Missing Data
~~~~~~~~~~~~

- Observations with missing values in required variables (outcome, treatment
  indicator, unit identifier, time variables, post indicator) are dropped
  during validation.
- For control variables, missing values are handled at the estimation stage:
  if dropping observations with missing controls still leaves enough treated
  and control units to satisfy the :math:`N_1 > K+1` and :math:`N_0 > K+1`
  conditions, those observations are removed and controls are included;
  otherwise, controls are omitted and the full regression sample is retained.
  In both cases, informative warnings are issued.
- The effective sample size (n) reported in the results corresponds to the
  cross-sectional regression sample used for ATT estimation (the
  ``firstpost`` cross-section).

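
The control-handling rule described above can be illustrated with a minimal
sketch. The helper ``resolve_controls`` and the column names are hypothetical
and are not part of the ``lwdid`` API. The sketch only mirrors the
:math:`N_1 > K+1` and :math:`N_0 > K+1` conditions as stated, and it assumes
that :math:`K` is the number of control variables.

.. code-block:: python

   import warnings

   import numpy as np
   import pandas as pd


   def resolve_controls(df, treat_col, controls):
       """Hypothetical helper (not the lwdid API): keep or omit controls.

       Assumes K is the number of control variables.
       """
       k = len(controls)
       complete = df.dropna(subset=controls)
       n1 = int((complete[treat_col] == 1).sum())  # treated units remaining
       n0 = int((complete[treat_col] == 0).sum())  # control units remaining
       if n1 > k + 1 and n0 > k + 1:
           n_dropped = len(df) - len(complete)
           if n_dropped:
               warnings.warn(f"Dropped {n_dropped} rows with missing controls.")
           return complete, list(controls)
       warnings.warn("Too few complete cases; controls omitted.")
       return df, []


   data = pd.DataFrame({
       "d": [1] * 5 + [0] * 5,
       "x1": [0.1, np.nan, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
       "x2": np.arange(10, dtype=float),
   })

   # Enough complete cases: 4 treated and 5 controls both exceed K + 1 = 3,
   # so the incomplete row is dropped and both controls are kept.
   sample, used = resolve_controls(data, "d", ["x1", "x2"])

   # With one treated unit fewer, only 3 complete treated rows remain, so the
   # controls are omitted and all 9 rows are retained.
   small = data.drop(index=4)
   sample_small, used_small = resolve_controls(small, "d", ["x1", "x2"])

Returning the sample together with the list of controls actually used keeps
the two outcomes of the rule explicit for the calling code.
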

Perfect Collinearity
~~~~~~~~~~~~~~~~~~~~~

Regressors must not be perfectly collinear for the OLS problem to be
identified. Common sources of exact collinearity include:

- Including both a variable and its exact linear transformation
- Including dummy variables for all categories (no omitted category)
- Including controls that are exact linear combinations of other regressors

Example Usage
-------------

These functions are used internally by :func:`lwdid.lwdid` after the
transformation step has constructed the transformed outcomes and main
regression sample. They are not part of the typical user-facing API.

For most applications, :func:`lwdid.lwdid` should be called directly, relying
on its high-level interface. Advanced users who need low-level access can
consult the docstrings and source code in :mod:`lwdid.estimation` to see the
exact function signatures and required inputs.

Large-Sample Inference
----------------------

For large cross-sectional samples, asymptotic inference using robust standard
errors is appropriate:

**HC0-HC4 Standard Errors**

- HC1 provides heteroskedasticity-consistent standard errors with a
  degrees-of-freedom correction (n/(n-k))
- HC3 adds leverage-based adjustments suitable for smaller samples

**Cluster-Robust Standard Errors**

When errors are correlated within clusters (e.g., states, regions),
cluster-robust standard errors account for this correlation. Inference relies
on asymptotic approximations that improve with the number of clusters G.

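
As a numerical cross-check of the HC0-HC4 formulas listed under Variance
Estimators, the sandwich form can be written directly in NumPy. This is a
sketch on simulated data, not the code path used by ``lwdid``: the helper
``hc_covariance`` and the data-generating process are assumptions made for
illustration. The ``cov_HC0`` to ``cov_HC3`` attributes of a fitted
``statsmodels`` OLS result serve as the reference for the first four variants.

.. code-block:: python

   import numpy as np
   import statsmodels.api as sm


   def hc_covariance(X, resid, kind="hc3"):
       """Hypothetical helper: sandwich covariance for HC0-HC4."""
       n, k = X.shape
       bread = np.linalg.inv(X.T @ X)
       # Leverages h_ii = x_i' (X'X)^{-1} x_i, the hat-matrix diagonal.
       h = np.einsum("ij,jk,ik->i", X, bread, X)
       e2 = resid ** 2
       if kind == "hc0":
           w = e2
       elif kind == "hc1":
           w = e2 * n / (n - k)
       elif kind == "hc2":
           w = e2 / (1.0 - h)
       elif kind == "hc3":
           w = e2 / (1.0 - h) ** 2
       elif kind == "hc4":
           delta = np.minimum(4.0, n * h / h.sum())
           w = e2 / (1.0 - h) ** delta
       else:
           raise ValueError(f"unknown kind: {kind}")
       meat = (X * w[:, None]).T @ X  # sum_i w_i x_i x_i'
       return bread @ meat @ bread


   # Simulated cross-section with a larger error variance for treated units.
   rng = np.random.default_rng(0)
   n = 40
   d = (np.arange(n) < 15).astype(float)
   z = rng.normal(size=n)
   X = np.column_stack([np.ones(n), d, z])
   y = 1.0 + 2.0 * d + 0.5 * z + rng.normal(size=n) * (1.0 + d)

   res = sm.OLS(y, X).fit()
   for kind in ("hc0", "hc1", "hc2", "hc3"):
       reference = getattr(res, "cov_" + kind.upper())
       assert np.allclose(hc_covariance(X, res.resid, kind), reference)

   # HC4 has no cov_HC4 counterpart on the fitted result, so it is only
   # computed here from the formula.
   se_hc4 = np.sqrt(np.diag(hc_covariance(X, res.resid, "hc4")))

Expressing every variant as a per-observation weight on the squared residual
makes the differences between the estimators visible in a single line each.
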

**Variance Estimator Recommendation**

- Small samples with CLM assumptions: Use ``vce=None`` for exact t-based
  inference
- Small to moderate samples with heteroskedasticity: Use ``vce='hc3'``
- Moderate to large samples: Use ``vce='hc1'`` or ``vce='robust'``
- Clustered data with many clusters: Use ``vce='cluster'``

Estimator Selection for Large Samples
-------------------------------------

Lee and Wooldridge (2025) provide theoretical and simulation evidence for
choosing among estimators when sample sizes are large enough for asymptotic
inference. Their key Monte Carlo results inform the following recommendations.

**Available Estimators**

- **RA (Regression Adjustment)**: Default estimator. Under correct
  specification of the outcome model, RA is both best linear unbiased (BLUE)
  and asymptotically efficient. Equivalent to POLS/ETWFE in the common timing
  case.

- **IPW (Inverse Probability Weighting)**: Consistent when the propensity
  score model is correctly specified. May be less efficient than RA under
  correct outcome model specification.

- **IPWRA (Doubly Robust)**: Combines regression adjustment with propensity
  score weighting. Consistent if either the outcome model or the propensity
  score model is correctly specified (double robustness property).

  **Doubly Robust ATT Estimator (Mathematical Formulation)**

  The influence function representation of the doubly robust ATT estimator is:

  .. math::

     \psi_i^{DR} = \frac{D_i}{\pi} \left[ Y_i - m_0(X_i) \right]
     - \frac{(1-D_i) p(X_i)}{\pi (1-p(X_i))} \left[ Y_i - m_0(X_i) \right]
     - \tau_{ATT}

  where :math:`\pi = P(D=1)` is the unconditional treatment probability,
  :math:`p(X_i) = P(D=1|X_i)` is the propensity score, and
  :math:`m_0(X_i) = E[Y|D=0, X_i]` is the conditional mean for controls.

  **Double robustness property**:

  - If :math:`m_0(X)` is correctly specified: ATT is identified via regression
    adjustment regardless of propensity score specification
  - If :math:`p(X)` is correctly specified: the IPW component corrects for
    outcome model misspecification
  - Both correct: the estimator achieves the semiparametric efficiency bound

- **PSM (Propensity Score Matching)**: Matches treated to control units based
  on estimated propensity scores. Generally less efficient than RA and IPWRA.

**Efficiency Comparison** (from Lee and Wooldridge 2025)

Under correct specification of all models:

1. RA/POLS has the smallest standard deviation and RMSE
2. IPWRA performs close to RA (within 3-5% higher SD in most scenarios)
3. PSM and long differencing methods have notably larger standard deviations
   (25-40% higher)

Under model misspecification:

1. When the outcome model is misspecified but the propensity score is correct:
   IPWRA has the smallest RMSE due to its double robustness
2. When both models are misspecified: IPWRA still tends to have smaller bias
   than RA while maintaining reasonable precision

**Estimator Selection Guidelines**

1. **Start with RA** when N is moderate and functional form assumptions are
   plausible. RA is efficient and computationally simple.

2. **Use IPWRA** as the primary estimator when:

   - Functional form assumptions are uncertain
   - Robustness to model misspecification is desired
   - Controls are available for both outcome and propensity score models

3. **Use IPW/PSM** when:

   - Propensity score weighting or matching is preferred for substantive
     reasons
   - Comparison with other treatment effects literature is desired

4. **Report multiple estimators** for robustness: If RA and IPWRA give similar
   results, this provides evidence that findings are not sensitive to
   functional form assumptions.

**Relationship to Long Differencing Approaches**

Long differencing approaches use only the period just prior to intervention,
discarding information from earlier pre-treatment periods.
Lee and Wooldridge (2025) show that this can result in substantial efficiency
loss. The rolling transformation approach uses all suitable pre-treatment
periods, achieving efficiency close to POLS while permitting application of
doubly robust estimators.

See Also
--------

- :func:`lwdid.lwdid` - Main function that calls these estimation functions
- :doc:`transformations` - Transformation functions applied before estimation
- :doc:`../methodological_notes` - Theoretical background on inference
- :doc:`../user_guide` - Comprehensive usage guide