Core Module (core)
==================

Main entry point for difference-in-differences estimation with panel data.

This module provides the main user-facing function ``lwdid()`` for estimating
average treatment effects on the treated (ATT) using the Lee and Wooldridge
transformation-based approach. The function supports both common timing designs
(where all treated units receive treatment simultaneously) and staggered adoption
designs (where units are treated at different times).

The method transforms panel data via unit-specific time-series operations
(demeaning or detrending), then applies cross-sectional regression to the
transformed outcomes. Under classical linear model assumptions, exact t-based
inference is available for small samples. For large samples, asymptotic inference
with robust standard errors or doubly robust estimators (IPW, IPWRA, PSM) is
supported.

Main Function
-------------

.. autofunction:: lwdid.lwdid

Examples
--------

Basic Usage
~~~~~~~~~~~

Simplest DiD estimation with default settings:

.. code-block:: python

   from lwdid import lwdid
   import pandas as pd

   # Load data
   data = pd.read_csv('smoking.csv')

   # Run estimation
   results = lwdid(
       data,
       y='lcigsale',      # Outcome variable
       d='d',             # Treatment indicator
       ivar='state',      # Unit ID
       tvar='year',       # Time variable
       post='post',       # Post-treatment indicator
       rolling='demean'   # Transformation method
   )

   # View results
   print(results.summary())
   print(f"ATT: {results.att:.4f}")
   print(f"SE: {results.se_att:.4f}")
   print(f"p-value: {results.pvalue:.4f}")

With Robust Standard Errors
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Using HC3 heteroskedasticity-robust standard errors:

.. code-block:: python

   results = lwdid(
       data, 'lcigsale', 'd', 'state', 'year', 'post', 'detrend',
       vce='hc3'  # HC3 robust standard errors
   )

With Randomization Inference
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Adding randomization inference for non-parametric testing:

.. code-block:: python

   results = lwdid(
       data, 'lcigsale', 'd', 'state', 'year', 'post', 'demean',
       ri=True,                  # Enable randomization inference
       rireps=1000,              # Number of permutations
       ri_method='permutation',  # Use permutation (recommended)
       seed=42                   # For reproducibility
   )

   print(f"t-based p-value: {results.pvalue:.4f}")
   print(f"RI p-value: {results.ri_pvalue:.4f}")

With Control Variables
~~~~~~~~~~~~~~~~~~~~~~

Including time-invariant control variables:

.. code-block:: python

   # Prepare time-invariant controls
   # Use pre-treatment mean for time-varying variables
   data_prep = data.copy()
   for var in ['retprice', 'beer']:
       pre_mean = data[data['post']==0].groupby('state')[var].mean()
       data_prep[f'{var}_pre'] = data_prep['state'].map(pre_mean)

   results = lwdid(
       data_prep, 'lcigsale', 'd', 'state', 'year', 'post', 'detrend',
       controls=['retprice_pre', 'beer_pre'],  # Time-invariant controls
       vce='hc3'
   )

.. warning::

   Control variables must be time-invariant (constant within each unit).
   Time-varying variables must first be aggregated to create unit-level
   constants. Common approaches:

   - Pre-treatment mean: ``data[data['post']==0].groupby('unit')[var].mean()``
   - First period value: ``data.groupby('unit')[var].first()``
   - Overall mean: ``data.groupby('unit')[var].mean()``

Cluster-Robust Standard Errors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When errors are correlated within clusters:

.. code-block:: python

   results = lwdid(
       data, 'outcome', 'd', 'unit', 'year', 'post', 'demean',
       vce='cluster',
       cluster_var='state'  # Cluster by state
   )

Quarterly Data
~~~~~~~~~~~~~~

Handling quarterly data with seasonal patterns:

.. code-block:: python

   # Data has columns: unit, year, quarter, outcome, d, post
   results = lwdid(
       data_q,
       y='sales',
       d='d',
       ivar='store',
       tvar=['year', 'quarter'],  # Composite time variable
       post='post',
       rolling='detrendq'         # Quarterly detrending with seasonality
   )

Complete Example with All Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   results = lwdid(
       data,
       y='outcome',
       d='d',
       ivar='unit',
       tvar='year',
       post='post_treatment',
       rolling='detrend',
       controls=['baseline_x1', 'baseline_x2'],
       vce='hc3',
       ri=True,
       rireps=2000,
       ri_method='permutation',
       seed=12345
   )

   # Access results
   print(f"ATT: {results.att:.3f} ({results.ci_lower:.3f}, {results.ci_upper:.3f})")
   print(f"t-stat: {results.t_stat:.3f}, p-value: {results.pvalue:.4f}")
   print(f"RI p-value: {results.ri_pvalue:.4f}")
   print(f"N = {results.nobs}, df = {results.df_inference}")

   # Export results
   results.to_excel('results.xlsx')
   results.plot()

Doubly Robust Estimation (Large-Sample Common Timing)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For large cross-sectional samples in common timing designs, multiple estimators
are available beyond regression adjustment (RA). Lee and Wooldridge (2025) show
that the rolling transformation approach enables application of doubly robust
estimators.

.. code-block:: python

   # IPWRA (doubly robust) estimator
   results = lwdid(
       data, 'outcome', 'd', 'unit', 'year', 'post', 'demean',
       estimator='ipwra',       # Doubly robust estimator
       controls=['x1', 'x2'],   # Controls for outcome and propensity score
       vce='hc3'
   )

   # IPW (inverse probability weighting) estimator
   results_ipw = lwdid(
       data, 'outcome', 'd', 'unit', 'year', 'post', 'demean',
       estimator='ipw',
       controls=['x1', 'x2'],
       vce='hc3'
   )

   # PSM (propensity score matching) estimator
   results_psm = lwdid(
       data, 'outcome', 'd', 'unit', 'year', 'post', 'demean',
       estimator='psm',
       controls=['x1', 'x2'],
       n_neighbors=3,  # 3 nearest neighbors
       caliper=0.1     # Caliper in PS standard deviations
   )

.. note::

   **Estimator Selection Guidelines:**

   - **RA** (default): Efficient under correct outcome model specification.
   - **IPWRA**: Doubly robust; consistent if either outcome or propensity score
     model is correctly specified. Recommended when functional form is uncertain.
   - **IPW/PSM**: Use when propensity score methods are preferred for
     substantive reasons.

   For detailed guidance on estimator selection, see :doc:`estimation` and
   :doc:`../user_guide`.

Staggered DiD
~~~~~~~~~~~~~

For staggered treatment adoption (different units treated at different times):

.. code-block:: python

   import pandas as pd
   from lwdid import lwdid

   # Castle Law example - states adopted Castle Doctrine at different years
   data = pd.read_csv('castle.csv')
   data['gvar'] = data['effyear'].fillna(0).astype(int)

   # Overall effect across all cohorts
   results = lwdid(
       data=data,
       y='lhomicide',                  # Log homicide rate
       ivar='sid',                     # State ID (must be integer)
       tvar='year',                    # Year
       gvar='gvar',                    # First treatment year (0 = never treated)
       rolling='demean',               # Transformation method
       control_group='never_treated',
       aggregate='overall',            # Get weighted average effect
       vce='hc3'
   )

   print(f"Overall ATT: {results.att_overall:.4f}")
   print(f"95% CI: [{results.ci_overall_lower:.4f}, {results.ci_overall_upper:.4f}]")

.. code-block:: python

   # Cohort-specific effects
   results = lwdid(
       data=data,
       y='lhomicide',
       ivar='sid',
       tvar='year',
       gvar='gvar',
       aggregate='cohort',  # Average within each cohort
   )

   print(results.att_by_cohort)  # DataFrame with cohort-level estimates

.. code-block:: python

   # All (cohort, time) specific effects for event study
   results = lwdid(
       data=data,
       y='lhomicide',
       ivar='sid',
       tvar='year',
       gvar='gvar',
       aggregate='none',  # No aggregation
   )

   # Plot event study
   results.plot_event_study(title='Castle Doctrine Effect')

.. note::

   When using ``aggregate='cohort'`` or ``aggregate='overall'``, the control
   group is automatically switched to ``never_treated`` if ``not_yet_treated``
   was specified. This is required by the theoretical framework (see Lee and
   Wooldridge, 2025).

Pre-treatment Dynamics and Parallel Trends Testing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For parallel trends assessment in staggered designs:

.. code-block:: python

   results = lwdid(
       data=data,
       y='lhomicide',
       ivar='sid',
       tvar='year',
       gvar='gvar',
       rolling='demean',
       aggregate='cohort',
       include_pretreatment=True,  # Compute pre-treatment ATT
       pretreatment_test=True,     # Run parallel trends test
   )

   # Access parallel trends test results
   pt = results.parallel_trends_test
   print(f"Joint F-stat: {pt.joint_f_stat:.4f}, p-value: {pt.joint_pvalue:.4f}")

   # Plot with pre-treatment effects
   results.plot_event_study(include_pre_treatment=True)

Handling Unbalanced Panels
~~~~~~~~~~~~~~~~~~~~~~~~~~

For panels with missing observations:

.. code-block:: python

   from lwdid import lwdid, diagnose_selection_mechanism

   # Run selection diagnostics first
   diagnostics = diagnose_selection_mechanism(
       data=data,
       ivar='unit',
       tvar='year',
       gvar='gvar'
   )
   print(f"Selection risk: {diagnostics.risk_level}")

   # Estimation with unbalanced panel handling
   results = lwdid(
       data=data,
       y='outcome',
       ivar='unit',
       tvar='year',
       gvar='gvar',
       rolling='detrend',     # Detrending is more robust to selection
       balanced_panel='warn'  # Issue warning with diagnostics (default)
   )

Excluding Pre-treatment Periods for Anticipation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When the no-anticipation assumption may be violated:

.. code-block:: python

   # Exclude 2 periods before treatment to guard against anticipation effects
   results = lwdid(
       data=data,
       y='outcome',
       d='treated',
       ivar='unit',
       tvar='year',
       post='post',
       rolling='demean',
       exclude_pre_periods=2  # Exclude t-1 and t-2 from transformation
   )

See Also
--------

- :class:`lwdid.LWDIDResults` - Results object returned by lwdid()
- :mod:`lwdid.staggered` - Low-level staggered estimation functions
- :doc:`../user_guide` - Comprehensive usage guide
- :doc:`../quickstart` - Quick start tutorial
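
Transformation Sketch
---------------------

The overview at the top of this page describes a two-step procedure:
unit-specific demeaning followed by a cross-sectional regression. The following
is a minimal sketch of that idea for a common timing design, written with plain
``pandas`` and ``numpy`` rather than ``lwdid`` itself. The synthetic panel, the
column names, and the built-in effect of 2.0 are illustrative assumptions; the
code is not the package's implementation and omits inference, controls, and
detrending.

.. code-block:: python

   import numpy as np
   import pandas as pd

   # Synthetic balanced panel (illustrative assumption): 4 units, 6 years.
   # Units 1 and 2 are treated from 2003 onward with a true effect of 2.0.
   rows = []
   for unit in [1, 2, 3, 4]:
       treated = int(unit <= 2)
       for year in range(2000, 2006):
           post = int(year >= 2003)
           # Unit effect + common time trend + treatment effect
           y = 10.0 + unit + 0.5 * (year - 2000) + 2.0 * treated * post
           rows.append({"unit": unit, "year": year, "d": treated,
                        "post": post, "y": y})
   panel = pd.DataFrame(rows)

   # Step 1: unit-specific demeaning. For each unit, subtract its
   # pre-treatment mean from its post-treatment average outcome.
   pre_mean = panel[panel["post"] == 0].groupby("unit")["y"].mean()
   post_mean = panel[panel["post"] == 1].groupby("unit")["y"].mean()
   y_transformed = post_mean - pre_mean

   # Step 2: cross-sectional OLS of the transformed outcome on an
   # intercept and the treatment indicator (one row per unit).
   d = panel.groupby("unit")["d"].first()
   X = np.column_stack([np.ones(len(d)), d.to_numpy()])
   beta, *_ = np.linalg.lstsq(X, y_transformed.loc[d.index].to_numpy(), rcond=None)

   att_hat = beta[1]  # Coefficient on d is the ATT estimate
   print(f"ATT estimate: {att_hat:.4f}")

Demeaning removes each unit's fixed effect in step 1, and the intercept in
step 2 absorbs the common time trend, so the coefficient on ``d`` should match
the effect built into the synthetic data up to floating-point error.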