Staggered DiD Module (staggered)
================================

The staggered module implements difference-in-differences estimation for settings
with staggered treatment adoption, based on Lee and Wooldridge (2025).

Overview
--------

In staggered settings, different units begin treatment at different times (cohorts).
This module provides:

- **Data transformations**: Cohort-specific demeaning and detrending
- **Control group selection**: Never-treated or not-yet-treated units
- **Effect estimation**: (g,r)-specific, cohort, and overall effects
- **Multiple estimators**: RA (regression adjustment), IPW, IPWRA, PSM
- **Randomization inference**: Bootstrap and permutation tests

Key Concepts
------------

Cohort (g)
    The period when a unit first receives treatment. Units that are never treated
    can be encoded as gvar=0, gvar=NaN, or gvar=inf (all internally mapped to
    :math:`\infty`).

(g, r) Effect
    The treatment effect :math:`\tau_{gr}` for cohort :math:`g` at calendar time
    :math:`r`, where :math:`r \geq g`. This is the ATT for units first treated
    in period :math:`g`, evaluated at time :math:`r`.

Control Group Strategies
    Lee and Wooldridge (2025) establishes that under no anticipation and
    conditional parallel trends, both never-treated and not-yet-treated units
    provide valid counterfactuals:

    - ``never_treated``: Only units with :math:`D_\infty = 1` (never treated during
      observation).
      Required when using ``aggregate='cohort'`` or ``aggregate='overall'``.
    - ``not_yet_treated``: Never-treated plus cohorts h > r (units first treated
      after period r). Uses more control observations, potentially improving
      efficiency.
    - ``all_others``: All units not in the treated cohort, including units that
      were already treated in earlier periods. This option is primarily intended
      for replication and diagnostics; it may introduce forbidden comparisons
      under the no-anticipation assumption.

    The theoretical justification shows that the cohort assignments are
    unconfounded with respect to the transformed potential outcome conditional
    on covariates.

Aggregation Levels
    - ``none``: Returns all :math:`(g, r)`-specific effects
    - ``cohort``: Averages effects within each cohort:
      :math:`\tau_g = \frac{1}{T-g+1} \sum_{r=g}^{T} \tau_{gr}`
    - ``overall``: Cohort-share weighted average:
      :math:`\tau_\omega = \sum_g \omega_g \tau_g` where
      :math:`\omega_g = N_g/N_{treat}`

All Units Eventually Treated
    When no units remain untreated through period :math:`T` (no never-treated
    group), treatment effects are defined relative to :math:`Y_t(T)` instead of
    :math:`Y_t(\infty)`. Effects can be estimated for cohorts
    :math:`g \in \{S, \ldots, T-1\}`; the final cohort (:math:`g = T`) serves as
    the control for all earlier cohorts in period :math:`T`. See
    :doc:`../methodological_notes` for theoretical details.

Transformations
---------------

.. autofunction:: lwdid.staggered.transform_staggered_demean

.. autofunction:: lwdid.staggered.transform_staggered_detrend

.. autofunction:: lwdid.staggered.transform_staggered_demeanq

.. autofunction:: lwdid.staggered.transform_staggered_detrendq

.. note::

   All four transformation methods (``demean``, ``detrend``, ``demeanq``,
   ``detrendq``) are available through the main ``lwdid()`` function in
   staggered mode. The seasonal transformations (``demeanq``, ``detrendq``)
   require the ``season_var`` and ``Q`` parameters.

.. autofunction:: lwdid.staggered.get_cohorts

.. autofunction:: lwdid.staggered.get_valid_periods_for_cohort

Control Groups
--------------

.. autoclass:: lwdid.staggered.ControlGroupStrategy
   :members:
   :undoc-members:
   :no-index:

.. autofunction:: lwdid.staggered.get_valid_control_units

.. autofunction:: lwdid.staggered.get_all_control_masks

.. autofunction:: lwdid.staggered.get_all_control_masks_pre

.. autofunction:: lwdid.staggered.validate_control_group

.. autofunction:: lwdid.staggered.identify_never_treated_units

.. autofunction:: lwdid.staggered.has_never_treated_units

.. autofunction:: lwdid.staggered.count_control_units_by_strategy

Estimation
----------

.. autoclass:: lwdid.staggered.CohortTimeEffect
   :members:
   :no-index:

.. autofunction:: lwdid.staggered.estimate_cohort_time_effects

.. autofunction:: lwdid.staggered.run_ols_regression

.. autofunction:: lwdid.staggered.results_to_dataframe

Aggregation
-----------

.. autoclass:: lwdid.staggered.CohortEffect
   :members:
   :no-index:

.. autoclass:: lwdid.staggered.OverallEffect
   :members:
   :no-index:

.. autoclass:: lwdid.staggered.EventTimeEffect
   :members:
   :no-index:

.. autofunction:: lwdid.staggered.aggregate_to_cohort

.. autofunction:: lwdid.staggered.aggregate_to_overall

.. autofunction:: lwdid.staggered.aggregate_to_event_time

.. autofunction:: lwdid.staggered.construct_aggregated_outcome

.. autofunction:: lwdid.staggered.cohort_effects_to_dataframe

IPW Estimator
-------------

Inverse probability weighting estimates treatment effects by weighting observations
based on their propensity scores.

.. autoclass:: lwdid.staggered.IPWResult
   :members:
   :no-index:

.. autofunction:: lwdid.staggered.estimate_ipw

IPWRA Estimator
---------------

The doubly robust IPWRA estimator combines regression adjustment and inverse
probability weighting, providing consistent estimates when either the outcome
model or the propensity score model is correctly specified.

.. autoclass:: lwdid.staggered.IPWRAResult
   :members:
   :no-index:

.. autofunction:: lwdid.staggered.estimate_ipwra

.. autofunction:: lwdid.staggered.estimate_propensity_score

.. autofunction:: lwdid.staggered.estimate_outcome_model

PSM Estimator
-------------

.. autoclass:: lwdid.staggered.PSMResult
   :members:
   :no-index:

.. autofunction:: lwdid.staggered.estimate_psm

Inference Distribution by Estimator
-----------------------------------

The reference distribution used for constructing confidence intervals and
computing p-values varies by estimator. The following table summarizes the
inference approach for each estimator in the staggered module:

**Summary Table:**

- **RA (Regression Adjustment)**: t-distribution with df = N_treated + N_control - k
- **IPW (Inverse Probability Weighting)**: Normal distribution (asymptotic inference)
- **IPWRA (Doubly Robust)**: Normal distribution (asymptotic inference)
- **PSM (Propensity Score Matching)**: Normal distribution (asymptotic inference)

**Detailed Explanation:**

RA (Regression Adjustment)
~~~~~~~~~~~~~~~~~~~~~~~~~~

The regression adjustment estimator uses the t-distribution for inference,
following Lee and Wooldridge (2026). Under classical linear model
assumptions (normality and homoskedasticity), this provides exact finite-sample
inference. With heteroskedasticity-robust standard errors (HC0-HC4), the
t-distribution provides a conservative approximation that improves upon normal
approximations in small samples.

IPW, IPWRA, and PSM
~~~~~~~~~~~~~~~~~~~

The IPW, IPWRA, and PSM estimators use the normal distribution for asymptotic
inference because:

1. These estimators rely on influence function-based variance estimation
2. Asymptotic theory justifies normal approximations for these methods
3. For large samples, the normal distribution provides valid inference

For small samples where exact inference is desired, consider using RA with
``vce=None`` instead of IPW/IPWRA/PSM.

**Practical Recommendations:**

1. **Small samples** (N < 50): Use RA with ``vce=None`` for exact t-based
   inference, or use randomization inference (``ri=True``) for assumption-free
   testing.

2. **Moderate samples** (50 ≤ N < 200): Use RA or IPWRA with HC3 standard errors
   (``vce='hc3'``).

3. **Large samples** (N ≥ 200): All estimators with asymptotic inference are
   appropriate; IPWRA is recommended when functional form assumptions are
   uncertain due to its double robustness property.

See :doc:`../methodological_notes` for detailed theoretical foundations.

Randomization Inference
-----------------------

.. autoclass:: lwdid.staggered.StaggeredRIResult
   :members:
   :no-index:

.. autofunction:: lwdid.staggered.randomization_inference_staggered

.. autofunction:: lwdid.staggered.ri_overall_effect

.. autofunction:: lwdid.staggered.ri_cohort_effect

Pre-treatment Dynamics
----------------------

The pre-treatment dynamics module implements estimation and testing for
pre-treatment periods, following Lee and Wooldridge (2025).

Pre-treatment Transformations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: lwdid.staggered.transform_staggered_demean_pre

.. autofunction:: lwdid.staggered.transform_staggered_detrend_pre

.. autofunction:: lwdid.staggered.get_pre_treatment_periods_for_cohort

Pre-treatment Estimation
~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: lwdid.staggered.PreTreatmentEffect
   :members:
   :no-index:

.. autofunction:: lwdid.staggered.estimate_pre_treatment_effects

.. autofunction:: lwdid.staggered.pre_treatment_effects_to_dataframe

Parallel Trends Testing
~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: lwdid.staggered.ParallelTrendsTestResult
   :members:
   :no-index:

.. autofunction:: lwdid.staggered.run_parallel_trends_test

.. autofunction:: lwdid.staggered.summarize_parallel_trends_test

Examples
--------

Basic Staggered Estimation
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   import pandas as pd
   from lwdid import lwdid
   
   # Load Castle Law data
   data = pd.read_csv('castle.csv')
   
   # Create gvar from effyear (NaN = never treated -> 0)
   data['gvar'] = data['effyear'].fillna(0).astype(int)
   
   # Run staggered DiD
   results = lwdid(
       data=data,
       y='lhomicide',        # Log homicide rate
       ivar='sid',           # State ID (integer)
       tvar='year',          # Year
       gvar='gvar',          # First treatment year
       rolling='demean',     # Demeaning transformation
       control_group='never_treated',  # Use only never-treated as controls
       aggregate='overall',  # Get overall weighted effect
       vce='hc3'            # HC3 standard errors
   )
   
   # View results
   print(results.summary())
   print(f"Overall ATT: {results.att_overall:.4f}")
   print(f"SE: {results.se_overall:.4f}")

Cohort-Specific Effects
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Get cohort-specific effects
   results = lwdid(
       data=data,
       y='lhomicide',
       ivar='sid',
       tvar='year',
       gvar='gvar',
       rolling='demean',
       aggregate='cohort',   # Aggregate within cohorts
   )
   
   # Access cohort effects
   print(results.att_by_cohort)
   # Columns: cohort, att, se, ci_lower, ci_upper, t_stat, pvalue, n_units, n_periods

All (g, r) Effects
~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Get all cohort-time specific effects
   results = lwdid(
       data=data,
       y='lhomicide',
       ivar='sid',
       tvar='year',
       gvar='gvar',
       rolling='demean',
       aggregate='none',     # No aggregation
   )
   
   # Access all (g,r) effects
   print(results.att_by_cohort_time)
   # Columns: cohort, period, event_time, att, se, ci_lower, ci_upper, t_stat, pvalue, n_treated, n_control

Event Study Plot
~~~~~~~~~~~~~~~~

.. code-block:: python

   # Generate event study plot
   results = lwdid(
       data=data,
       y='lhomicide',
       ivar='sid',
       tvar='year',
       gvar='gvar',
       aggregate='none',
   )
   
   # Plot event study
   fig = results.plot_event_study(
       title='Castle Doctrine Effect',
       ylabel='Effect on Log Homicide Rate'
   )
   fig.savefig('event_study.png', dpi=300)

Event Time Aggregation (WATT)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from lwdid.staggered import aggregate_to_event_time
   
   # Get all cohort-time specific effects
   results = lwdid(
       data=data,
       y='lhomicide',
       ivar='sid',
       tvar='year',
       gvar='gvar',
       aggregate='none',
   )
   
   # Aggregate to event time using WATT (Weighted ATT)
   # WATT(r) = Σ w(g,r) × ATT(g, g+r) where w(g,r) = N_g / Σ N_g'
   watt_effects = aggregate_to_event_time(
       cohort_time_effects=results.att_by_cohort_time,
       cohort_sizes=results.cohort_sizes,
       alpha=0.05,
       df_strategy='conservative',  # Use min(df) across cohorts
   )
   
   # Access event-time aggregated effects
   for e in watt_effects:
       print(f"Event time {e.event_time}: WATT={e.att:.4f}, SE={e.se:.4f}, "
             f"CI=[{e.ci_lower:.4f}, {e.ci_upper:.4f}], p={e.pvalue:.4f}")
   
   # Or use plot_event_study with weighted aggregation
   fig, ax = results.plot_event_study(
       aggregation='weighted',  # Use WATT aggregation
       title='Event Study with WATT Aggregation',
   )

Pre-treatment Dynamics and Parallel Trends Testing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Estimate with pre-treatment dynamics for parallel trends assessment
   results = lwdid(
       data=data,
       y='lhomicide',
       ivar='sid',
       tvar='year',
       gvar='gvar',
       rolling='demean',
       aggregate='cohort',
       include_pretreatment=True,    # Enable pre-treatment estimation
       pretreatment_test=True,       # Run parallel trends test
       pretreatment_alpha=0.05,      # Significance level
   )
   
   # View summary with pre-treatment results
   print(results.summary())
   
   # Access pre-treatment ATT estimates
   print(results.att_pre_treatment)
   # Columns: cohort, period, event_time, att, se, ci_lower, ci_upper,
   #          t_stat, pvalue, n_treated, n_control, is_anchor, rolling_window_size
   
   # Access parallel trends test results
   pt = results.parallel_trends_test
   print(f"Joint F-stat: {pt.joint_f_stat:.4f}")
   print(f"P-value: {pt.joint_pvalue:.4f}")
   print(f"Reject H0: {pt.reject_null}")
   
   # Plot event study with pre-treatment effects
   fig, ax = results.plot_event_study(
       include_pre_treatment=True,
       title='Event Study with Pre-treatment Effects',
       pre_treatment_color='gray',
       post_treatment_color='blue',
   )
   fig.savefig('event_study_pretreatment.png', dpi=300)

Low-Level API Usage
~~~~~~~~~~~~~~~~~~~

For more control, the staggered module can be used directly:

.. code-block:: python

   from lwdid.staggered import (
       transform_staggered_demean,
       estimate_cohort_time_effects,
       aggregate_to_overall,
       ControlGroupStrategy
   )
   
   # Step 1: Transform data
   data_transformed = transform_staggered_demean(
       data=data,
       y='lhomicide',
       ivar='sid',
       tvar='year',
       gvar='gvar'
   )
   
   # Step 2: Estimate (g,r) effects
   effects = estimate_cohort_time_effects(
       data_transformed=data_transformed,
       gvar='gvar',
       ivar='sid',
       tvar='year',
       control_strategy='never_treated',
       vce='hc3'
   )
   
   # Step 3: Aggregate to overall effect
   overall = aggregate_to_overall(
       data_transformed=data_transformed,
       gvar='gvar',
       ivar='sid',
       tvar='year',
       cohorts=[2005, 2006, 2007, 2008, 2009],
       T_max=2010,
       vce='hc3'
   )
   
   print(f"Overall ATT: {overall.att:.4f} (SE: {overall.se:.4f})")

See Also
--------

- :func:`lwdid.lwdid` - Main estimation function with staggered support
- :doc:`../user_guide` - Comprehensive usage guide
- :doc:`../methodological_notes` - Theoretical foundations
- :doc:`../examples/index` - Examples including Castle Law analysis