Transformations Module

Unit-specific panel data transformations for difference-in-differences.

This module implements unit-specific outcome transformations that remove pre-treatment heterogeneity from panel data. Transformation parameters are estimated using only pre-treatment observations, then applied out-of-sample to all periods including post-treatment.

The transformations convert panel difference-in-differences estimation into cross-sectional treatment effects problems. Under no anticipation and parallel trends assumptions, standard treatment effect estimators (regression adjustment, inverse probability weighting, doubly robust, matching) can be applied to the transformed outcomes.

Notes

The transformations eliminate unit-specific level differences or trends that may be correlated with treatment assignment. By removing these pre-treatment patterns, the parallel trends assumption becomes an assumption about the transformed outcomes rather than the original levels.

Time centering is applied in the detrending methods to improve the numerical stability of OLS estimation. Centering reduces the condition number of the design matrix without changing the residuals: subtracting a constant from the time index only reparameterizes the intercept, so the fitted values are unchanged.
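The effect of centering can be checked numerically. The following is a minimal sketch with made-up data (the year range, coefficients, and the helper name fit_residuals are assumptions for illustration, not part of lwdid); it fits the same linear trend with and without centering and compares condition numbers and residuals.

```python
import numpy as np

# Hypothetical pre-treatment time index with a large offset (calendar years).
t_pre = np.arange(2001, 2011, dtype=float)
rng = np.random.default_rng(0)
y_pre = 3.0 + 0.5 * t_pre + rng.normal(scale=0.1, size=t_pre.size)

def fit_residuals(t, y):
    """OLS of y on [1, t]; return residuals and the design-matrix condition number."""
    X = np.column_stack([np.ones_like(t), t])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef, np.linalg.cond(X)

# Same regression, once on raw years and once on years centered at their mean.
resid_raw, cond_raw = fit_residuals(t_pre, y_pre)
resid_ctr, cond_ctr = fit_residuals(t_pre - t_pre.mean(), y_pre)
```

The centered design has a far smaller condition number, while the two residual vectors agree up to floating-point error.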

Available Transformations

demean

Removes unit-specific pre-treatment mean:

\[\dot{Y}_{it} = Y_{it} - \bar{Y}_{i,pre}\]

where \(\bar{Y}_{i,pre} = T_0^{-1} \sum_{s<g} Y_{is}\), \(g\) is the first post-treatment period, and \(T_0\) is the number of pre-treatment periods.

Requirements: At least 1 pre-treatment period per unit.

Use case: Standard parallel trends assumption where treatment and control groups have parallel outcome paths in levels.
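As an illustration of the formula, here is a minimal pandas sketch on a toy panel. The column names and data are hypothetical, and this is not the library's implementation; it only shows the pre-treatment mean being estimated from pre-treatment rows and then subtracted from every period.

```python
import pandas as pd

# Hypothetical toy panel: two units, periods 1-4, treatment starts in period 3.
df = pd.DataFrame({
    "unit": [1, 1, 1, 1, 2, 2, 2, 2],
    "t":    [1, 2, 3, 4, 1, 2, 3, 4],
    "y":    [10.0, 12.0, 15.0, 17.0, 4.0, 6.0, 5.0, 7.0],
})
df["post"] = (df["t"] >= 3).astype(int)

# Pre-treatment mean per unit, computed from post == 0 rows only.
pre_mean = df.loc[df["post"] == 0].groupby("unit")["y"].mean()

# Subtract that mean from every period, pre and post.
df["ydot"] = df["y"] - df["unit"].map(pre_mean)
```

Unit 1 has a pre-treatment mean of 11 and unit 2 a mean of 5, so the transformed outcomes are deviations from those unit-specific levels.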

detrend

Removes unit-specific linear time trend:

\[\dot{Y}_{it} = Y_{it} - \hat{\alpha}_i - \hat{\beta}_i t\]

where \((\hat{\alpha}_i, \hat{\beta}_i)\) are OLS estimates from pre-treatment data.

Requirements: At least 2 pre-treatment periods per unit.

Use case: Heterogeneous linear trends where treatment and control units may follow different linear growth paths.
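A minimal NumPy sketch of the same idea for a single unit follows. The function name detrend_sketch and the data are assumptions for illustration; the library's own routine may differ in details such as error handling.

```python
import numpy as np

def detrend_sketch(y, t, post):
    """Fit y = a + b*t on pre-treatment rows, return fitted values and residuals for all rows."""
    y, t, post = (np.asarray(a, dtype=float) for a in (y, t, post))
    pre = post == 0
    if pre.sum() < 2:
        raise ValueError("need at least 2 pre-treatment periods")
    t_c = t - t[pre].mean()                # center time at the pre-treatment mean
    X_pre = np.column_stack([np.ones(pre.sum()), t_c[pre]])
    coef, *_ = np.linalg.lstsq(X_pre, y[pre], rcond=None)
    yhat_all = coef[0] + coef[1] * t_c     # extrapolate the pre-trend to all periods
    return yhat_all, y - yhat_all

# Hypothetical unit with slope 2 before treatment and a level shift of 5 afterwards.
t = np.arange(1, 7)
post = (t >= 5).astype(int)
y = 1.0 + 2.0 * t + 5.0 * post
yhat_all, ydot = detrend_sketch(y, t, post)
```

Because the trend is fitted on periods 1 to 4 only, the pre-treatment residuals are zero and the post-treatment residuals recover the shift of 5.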

demeanq

Removes unit-specific mean with seasonal fixed effects:

\[\dot{Y}_{it} = Y_{it} - \hat{\mu}_i - \sum_{q=2}^{Q} \hat{\gamma}_q D_q\]

where \(D_q\) are seasonal dummies with the smallest observed season as reference category and \(Q\) is the number of seasons per cycle.

Supported seasonal periods: Q=4 (quarterly, default), Q=12 (monthly), Q=52 (weekly).

Requirements: At least Q+1 pre-treatment observations per unit. Each season appearing in the post-treatment period must also appear in the pre-treatment period.

Use case: Periodic data with seasonal patterns.
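The seasonal model can be sketched for one unit as below. The function name demeanq_sketch and the data are hypothetical, and the handling of unseen seasons (raising an error) is an assumption made for the sketch rather than documented library behavior.

```python
import numpy as np

def demeanq_sketch(y, season, post):
    """Fit y = mu + seasonal dummies on pre-treatment rows; residualize all rows."""
    y = np.asarray(y, dtype=float)
    season = np.asarray(season)
    pre = np.asarray(post) == 0
    seasons_pre = np.unique(season[pre])   # sorted; first entry is the reference
    if not np.isin(season, seasons_pre).all():
        raise ValueError("a post-treatment season is missing from the pre-period")
    # Intercept plus one dummy per observed non-reference season.
    X = np.column_stack([np.ones(y.size)] +
                        [(season == s).astype(float) for s in seasons_pre[1:]])
    coef, *_ = np.linalg.lstsq(X[pre], y[pre], rcond=None)
    yhat_all = X @ coef
    return yhat_all, y - yhat_all

# Two pre-treatment years of quarterly data, then one treated year (effect = 3).
season = np.tile([1, 2, 3, 4], 3)
post = np.repeat([0, 0, 1], 4)
seasonal = np.array([0.0, 1.0, -1.0, 2.0])
y = 10.0 + seasonal[season - 1] + 3.0 * post
yhat_all, ydot = demeanq_sketch(y, season, post)
```

With 8 pre-treatment observations the Q+1 = 5 requirement is met, and the post-treatment residuals equal the shift of 3 in every quarter.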

detrendq

Removes unit-specific linear trend with seasonal effects:

\[\dot{Y}_{it} = Y_{it} - \hat{\alpha}_i - \hat{\beta}_i t - \sum_{q=2}^{Q} \hat{\gamma}_q D_q\]

Supported seasonal periods: Q=4 (quarterly, default), Q=12 (monthly), Q=52 (weekly).

Requirements: At least Q+2 pre-treatment observations per unit. Each season appearing in the post-treatment period must also appear in the pre-treatment period.

Use case: Periodic data with both trends and seasonal patterns.
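Adding a centered time regressor to the seasonal design gives a sketch of the combined model. As before, detrendq_sketch and the data are illustrative assumptions, not the library's code.

```python
import numpy as np

def detrendq_sketch(y, t, season, post):
    """Fit y = a + b*t + seasonal dummies on pre-treatment rows; residualize all rows."""
    y, t = np.asarray(y, dtype=float), np.asarray(t, dtype=float)
    season = np.asarray(season)
    pre = np.asarray(post) == 0
    seasons_pre = np.unique(season[pre])   # smallest observed season is the reference
    t_c = t - t[pre].mean()                # center time at the pre-treatment mean
    X = np.column_stack([np.ones(y.size), t_c] +
                        [(season == s).astype(float) for s in seasons_pre[1:]])
    coef, *_ = np.linalg.lstsq(X[pre], y[pre], rcond=None)
    yhat_all = X @ coef
    return yhat_all, y - yhat_all

# Hypothetical quarterly series: trend 0.5 per period, seasonal pattern, effect 3 from t = 9.
t = np.arange(1, 13)
season = np.tile([1, 2, 3, 4], 3)
post = (t >= 9).astype(int)
seasonal = np.array([0.0, 1.0, -1.0, 2.0])
y = 10.0 + 0.5 * t + seasonal[season - 1] + 3.0 * post
yhat_all, ydot = detrendq_sketch(y, t, season, post)
```

The design has Q+1 = 5 columns and 8 pre-treatment rows, so the Q+2 = 6 requirement is satisfied with two residual degrees of freedom to spare.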

Main Function

lwdid.transformations.apply_rolling_transform(data, y, ivar, tindex, post, rolling, tpost1, quarter=None, season_var=None, Q=4, exclude_pre_periods=0)

Apply unit-specific transformation to panel data.

Dispatches to the appropriate transformation method (demean, detrend, demeanq, or detrendq), computes post-treatment averages, and marks the cross-sectional regression sample.

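Transformation parameters are estimated from pre-treatment data only, then applied out-of-sample to all observations to obtain residualized outcomes. Because post-treatment observations never enter the fit, the estimated parameters are not contaminated by treatment effects, and the post-treatment residuals retain those effects for estimation.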

Parameters:
  • data (pd.DataFrame) – Panel data in long format with one row per unit-period observation.

  • y (str) – Column name of outcome variable.

  • ivar (str) – Column name of unit identifier.

  • tindex (str) – Column name of integer-valued time index.

  • post (str) – Column name of binary post-treatment indicator (0=pre, 1=post).

  • rolling ({'demean', 'detrend', 'demeanq', 'detrendq'}) – Transformation method to apply.

  • tpost1 (int) – First post-treatment period index. Identifies the cross-sectional regression sample (firstpost=True for observations at tindex==tpost1).

  • quarter (str, optional) – Column name of quarter indicator (values in {1, 2, 3, 4}). Used by the quarterly methods (demeanq, detrendq) with Q=4 when season_var is not given. Deprecated: prefer season_var, which is needed for non-quarterly data.

  • season_var (str, optional) – Column name of seasonal indicator variable. Values should be integers from 1 to Q. This is the preferred parameter for seasonal methods. If both quarter and season_var are provided, season_var takes precedence.

  • Q (int, default 4) – Number of seasonal periods per cycle. Common values: 4 (quarterly, default), 12 (monthly), 52 (weekly).

  • exclude_pre_periods (int, default 0) – Number of pre-treatment periods to exclude immediately before treatment. Used to address potential anticipation effects. When > 0, the last exclude_pre_periods pre-treatment periods are excluded from the sample used for estimating transformation parameters.

Returns:

Copy of input data with added columns:

  • ydot: Transformed (residualized) outcome for each observation.

  • ydot_postavg: Post-treatment average of ydot per unit.

  • firstpost: Boolean indicator for cross-sectional regression sample.

Return type:

pd.DataFrame

Raises:
  • InsufficientPrePeriodsError – If any unit has insufficient pre-treatment observations for the chosen transformation method.

  • ValueError – If rolling method is invalid or seasonal method lacks season_var/quarter.
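To make the three returned columns concrete, the sketch below reproduces them by hand for the demean case on a toy panel. The data and column names are hypothetical. Whether the library fills ydot_postavg on every row or only on firstpost rows is not stated here; the sketch fills every row as an assumption.

```python
import pandas as pd

# Hypothetical long-format panel: two units, periods 1-4.
df = pd.DataFrame({
    "unit": [1, 1, 1, 1, 2, 2, 2, 2],
    "t":    [1, 2, 3, 4, 1, 2, 3, 4],
    "y":    [10.0, 12.0, 15.0, 17.0, 4.0, 6.0, 5.0, 7.0],
})
tpost1 = 3
df["post"] = (df["t"] >= tpost1).astype(int)

out = df.copy()
# Step 1: residualize (demean case): pre-period mean subtracted from all periods.
pre_mean = out.loc[out["post"] == 0].groupby("unit")["y"].mean()
out["ydot"] = out["y"] - out["unit"].map(pre_mean)
# Step 2: per-unit average of ydot over post-treatment periods.
post_avg = out.loc[out["post"] == 1].groupby("unit")["ydot"].mean()
out["ydot_postavg"] = out["unit"].map(post_avg)
# Step 3: flag one row per unit (tindex == tpost1) as the cross-sectional sample.
out["firstpost"] = out["t"] == tpost1
```

Filtering on firstpost then yields one row per unit, with ydot_postavg as the outcome for a cross-sectional estimator.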

See also

_demean_transform

Unit-specific demeaning implementation.

_detrend_transform

Unit-specific detrending implementation.

demeanq_unit

Seasonal demeaning for a single unit.

detrendq_unit

Seasonal detrending for a single unit.

Unit-Level Functions

lwdid.transformations.detrend_unit(unit_data, y, tindex, post)

Remove unit-specific linear time trend for a single unit.

Estimates \(Y_{it} = \alpha + \beta t + \varepsilon\) via OLS using pre-treatment observations only, then computes out-of-sample residuals for all periods:

\[\dot{Y}_{it} = Y_{it} - \hat{\alpha} - \hat{\beta} t\]

This transformation removes unit-specific linear trends that may violate the parallel trends assumption in levels.

Parameters:
  • unit_data (pd.DataFrame) – Data for a single unit containing all time periods.

  • y (str) – Column name of outcome variable.

  • tindex (str) – Column name of time index.

  • post (str) – Column name of binary post-treatment indicator (0=pre, 1=post).

Return type:

tuple[ndarray, ndarray]

Returns:

  • yhat_all (ndarray) – Fitted values \(\hat{\alpha} + \hat{\beta} t\) for all periods. Returns NaN array if estimation fails.

  • ydot (ndarray) – Detrended outcomes for all periods. Returns NaN array if estimation fails due to numerical issues.

Notes

Time centering at the pre-treatment mean improves numerical stability by reducing the condition number of \(X'X\). Centering is an affine transformation that preserves predicted values.

See also

_detrend_transform

Apply detrending to all units in panel data.

lwdid.transformations.demeanq_unit(unit_data, y, season_var, post, Q=4)

Remove unit-specific mean with seasonal fixed effects.

Estimates a seasonal mean model using pre-treatment observations:

\[Y_{it} = \mu + \sum_{q=2}^{Q} \gamma_q D_q + \varepsilon_{it}\]

where \(D_q\) are seasonal dummies. The smallest observed season serves as the reference category for identification.

Parameters:
  • unit_data (pd.DataFrame) – Data for a single unit containing all time periods.

  • y (str) – Column name of outcome variable.

  • season_var (str) – Column name of seasonal indicator variable. Values should be integers from 1 to Q representing seasonal periods (e.g., quarters 1-4, months 1-12, or weeks 1-52).

  • post (str) – Column name of binary post-treatment indicator (0=pre, 1=post).

  • Q (int, default 4) – Number of seasonal periods per cycle. Common values: 4 (quarterly, default), 12 (monthly), 52 (weekly).

Return type:

tuple[ndarray, ndarray]

Returns:

  • yhat_all (ndarray) – Fitted values \(\hat{\mu} + \sum_q \hat{\gamma}_q D_q\) for all periods. Returns NaN array if estimation fails.

  • ydot (ndarray) – Seasonally-adjusted demeaned outcomes for all periods. Returns NaN array if estimation fails due to numerical issues.

Notes

Using observed seasons rather than all Q seasons as categories prevents rank-deficient design matrices when some seasons are absent from pre-treatment data.

At least Q + 1 pre-treatment observations are required so that OLS retains at least one residual degree of freedom (intercept + Q-1 seasonal dummies = Q parameters).
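The rank issue mentioned above can be seen in a small sketch. The season pattern and the helper name design are hypothetical; the example compares a design built from all Q categories with one built from observed categories only, for a unit whose pre-treatment data never include quarter 3.

```python
import numpy as np

Q = 4
# Hypothetical pre-treatment seasons for one unit: quarter 3 never appears.
season_pre = np.array([1, 2, 4, 1, 2, 4])

def design(season, categories):
    """Intercept plus a dummy for every category except the first (reference)."""
    cols = [np.ones(season.size)]
    cols += [(season == s).astype(float) for s in categories[1:]]
    return np.column_stack(cols)

X_all = design(season_pre, np.arange(1, Q + 1))     # dummies for quarters 2, 3, 4
X_obs = design(season_pre, np.unique(season_pre))   # dummies for quarters 2 and 4 only

rank_all = np.linalg.matrix_rank(X_all)
rank_obs = np.linalg.matrix_rank(X_obs)
```

The all-Q design has four columns but rank three, because the quarter-3 dummy is identically zero; the observed-season design has three columns and full column rank.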

See also

detrendq_unit

Combines seasonal adjustment with linear trend removal.

lwdid.transformations.detrendq_unit(unit_data, y, tindex, season_var, post, Q=4)

Remove unit-specific linear trend with seasonal fixed effects.

Estimates a combined trend and seasonal model using pre-treatment data:

\[Y_{it} = \alpha + \beta t + \sum_{q=2}^{Q} \gamma_q D_q + \varepsilon_{it}\]

The smallest observed season serves as the reference category. Time is centered at its pre-treatment mean for numerical stability.

Parameters:
  • unit_data (pd.DataFrame) – Data for a single unit containing all time periods.

  • y (str) – Column name of outcome variable.

  • tindex (str) – Column name of time index.

  • season_var (str) – Column name of seasonal indicator variable. Values should be integers from 1 to Q representing seasonal periods (e.g., quarters 1-4, months 1-12, or weeks 1-52).

  • post (str) – Column name of binary post-treatment indicator (0=pre, 1=post).

  • Q (int, default 4) – Number of seasonal periods per cycle. Common values: 4 (quarterly, default), 12 (monthly), 52 (weekly).

Return type:

tuple[ndarray, ndarray]

Returns:

  • yhat_all (ndarray) – Fitted values for all periods. Returns NaN array if estimation fails.

  • ydot (ndarray) – Seasonally-adjusted detrended outcomes for all periods. Returns NaN array if estimation fails due to numerical issues.

Notes

This transformation combines trend removal and seasonal adjustment, accounting for both unit-specific growth patterns and seasonal cycles. Time centering reduces the condition number of the design matrix without affecting predicted values.

At least Q + 2 pre-treatment observations are required so that OLS retains at least one residual degree of freedom (intercept + slope + Q-1 seasonal dummies = Q+1 parameters).

See also

demeanq_unit

Seasonal adjustment without trend removal.

detrend_unit

Linear trend removal without seasonal adjustment.

Internal Functions

lwdid.transformations._demean_transform(data, y, ivar, post)

Apply unit-specific demeaning transformation to all units.

Computes \(\dot{Y}_{it} = Y_{it} - \bar{Y}_{i,pre}\) for all observations, where \(\bar{Y}_{i,pre}\) is the pre-treatment mean:

\[\bar{Y}_{i,pre} = T_{0i}^{-1} \sum_{t: \text{post}=0} Y_{it}\]

This transformation removes unit-specific level heterogeneity that may be correlated with treatment assignment.

Parameters:
  • data (pd.DataFrame) – Panel data containing outcome and post indicator columns.

  • y (str) – Column name of outcome variable.

  • ivar (str) – Column name of unit identifier.

  • post (str) – Column name of binary post-treatment indicator (0=pre, 1=post).

Returns:

Input data with ydot column containing demeaned outcomes.

Return type:

pd.DataFrame

Raises:

InsufficientPrePeriodsError – If any unit has no pre-treatment observations.

See also

_detrend_transform

Removes unit-specific linear trends.

lwdid.transformations._detrend_transform(data, y, ivar, tindex, post)

Apply unit-specific linear detrending transformation to all units.

For each unit i, estimates a linear trend from pre-treatment data and computes out-of-sample residuals:

\[\dot{Y}_{it} = Y_{it} - \hat{\alpha}_i - \hat{\beta}_i t\]

This transformation removes unit-specific linear trends that may violate the parallel trends assumption when trends differ across treatment groups.

Parameters:
  • data (pd.DataFrame) – Panel data containing outcome, time index, and post indicator columns.

  • y (str) – Column name of outcome variable.

  • ivar (str) – Column name of unit identifier.

  • tindex (str) – Column name of integer-valued time index.

  • post (str) – Column name of binary post-treatment indicator (0=pre, 1=post).

Returns:

Input data with ydot column containing detrended outcomes.

Return type:

pd.DataFrame

Raises:

InsufficientPrePeriodsError – If any unit has fewer than 2 pre-treatment observations.

See also

detrend_unit

Detrending implementation for a single unit.

_demean_transform

Removes unit-specific means instead of trends.

Example Usage

The transformation is automatically applied by the main lwdid() function:

from lwdid import lwdid

# Standard demeaning (parallel trends in levels)
results_dm = lwdid(data, y='y', d='d', ivar='i', tvar='t',
                   post='post', rolling='demean')

# Detrending (allows heterogeneous linear trends)
results_dt = lwdid(data, y='y', d='d', ivar='i', tvar='t',
                   post='post', rolling='detrend')

# Quarterly demeaning with seasonality
results_dmq = lwdid(data, y='y', d='d', ivar='i',
                    tvar=['year', 'quarter'],  # Composite time variable
                    post='post', rolling='demeanq')

# Quarterly detrending with seasonality
results_dtq = lwdid(data, y='y', d='d', ivar='i',
                    tvar=['year', 'quarter'],  # Composite time variable
                    post='post', rolling='detrendq')

Transformation Selection Guide

  • demean: Pre-periods required >= 1. No seasonal support. Assumes parallel levels (common trends).

  • detrend: Pre-periods required >= 2. No seasonal support. Allows heterogeneous linear trends.

  • demeanq: Pre-periods required >= Q+1. Seasonal support (Q=4/12/52). Assumes parallel levels (common trends).

  • detrendq: Pre-periods required >= Q+2. Seasonal support (Q=4/12/52). Allows heterogeneous linear trends.

Staggered Adoption Transformations

For staggered adoption designs, transformations are applied separately for each treatment cohort \(g\), using cohort-specific pre-treatment periods. This approach follows Lee and Wooldridge (2025).

Cohort-Specific Demeaning

For cohort \(g\) (units first treated in period \(g\)) and calendar time \(r \geq g\), the transformed outcome is computed as:

\[\dot{Y}_{irg} = Y_{ir} - \bar{Y}_{i,pre(g)}\]

where the cohort-specific pre-treatment mean is:

\[\bar{Y}_{i,pre(g)} = \frac{1}{g-1} \sum_{s=1}^{g-1} Y_{is}\]

The key difference from common timing demeaning is that the pre-treatment average uses only periods \(\{1, 2, \ldots, g-1\}\) specific to each cohort, rather than a fixed set of pre-treatment periods for all units.

Requirements: Each cohort \(g\) needs at least one pre-treatment period, that is, \(g - 1 \geq 1\). Early cohorts (e.g., \(g = 2\), with a single pre-treatment period) have limited pre-treatment data.
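A minimal pandas sketch of cohort-specific demeaning follows. The panel, the column names, and the helper cohort_demean are hypothetical, and the selection of control units for each cohort is deliberately left out; the point is only that the pre-treatment window depends on \(g\).

```python
import pandas as pd

# Hypothetical staggered panel: unit 1 first treated in period 3, unit 2 in period 4.
df = pd.DataFrame({
    "unit": [1, 1, 1, 1, 2, 2, 2, 2],
    "t":    [1, 2, 3, 4, 1, 2, 3, 4],
    "y":    [1.0, 3.0, 6.0, 7.0, 2.0, 4.0, 6.0, 9.0],
    "gvar": [3, 3, 3, 3, 4, 4, 4, 4],
})

def cohort_demean(data, g):
    """Return rows with t >= g, with y net of each unit's mean over periods t < g."""
    pre_mean = data.loc[data["t"] < g].groupby("unit")["y"].mean()
    out = data.loc[data["t"] >= g].copy()
    out["ydot_g"] = out["y"] - out["unit"].map(pre_mean)
    return out

ydot_g3 = cohort_demean(df, g=3)   # pre-treatment window {1, 2}
ydot_g4 = cohort_demean(df, g=4)   # pre-treatment window {1, 2, 3}
```

Unit 2 in period 4 gets a different transformed value under each cohort (6 for \(g = 3\), 5 for \(g = 4\)) because the averaging window changes with \(g\).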

Cohort-Specific Detrending

For cohort \(g\) and calendar time \(r \geq g\), the detrended outcome is:

\[\ddot{Y}_{irg} = Y_{ir} - \hat{A}_{ig} - \hat{B}_{ig} \cdot r\]

where \((\hat{A}_{ig}, \hat{B}_{ig})\) are estimated from the unit-specific regression using only pre-treatment periods for cohort \(g\):

\[Y_{it} = A_i + B_i \cdot t + \varepsilon_{it}, \quad t \in \{1, 2, \ldots, g-1\}\]

This removes both unit-specific levels and unit-specific linear trends, allowing for heterogeneous trend paths across units while preserving the treatment variation for estimation.

Requirements: Each cohort \(g\) needs at least two pre-treatment periods, that is, \(g - 1 \geq 2\), to estimate both intercept and slope.
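The cohort-specific regression can be sketched for one unit with NumPy. The function name cohort_detrend and the data are assumptions for illustration only.

```python
import numpy as np

def cohort_detrend(y, t, g):
    """Fit y = A + B*t on periods t < g; return residuals for periods t >= g."""
    y, t = np.asarray(y, dtype=float), np.asarray(t, dtype=float)
    pre = t < g
    if pre.sum() < 2:
        raise ValueError("cohort g needs at least 2 pre-treatment periods")
    B, A = np.polyfit(t[pre], y[pre], deg=1)   # polyfit returns the slope first
    return y[~pre] - (A + B * t[~pre])

# Hypothetical unit: linear trend with slope 1.5, level shift of 4 from period 4 onwards.
t = np.arange(1, 7)
y = 2.0 + 1.5 * t + np.where(t >= 4, 4.0, 0.0)
yddot_g4 = cohort_detrend(y, t, g=4)
```

With \(g = 4\) the trend is fitted on periods 1 to 3, and the residuals for periods 4 to 6 recover the shift of 4.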

Staggered Transformation Usage

The staggered transformations are applied automatically when the gvar parameter is specified in lwdid():

from lwdid import lwdid

# Staggered demeaning
results = lwdid(
    data,
    y='outcome',
    ivar='unit',
    tvar='year',
    gvar='first_treat_year',  # Cohort indicator
    rolling='demean'
)

# Staggered detrending
results = lwdid(
    data,
    y='outcome',
    ivar='unit',
    tvar='year',
    gvar='first_treat_year',
    rolling='detrend'
)

Staggered Transformation Selection

In staggered adoption designs, only the demean and detrend transformations are supported through the main lwdid() interface. Seasonal transformations (demeanq, detrendq) are restricted to common timing mode; passing rolling='demeanq' or rolling='detrendq' with gvar raises a ValueError.

  • demean: Staggered support via lwdid() = Yes. Pre-periods required: \(g - 1 \geq 1\) per cohort.

  • detrend: Staggered support via lwdid() = Yes. Pre-periods required: \(g - 1 \geq 2\) per cohort.

  • demeanq: Staggered support via lwdid() = No (common timing only).

  • detrendq: Staggered support via lwdid() = No (common timing only).

Note

The staggered module contains low-level implementations of transform_staggered_demeanq and transform_staggered_detrendq for advanced users, but these are not exposed through the main lwdid() function.

See Also

  • lwdid.lwdid(): Main estimation function.

  • Methodological Notes: Theoretical foundations of transformations.

  • User Guide: Practical guidance on transformation selection.