Transformations Module
Unit-specific panel data transformations for difference-in-differences.
This module implements unit-specific outcome transformations that remove pre-treatment heterogeneity from panel data. Transformation parameters are estimated using only pre-treatment observations, then applied out-of-sample to all periods including post-treatment.
The transformations convert panel difference-in-differences estimation into cross-sectional treatment effects problems. Under no anticipation and parallel trends assumptions, standard treatment effect estimators (regression adjustment, inverse probability weighting, doubly robust, matching) can be applied to the transformed outcomes.
Available Transformations
- demean
Removes unit-specific pre-treatment mean:
\[\dot{Y}_{it} = Y_{it} - \bar{Y}_{i,pre}\]
where \(\bar{Y}_{i,pre} = T_0^{-1} \sum_{s<g} Y_{is}\), \(g\) is the first treatment period, and \(T_0\) is the number of pre-treatment periods. Requires at least 1 pre-treatment period per unit.
- detrend
Removes unit-specific linear time trend:
\[\dot{Y}_{it} = Y_{it} - \hat{\alpha}_i - \hat{\beta}_i t\]
where \((\hat{\alpha}_i, \hat{\beta}_i)\) are OLS estimates from pre-treatment data. Requires at least 2 pre-treatment periods per unit.
- demeanq
Removes unit-specific mean with quarterly seasonal fixed effects:
\[\dot{Y}_{it} = Y_{it} - \hat{\mu}_i - \sum_{q=2}^{4} \hat{\gamma}_q D_q\]
where \(D_q\) are quarter dummies with the smallest observed quarter as reference category. Requires \(n_{pre} \geq Q + 1\) per unit, where \(n_{pre}\) is the number of pre-treatment observations and \(Q = 4\) for quarterly data.
- detrendq
Removes unit-specific linear trend with quarterly seasonal effects:
\[\dot{Y}_{it} = Y_{it} - \hat{\alpha}_i - \hat{\beta}_i t - \sum_{q=2}^{4} \hat{\gamma}_q D_q\]
Requires \(n_{pre} \geq Q + 2\) per unit, where \(n_{pre}\) is the number of pre-treatment observations and \(Q = 4\) for quarterly data.
Notes
The transformations eliminate unit-specific level differences or trends that may be correlated with treatment assignment. By removing these pre-treatment patterns, the parallel trends assumption becomes an assumption about the transformed outcomes rather than the original levels.
Time centering is applied in detrending methods to improve numerical stability of OLS estimation. This reduces the condition number of the design matrix without affecting the final residuals, as centering is an affine transformation that preserves predicted values.
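The claim that centering changes the conditioning but not the fitted values can be checked numerically. The sketch below is not the module's code; the data and the helper `fit_predict` are hypothetical, and it assumes the time index holds calendar years, which is where conditioning is worst.

```python
import numpy as np

# Hypothetical pre-treatment data for one unit, indexed by calendar year.
t_pre = np.array([2001.0, 2002.0, 2003.0, 2004.0, 2005.0])
y_pre = np.array([1.0, 2.1, 2.9, 4.2, 5.0])
t_all = np.arange(2001.0, 2009.0)  # pre- and post-treatment periods


def fit_predict(t_fit, y_fit, t_new):
    """OLS of y on (1, t); return fitted values at t_new and cond(X)."""
    X = np.column_stack([np.ones_like(t_fit), t_fit])
    coef, *_ = np.linalg.lstsq(X, y_fit, rcond=None)
    return coef[0] + coef[1] * t_new, np.linalg.cond(X)


t_bar = t_pre.mean()  # centering constant taken from pre-treatment periods only
yhat_raw, cond_raw = fit_predict(t_pre, y_pre, t_all)
yhat_ctr, cond_ctr = fit_predict(t_pre - t_bar, y_pre, t_all - t_bar)

print(cond_raw, cond_ctr)                 # the centered design is far better conditioned
print(np.allclose(yhat_raw, yhat_ctr))    # fitted values agree up to rounding
```

The same shift must be applied to the post-treatment periods, as done with `t_all - t_bar`, for the out-of-sample predictions to match.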
Available Transformations
demean
Removes unit-specific pre-treatment mean:
\[\dot{Y}_{it} = Y_{it} - \bar{Y}_{i,pre}\]
where \(\bar{Y}_{i,pre} = T_0^{-1} \sum_{s<g} Y_{is}\).
Requirements: At least 1 pre-treatment period per unit.
Use case: Standard parallel trends assumption where treatment and control groups have parallel outcome paths in levels.
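As an illustration of the demean formula, here is a minimal pandas sketch on a toy panel. The column names and data are hypothetical, and this is not the library's implementation.

```python
import pandas as pd

# Toy long-format panel: two units, two pre- and two post-treatment periods.
df = pd.DataFrame({
    "unit": [1, 1, 1, 1, 2, 2, 2, 2],
    "t":    [1, 2, 3, 4, 1, 2, 3, 4],
    "post": [0, 0, 1, 1, 0, 0, 1, 1],
    "y":    [1.0, 3.0, 5.0, 7.0, 10.0, 12.0, 11.0, 13.0],
})

# The mean is estimated from pre-treatment rows only ...
pre_mean = df.loc[df["post"] == 0].groupby("unit")["y"].mean()

# ... and subtracted from every row of the unit, including post-treatment rows.
df["ydot"] = df["y"] - df["unit"].map(pre_mean)
print(df)
```

Unit 1 has a pre-treatment mean of 2 and unit 2 has a pre-treatment mean of 11, so the two units' very different levels are removed before any comparison.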
detrend
Removes unit-specific linear time trend:
\[\dot{Y}_{it} = Y_{it} - \hat{\alpha}_i - \hat{\beta}_i t\]
where \((\hat{\alpha}_i, \hat{\beta}_i)\) are OLS estimates from pre-treatment data.
Requirements: At least 2 pre-treatment periods per unit.
Use case: Heterogeneous linear trends where treatment and control units may follow different linear growth paths.
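The detrend formula can be sketched for a single unit with `numpy.polyfit`. The data are made up so that the pre-treatment path is exactly linear; the library's own routine may differ in details such as time centering.

```python
import numpy as np

# One hypothetical unit: pre-treatment outcomes follow y = 2t, treatment adds 3.
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
post = np.array([0, 0, 0, 1, 1])
y = np.array([2.0, 4.0, 6.0, 11.0, 13.0])

pre = post == 0
# np.polyfit returns coefficients from highest degree down: (slope, intercept).
slope, intercept = np.polyfit(t[pre], y[pre], deg=1)

# Out-of-sample residuals: the pre-period fit is extrapolated to all periods.
ydot = y - (intercept + slope * t)
print(ydot)
```

Because the trend is fitted on the three pre-treatment points only, the post-treatment residuals recover the shift of 3 instead of folding it into the slope.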
demeanq
Removes unit-specific mean with seasonal fixed effects:
\[\dot{Y}_{it} = Y_{it} - \hat{\mu}_i - \sum_{q=2}^{Q} \hat{\gamma}_q D_q\]
where \(D_q\) are seasonal dummies with the smallest observed season as reference category and \(Q\) is the number of seasons per cycle.
Supported seasonal periods: Q=4 (quarterly, default), Q=12 (monthly), Q=52 (weekly).
Requirements: At least Q+1 pre-treatment observations per unit. Each season appearing in the post-treatment period must also appear in the pre-treatment period.
Use case: Periodic data with seasonal patterns.
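A sketch of the seasonal demeaning step for one unit with Q=4 follows. The helper `season_design` and the data are hypothetical; the sketch only assumes what the text states, namely a pre-treatment OLS fit with the smallest observed season as reference.

```python
import numpy as np

# Hypothetical quarterly series: 8 pre-treatment and 4 post-treatment quarters.
season = np.tile([1, 2, 3, 4], 3)
post = np.repeat([0, 0, 1], 4)
seasonal_effect = np.array([0.0, 1.0, -1.0, 2.0])
y = 10.0 + seasonal_effect[season - 1] + 5.0 * post  # treatment shifts y by 5


def season_design(season, levels):
    """Intercept plus one dummy per level, omitting the smallest (reference)."""
    dummies = [(season == q).astype(float) for q in levels[1:]]
    return np.column_stack([np.ones(len(season))] + dummies)


pre = post == 0
levels = np.unique(season[pre])  # sorted seasons observed before treatment
coef, *_ = np.linalg.lstsq(season_design(season[pre], levels), y[pre], rcond=None)

# Apply the pre-treatment fit to every period.
ydot = y - season_design(season, levels) @ coef
print(ydot)
```

With 8 pre-treatment observations the requirement of at least Q+1 = 5 is met, and every post-treatment season also appears before treatment.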
detrendq
Removes unit-specific linear trend with seasonal effects:
\[\dot{Y}_{it} = Y_{it} - \hat{\alpha}_i - \hat{\beta}_i t - \sum_{q=2}^{Q} \hat{\gamma}_q D_q\]
Supported seasonal periods: Q=4 (quarterly, default), Q=12 (monthly), Q=52 (weekly).
Requirements: At least Q+2 pre-treatment observations per unit. Each season appearing in the post-treatment period must also appear in the pre-treatment period.
Use case: Periodic data with both trends and seasonal patterns.
Main Function
- lwdid.transformations.apply_rolling_transform(data, y, ivar, tindex, post, rolling, tpost1, quarter=None, season_var=None, Q=4, exclude_pre_periods=0)[source]
Apply unit-specific transformation to panel data.
Dispatches to the appropriate transformation method (demean, detrend, demeanq, or detrendq), computes post-treatment averages, and marks the cross-sectional regression sample.
Transformation parameters are estimated from pre-treatment data only, then applied out-of-sample to all observations to obtain residualized outcomes. Estimating them on post-treatment observations as well would let the fitted mean or trend absorb part of the treatment effect.
- Parameters:
data (pd.DataFrame) – Panel data in long format with one row per unit-period observation.
y (str) – Column name of outcome variable.
ivar (str) – Column name of unit identifier.
tindex (str) – Column name of integer-valued time index.
post (str) – Column name of binary post-treatment indicator (0=pre, 1=post).
rolling ({'demean', 'detrend', 'demeanq', 'detrendq'}) – Transformation method to apply.
tpost1 (int) – First post-treatment period index. Identifies the cross-sectional regression sample (firstpost=True for observations at tindex==tpost1).
quarter (str, optional) – Column name of quarter indicator (values in {1, 2, 3, 4}). Required for quarterly methods (demeanq, detrendq) when Q=4. Deprecated: use season_var instead for non-quarterly data.
season_var (str, optional) – Column name of seasonal indicator variable. Values should be integers from 1 to Q. This is the preferred parameter for seasonal methods. If both quarter and season_var are provided, season_var takes precedence.
Q (int, default 4) – Number of seasonal periods per cycle. Common values: 4 (quarterly, default), 12 (monthly), 52 (weekly).
exclude_pre_periods (int, default 0) – Number of pre-treatment periods to exclude immediately before treatment. Used to address potential anticipation effects. When > 0, the last exclude_pre_periods pre-treatment periods are excluded from the sample used for estimating transformation parameters.
- Returns:
Copy of input data with added columns:
ydot: Transformed (residualized) outcome for each observation.
ydot_postavg: Post-treatment average of ydot per unit.
firstpost: Boolean indicator for cross-sectional regression sample.
- Return type:
pd.DataFrame
- Raises:
InsufficientPrePeriodsError – If any unit has insufficient pre-treatment observations for the chosen transformation method.
ValueError – If rolling method is invalid or seasonal method lacks season_var/quarter.
See also
_demean_transform : Unit-specific demeaning implementation.
_detrend_transform : Unit-specific detrending implementation.
demeanq_unit : Seasonal demeaning for a single unit.
detrendq_unit : Seasonal detrending for a single unit.
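The two bookkeeping columns described for apply_rolling_transform can be pictured with the following sketch. It starts from an already transformed ydot column on hypothetical data, and it assumes that ydot_postavg is stored on every row of a unit; the function's actual layout may differ.

```python
import pandas as pd

# Hypothetical panel with an already transformed outcome column.
df = pd.DataFrame({
    "unit": [1, 1, 1, 1, 2, 2, 2, 2],
    "t":    [1, 2, 3, 4, 1, 2, 3, 4],
    "post": [0, 0, 1, 1, 0, 0, 1, 1],
    "ydot": [-1.0, 1.0, 3.0, 5.0, -1.0, 1.0, 0.0, 2.0],
})
tpost1 = 3  # first post-treatment period

# Per-unit average of ydot over post-treatment rows (assumed broadcast to all rows).
post_avg = df.loc[df["post"] == 1].groupby("unit")["ydot"].mean()
df["ydot_postavg"] = df["unit"].map(post_avg)

# One row per unit is flagged, giving the cross-sectional regression sample.
df["firstpost"] = df["t"] == tpost1
cross_section = df.loc[df["firstpost"], ["unit", "ydot_postavg"]]
print(cross_section)
```

The resulting cross-section has one row per unit, which is what lets standard cross-sectional estimators be applied to the panel.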
Unit-Level Functions
- lwdid.transformations.detrend_unit(unit_data, y, tindex, post)[source]
Remove unit-specific linear time trend for a single unit.
Estimates \(Y_{it} = \alpha + \beta t + \varepsilon\) via OLS using pre-treatment observations only, then computes out-of-sample residuals for all periods:
\[\dot{Y}_{it} = Y_{it} - \hat{\alpha} - \hat{\beta} t\]This transformation removes unit-specific linear trends that may violate the parallel trends assumption in levels.
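- Parameters:
unit_data (pd.DataFrame) – Data for a single unit containing all time periods.
y (str) – Column name of outcome variable.
tindex (str) – Column name of time index.
post (str) – Column name of binary post-treatment indicator (0=pre, 1=post).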
- Return type:
- Returns:
yhat_all (ndarray) – Fitted values \(\hat{\alpha} + \hat{\beta} t\) for all periods. Returns NaN array if estimation fails.
ydot (ndarray) – Detrended outcomes for all periods. Returns NaN array if estimation fails due to numerical issues.
Notes
Time centering at the pre-treatment mean improves numerical stability by reducing the condition number of \(X'X\). Centering is an affine transformation that preserves predicted values.
See also
_detrend_transform : Apply detrending to all units in panel data.
- lwdid.transformations.demeanq_unit(unit_data, y, season_var, post, Q=4)[source]
Remove unit-specific mean with seasonal fixed effects.
Estimates a seasonal mean model using pre-treatment observations:
\[Y_{it} = \mu + \sum_{q=2}^{Q} \gamma_q D_q + \varepsilon_{it}\]where \(D_q\) are seasonal dummies. The smallest observed season serves as the reference category for identification.
- Parameters:
unit_data (pd.DataFrame) – Data for a single unit containing all time periods.
y (str) – Column name of outcome variable.
season_var (str) – Column name of seasonal indicator variable. Values should be integers from 1 to Q representing seasonal periods (e.g., quarters 1-4, months 1-12, or weeks 1-52).
post (str) – Column name of binary post-treatment indicator (0=pre, 1=post).
Q (int, default 4) – Number of seasonal periods per cycle. Common values: 4 (quarterly, default), 12 (monthly), 52 (weekly).
- Return type:
- Returns:
yhat_all (ndarray) – Fitted values \(\hat{\mu} + \sum_q \hat{\gamma}_q D_q\) for all periods. Returns NaN array if estimation fails.
ydot (ndarray) – Seasonally-adjusted demeaned outcomes for all periods. Returns NaN array if estimation fails due to numerical issues.
Notes
Using observed seasons rather than all Q seasons as categories prevents rank-deficient design matrices when some seasons are absent from pre-treatment data.
The minimum required pre-treatment observations is Q + 1 to ensure at least one residual degree of freedom for OLS estimation.
See also
detrendq_unit : Combines seasonal adjustment with linear trend removal.
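The note on observed seasons for demeanq_unit can be illustrated with a rank check. In this sketch, which uses a hypothetical `design` helper and made-up data, season 4 never occurs before treatment, so a dummy for it would be a column of zeros.

```python
import numpy as np

# Pre-treatment seasons for one hypothetical unit: season 4 is never observed.
season_pre = np.array([1, 2, 3, 1, 2, 3])


def design(season, levels):
    """Intercept plus one dummy per level, omitting the smallest (reference)."""
    dummies = [(season == q).astype(float) for q in levels[1:]]
    return np.column_stack([np.ones(len(season))] + dummies)


X_all = design(season_pre, np.array([1, 2, 3, 4]))   # dummies for all Q seasons
X_obs = design(season_pre, np.unique(season_pre))    # dummies for observed seasons

rank_all = np.linalg.matrix_rank(X_all)
rank_obs = np.linalg.matrix_rank(X_obs)
print(X_all.shape[1], rank_all)  # 4 columns, rank 3: rank-deficient
print(X_obs.shape[1], rank_obs)  # 3 columns, rank 3: full column rank
```

This also shows why a season that appears only after treatment is a problem: no coefficient for it can be estimated from the pre-treatment data.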
- lwdid.transformations.detrendq_unit(unit_data, y, tindex, season_var, post, Q=4)[source]
Remove unit-specific linear trend with seasonal fixed effects.
Estimates a combined trend and seasonal model using pre-treatment data:
\[Y_{it} = \alpha + \beta t + \sum_{q=2}^{Q} \gamma_q D_q + \varepsilon_{it}\]The smallest observed season serves as the reference category. Time is centered at its pre-treatment mean for numerical stability.
- Parameters:
unit_data (pd.DataFrame) – Data for a single unit containing all time periods.
y (str) – Column name of outcome variable.
tindex (str) – Column name of time index.
season_var (str) – Column name of seasonal indicator variable. Values should be integers from 1 to Q representing seasonal periods (e.g., quarters 1-4, months 1-12, or weeks 1-52).
post (str) – Column name of binary post-treatment indicator (0=pre, 1=post).
Q (int, default 4) – Number of seasonal periods per cycle. Common values: 4 (quarterly, default), 12 (monthly), 52 (weekly).
- Return type:
- Returns:
yhat_all (ndarray) – Fitted values for all periods. Returns NaN array if estimation fails.
ydot (ndarray) – Seasonally-adjusted detrended outcomes for all periods. Returns NaN array if estimation fails due to numerical issues.
Notes
This transformation combines trend removal and seasonal adjustment, accounting for both unit-specific growth patterns and seasonal cycles. Time centering reduces the condition number of the design matrix without affecting predicted values.
The minimum required pre-treatment observations is Q + 2 to ensure at least one residual degree of freedom for OLS estimation (intercept + slope + Q-1 seasonal dummies = Q+1 parameters).
See also
demeanq_unit : Seasonal adjustment without trend removal.
detrend_unit : Linear trend removal without seasonal adjustment.
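The parameter count behind the Q + 2 minimum for detrendq_unit can be verified by building the design matrix directly. This is only a sketch with hypothetical data for Q=4, not the function's internals.

```python
import numpy as np

Q = 4
n_pre = Q + 2  # the stated minimum number of pre-treatment observations
t_pre = np.arange(1, n_pre + 1, dtype=float)
season_pre = (np.arange(n_pre) % Q) + 1  # seasons 1, 2, 3, 4, 1, 2

levels = np.unique(season_pre)
X = np.column_stack(
    [np.ones(n_pre), t_pre - t_pre.mean()]                     # intercept, centered time
    + [(season_pre == q).astype(float) for q in levels[1:]]    # Q-1 seasonal dummies
)

n_params = X.shape[1]
resid_df = n_pre - np.linalg.matrix_rank(X)
print(n_params, resid_df)
```

With exactly Q + 2 observations the fit has Q + 1 parameters and a single residual degree of freedom; with Q + 1 observations it would interpolate the pre-treatment data exactly.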
Internal Functions
- lwdid.transformations._demean_transform(data, y, ivar, post)[source]
Apply unit-specific demeaning transformation to all units.
Computes \(\dot{Y}_{it} = Y_{it} - \bar{Y}_{i,pre}\) for all observations, where \(\bar{Y}_{i,pre}\) is the pre-treatment mean:
\[\bar{Y}_{i,pre} = T_{0i}^{-1} \sum_{t: \text{post}=0} Y_{it}\]This transformation removes unit-specific level heterogeneity that may be correlated with treatment assignment.
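- Parameters:
data (pd.DataFrame) – Panel data containing outcome and post indicator columns.
y (str) – Column name of outcome variable.
ivar (str) – Column name of unit identifier.
post (str) – Column name of binary post-treatment indicator (0=pre, 1=post).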
- Returns:
Input data with ydot column containing demeaned outcomes.
- Return type:
pd.DataFrame
- Raises:
InsufficientPrePeriodsError – If any unit has no pre-treatment observations.
See also
_detrend_transform : Removes unit-specific linear trends.
- lwdid.transformations._detrend_transform(data, y, ivar, tindex, post)[source]
Apply unit-specific linear detrending transformation to all units.
For each unit i, estimates a linear trend from pre-treatment data and computes out-of-sample residuals:
\[\dot{Y}_{it} = Y_{it} - \hat{\alpha}_i - \hat{\beta}_i t\]This transformation removes unit-specific linear trends that may violate the parallel trends assumption when trends differ across treatment groups.
- Parameters:
data (pd.DataFrame) – Panel data containing outcome, time index, and post indicator columns.
y (str) – Column name of outcome variable.
ivar (str) – Column name of unit identifier.
tindex (str) – Column name of integer-valued time index.
post (str) – Column name of binary post-treatment indicator (0=pre, 1=post).
- Returns:
Input data with ydot column containing detrended outcomes.
- Return type:
pd.DataFrame
- Raises:
InsufficientPrePeriodsError – If any unit has fewer than 2 pre-treatment observations.
See also
detrend_unit : Detrending implementation for a single unit.
_demean_transform : Removes unit-specific means instead of trends.
Example Usage
The transformation is automatically applied by the main lwdid() function:
from lwdid import lwdid

# Standard demeaning (parallel trends in levels)
results_dm = lwdid(data, y='y', d='d', ivar='i', tvar='t',
                   post='post', rolling='demean')

# Detrending (allows heterogeneous linear trends)
results_dt = lwdid(data, y='y', d='d', ivar='i', tvar='t',
                   post='post', rolling='detrend')

# Quarterly demeaning with seasonality
results_dmq = lwdid(data, y='y', d='d', ivar='i',
                    tvar=['year', 'quarter'],  # Composite time variable
                    post='post', rolling='demeanq')

# Quarterly detrending with seasonality
results_dtq = lwdid(data, y='y', d='d', ivar='i',
                    tvar=['year', 'quarter'],  # Composite time variable
                    post='post', rolling='detrendq')
Transformation Selection Guide
demean: Pre-periods required >= 1. No seasonal support. Assumes parallel levels (common trends).
detrend: Pre-periods required >= 2. No seasonal support. Allows heterogeneous linear trends.
demeanq: Pre-periods required >= Q+1. Seasonal support (Q=4/12/52). Assumes parallel levels (common trends).
detrendq: Pre-periods required >= Q+2. Seasonal support (Q=4/12/52). Allows heterogeneous linear trends.
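The pre-period requirements in the guide above reduce to a small lookup. The helper below, `min_pre_periods`, is hypothetical and not part of lwdid; it only encodes the four rows of the guide.

```python
def min_pre_periods(rolling, Q=4):
    """Minimum pre-treatment observations per unit for a transformation.

    Hypothetical helper mirroring the selection guide; not an lwdid function.
    """
    required = {
        "demean": 1,         # one mean
        "detrend": 2,        # intercept and slope
        "demeanq": Q + 1,    # Q seasonal means plus one residual df
        "detrendq": Q + 2,   # slope added to the seasonal model
    }
    if rolling not in required:
        raise ValueError(f"unknown rolling method: {rolling!r}")
    return required[rolling]


print(min_pre_periods("detrend"))          # 2
print(min_pre_periods("demeanq"))          # 5 for quarterly data
print(min_pre_periods("detrendq", Q=12))   # 14 for monthly data
```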
Staggered Adoption Transformations
For staggered adoption designs, transformations are applied separately for each treatment cohort \(g\), using cohort-specific pre-treatment periods. This approach follows Lee and Wooldridge (2025).
Cohort-Specific Demeaning
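For cohort \(g\) (units first treated in period \(g\)) and calendar time \(r \geq g\), the transformed outcome is computed as:
\[\dot{Y}_{irg} = Y_{ir} - \bar{Y}_{i,pre(g)}\]
where the cohort-specific pre-treatment mean is:
\[\bar{Y}_{i,pre(g)} = \frac{1}{g-1} \sum_{s=1}^{g-1} Y_{is}\]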
The key difference from common timing demeaning is that the pre-treatment average uses only periods \(\{1, 2, \ldots, g-1\}\) specific to each cohort, rather than a fixed set of pre-treatment periods for all units.
Requirements: Each cohort \(g\) needs at least one pre-treatment period, i.e. \(g - 1 \geq 1\). Early cohorts (e.g., \(g = 2\)) may have limited pre-treatment data.
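A minimal pandas sketch of cohort-specific demeaning follows. The column names and data are hypothetical, and the sketch only transforms each unit with its own cohort's pre-treatment window; it does not reproduce how lwdid builds comparison groups.

```python
import pandas as pd

# Two hypothetical units from different cohorts: g = 3 and g = 4.
df = pd.DataFrame({
    "unit": [1, 1, 1, 1, 2, 2, 2, 2],
    "year": [1, 2, 3, 4, 1, 2, 3, 4],
    "g":    [3, 3, 3, 3, 4, 4, 4, 4],  # first treated period of the unit
    "y":    [1.0, 3.0, 6.0, 7.0, 2.0, 4.0, 6.0, 10.0],
})

# The pre-treatment window {1, ..., g-1} differs by cohort.
is_pre = df["year"] < df["g"]
pre_mean = df.loc[is_pre].groupby("unit")["y"].mean()

df["ydot"] = df["y"] - df["unit"].map(pre_mean)
treated_rows = df.loc[df["year"] >= df["g"]]
print(treated_rows)
```

Unit 1 averages over two pre-treatment years and unit 2 over three, which is the difference from common timing, where every unit shares one window.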
Cohort-Specific Detrending
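For cohort \(g\) and calendar time \(r \geq g\), the detrended outcome is:
\[\dot{Y}_{irg} = Y_{ir} - \hat{A}_{ig} - \hat{B}_{ig} r\]
where \((\hat{A}_{ig}, \hat{B}_{ig})\) are estimated from the unit-specific regression using only pre-treatment periods for cohort \(g\):
\[Y_{is} = A_{ig} + B_{ig} s + \varepsilon_{is}, \quad s = 1, \ldots, g-1\]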
This removes both unit-specific levels and unit-specific linear trends, allowing for heterogeneous trend paths across units while preserving the treatment variation for estimation.
Requirements: Each cohort \(g\) needs at least two pre-treatment periods, i.e. \(g - 1 \geq 2\), to estimate both intercept and slope.
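Because the estimation window depends on \(g\), the same outcome series yields different detrended values under different cohorts. The sketch below uses a hypothetical helper `detrend_for_cohort` and made-up data to show this; it is an illustration, not the staggered module's code.

```python
import numpy as np


def detrend_for_cohort(t, y, g):
    """Fit a line on periods t < g, then return residuals for all periods."""
    pre = t < g
    slope, intercept = np.polyfit(t[pre], y[pre], deg=1)
    return y - (intercept + slope * t)


# One hypothetical outcome series, evaluated under two cohort windows.
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 2.0, 3.0, 6.0, 9.0])

ydot_g4 = detrend_for_cohort(t, y, g=4)  # trend fitted on periods 1-3
ydot_g5 = detrend_for_cohort(t, y, g=5)  # trend fitted on periods 1-4
print(ydot_g4)
print(ydot_g5)
```

Under \(g = 4\) the fitted line is \(Y = s\), while under \(g = 5\) period 4 enters the fit and steepens the slope to 1.6, so the residual in period 5 falls from 4 to 2.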
Staggered Transformation Usage
The staggered transformations are applied automatically when the gvar
parameter is specified in lwdid():
from lwdid import lwdid

# Staggered demeaning
results = lwdid(
    data,
    y='outcome',
    ivar='unit',
    tvar='year',
    gvar='first_treat_year',  # Cohort indicator
    rolling='demean'
)

# Staggered detrending
results = lwdid(
    data,
    y='outcome',
    ivar='unit',
    tvar='year',
    gvar='first_treat_year',
    rolling='detrend'
)
Staggered Transformation Selection
In staggered adoption designs, only the demean and detrend
transformations are supported through the main lwdid() interface.
Seasonal transformations (demeanq, detrendq) are restricted to
common timing mode; passing rolling='demeanq' or rolling='detrendq'
with gvar raises a ValueError.
demean: Staggered support via lwdid() = Yes. Pre-periods required: \(g - 1 \geq 1\) per cohort.
detrend: Staggered support via lwdid() = Yes. Pre-periods required: \(g - 1 \geq 2\) per cohort.
demeanq: Staggered support via lwdid() = No (common timing only).
detrendq: Staggered support via lwdid() = No (common timing only).
Note
The staggered module contains low-level implementations of
transform_staggered_demeanq and transform_staggered_detrendq
for advanced users, but these are not exposed through the main
lwdid() function.
See Also
lwdid.lwdid() : Main estimation function.
Methodological Notes : Theoretical foundations of transformations.
User Guide : Practical guidance on transformation selection.