Quick Start
This guide demonstrates basic usage of the lwdid package for difference-in-differences estimation with rolling transformations. The package supports three scenarios:
Common timing (small sample): Exact t-based inference under CLM assumptions
Common timing (large sample): Asymptotic inference with robust standard errors
Staggered adoption: Units treated at different times
Basic Example
Simplest Usage
import pandas as pd
from lwdid import lwdid
# Load data
data = pd.read_csv('smoking.csv')
# Run estimation
results = lwdid(
data,
y='lcigsale', # outcome variable
d='d', # treatment indicator (0 = control, 1 = treated)
ivar='state', # unit identifier
tvar='year', # time variable
post='post', # post-treatment indicator
rolling='demean', # transformation method
)
# View results
print(results.summary())
Key Parameters
Required Parameters
data: pandas DataFrame in long format (panel data)y: Outcome variable column name (string)d: Treatment indicator column name (0=control, non-zero=treated)ivar: Unit identifier column nametvar: Time variable column name (string for annual data, list for quarterly data)post: Post-treatment indicator column name (1=post-treatment, 0=pre-treatment)rolling: Transformation method, options:'demean': Standard DiD (unit fixed effects) — requires \(T_0 \geq 1\)'detrend': DiD with unit-specific linear trends — requires \(T_0 \geq 2\)'demeanq': Quarterly data with seasonal effects — requires \(T_0 \geq 1\)'detrendq': Quarterly data with trends and seasonal effects — requires \(T_0 \geq 2\)
Note
Pre-treatment period requirements:
demean/demeanq: Requires at least 1 pre-treatment period (\(T_0 \geq 1\))detrend/detrendq: Requires at least 2 pre-treatment periods (\(T_0 \geq 2\))
The demean method subtracts the unit-specific pre-treatment mean, as
described in Lee and Wooldridge (2026). The detrend method estimates
unit-specific linear trends using pre-treatment data via regression
\(Y_{it} = A_i + B_i t + \varepsilon_{it}\), which requires at least 2
observations to identify both intercept and slope (Lee and Wooldridge 2026).
Common Options
Small-Sample Exact Inference
For settings with small numbers of treated or control units, exact t-based inference is available under classical linear model assumptions:
results = lwdid(
data, 'lcigsale', 'd', 'state', 'year', 'post', 'detrend',
vce=None # Default: exact inference under CLM assumptions
)
Large-Sample Inference
For large cross-sectional samples, heteroskedasticity-robust standard errors provide valid asymptotic inference:
results = lwdid(
data, 'lcigsale', 'd', 'state', 'year', 'post', 'detrend',
vce='hc3' # HC3 robust standard errors for large samples
)
Doubly Robust Estimation
The IPWRA estimator combines regression adjustment with propensity score weighting, providing consistent estimates when either the outcome model or propensity score model is correctly specified:
results = lwdid(
data, 'outcome', 'd', 'unit', 'year', 'post', 'demean',
estimator='ipwra',
controls=['x1', 'x2'],
vce='hc3'
)
Note
IPWRA is particularly recommended for large-sample settings (N >= 50) where functional form assumptions are uncertain. See User Guide for detailed estimator selection guidelines.
Randomization Inference
results = lwdid(
data, 'lcigsale', 'd', 'state', 'year', 'post', 'detrend',
ri=True, # Enable randomization inference
rireps=1000, # Number of permutations
seed=42 # Random seed
)
print(f"RI p-value: {results.ri_pvalue:.4f}")
Adding Control Variables
# Note: Controls must be time-invariant (constant within each unit)
# For time-varying variables, create time-invariant versions first
# Example: Create pre-treatment mean for time-varying controls
data_prep = data.copy()
for var in ['retprice']:
pre_mean = data[data['post']==0].groupby('state')[var].mean()
data_prep[f'{var}_pre'] = data_prep['state'].map(pre_mean)
results = lwdid(
data_prep, 'lcigsale', 'd', 'state', 'year', 'post', 'detrend',
controls=['retprice_pre'] # Time-invariant controls
)
Quarterly Data
results = lwdid(
data_q, 'outcome', 'd', 'unit',
tvar=['year', 'quarter'], # Composite time variable
post='post',
rolling='detrendq' # Quarterly detrending
)
Results Object
lwdid() returns a LWDIDResults object containing:
Core Attributes
results.att # ATT estimate
results.se_att # Standard error
results.t_stat # t-statistic
results.pvalue # p-value
results.ci_lower # 95% confidence interval lower bound
results.ci_upper # 95% confidence interval upper bound
results.df_inference # Degrees of freedom used for t-based inference
Sample Information
results.nobs # Number of observations in the regression sample
results.n_treated # Number of treated units in the regression sample
results.n_control # Number of control units in the regression sample
Period-Specific Effects
results.att_by_period # DataFrame: treatment effects by period
Methods
results.summary() # Print formatted results
results.plot() # Visualize time series
results.to_excel('output.xlsx') # Export to Excel
results.to_latex('table.tex') # Export to LaTeX
Data Format Requirements
Panel Data Structure
Data must be in long format (one row per observation):
ivar tvar y d post
1 2000 10.5 1 0
1 2001 10.8 1 0
1 2002 11.2 1 1 <- Unit 1 is treated (d=1), receives treatment in 2002 (post=1)
2 2000 9.3 0 0
2 2001 9.5 0 0
2 2002 9.7 0 0 <- Unit 2 is control (d=0), never treated
Note
Important: The d column must be a time-invariant treatment group indicator (\(D_i\)):
d = 1indicates the unit belongs to the treatment group (constant across all periods)d = 0indicates the unit belongs to the control group (constant across all periods)dmust NOT vary over time. Do not pass a time-varying treatment indicator \(W_{it} = D_i \times post_t\) as thedparameter
Key Requirements
Panel structure:
Data must be in long format: one row per unit-period observation
Each (unit, period) combination must be unique (no duplicate rows)
After constructing the internal time index (
tindex) fromtvar, the set of time indices must form a continuous sequence with no gapsPanels may be balanced or unbalanced across units; some units may have fewer periods, but each unit must satisfy the pre-treatment requirements for the chosen
rollingmethod (see above)
Common treatment timing: All treated units must start treatment in the same period (
postis a pure function of time)Treatment persistence: Once units enter the post-treatment regime (
postswitches from 0 to 1), they remain in it for all subsequent periods (no policy reversals)
Staggered Adoption
When units are treated at different times, use the gvar parameter:
import pandas as pd
from lwdid import lwdid
# Load staggered adoption data
data = pd.read_csv('castle.csv')
# Estimate with staggered design
# Create gvar (first treatment year, NaN -> 0 for never-treated)
data['gvar'] = data['effyear'].fillna(0).astype(int)
results = lwdid(
data,
y='lhomicide', # outcome variable
ivar='sid', # unit identifier (state ID)
tvar='year', # time variable
gvar='gvar', # first treatment period (0 = never treated)
rolling='demean', # transformation method
aggregate='overall', # aggregate to overall effect
control_group='not_yet_treated', # control group strategy
)
# View results
print(results.summary())
# Event study visualization
fig, ax = results.plot_event_study()
Staggered Results
# Overall weighted effect
results.att_overall # ATT estimate
results.se_overall # Standard error
# Cohort-specific effects
results.att_by_cohort # DataFrame with cohort-level effects
# Cohort-time specific effects
results.att_by_cohort_time # DataFrame with (g,r)-specific effects
Key Staggered Parameters
gvar: Column indicating first treatment period (0, inf, or NaN for never-treated)aggregate:'none','cohort', or'overall'control_group:'not_yet_treated'or'never_treated'estimator:'ra','ipw','ipwra', or'psm'
Next Steps
See User Guide for detailed usage including staggered designs
Browse Examples for complete examples
Read API Reference for API details
Read Staggered DiD Module (staggered) for staggered module details