Quick Start

This guide demonstrates basic usage of the lwdid package for difference-in-differences estimation with rolling transformations. The package supports three scenarios:

Common timing (small sample): Exact t-based inference under CLM assumptions
Common timing (large sample): Asymptotic inference with robust standard errors
Staggered adoption: Units treated at different times

Basic Example

Simplest Usage

import pandas as pd
from lwdid import lwdid

# Load data
data = pd.read_csv('smoking.csv')

# Run estimation
results = lwdid(
    data,
    y='lcigsale',      # outcome variable
    d='d',             # treatment indicator (0 = control, 1 = treated)
    ivar='state',      # unit identifier
    tvar='year',       # time variable
    post='post',       # post-treatment indicator
    rolling='demean',  # transformation method
)

# View results
print(results.summary())

Key Parameters

Required Parameters

data: pandas DataFrame in long format (panel data)
y: Outcome variable column name (string)
d: Treatment indicator column name (0=control, non-zero=treated)
ivar: Unit identifier column name
tvar: Time variable column name (string for annual data, list for quarterly data)
post: Post-treatment indicator column name (1=post-treatment, 0=pre-treatment)
rolling: Transformation method, options:
- 'demean': Standard DiD (unit fixed effects) — requires \(T_0 \geq 1\)
- 'detrend': DiD with unit-specific linear trends — requires \(T_0 \geq 2\)
- 'demeanq': Quarterly data with seasonal effects — requires \(T_0 \geq 1\)
- 'detrendq': Quarterly data with trends and seasonal effects — requires \(T_0 \geq 2\)

Note

Pre-treatment period requirements:

demean/demeanq: Requires at least 1 pre-treatment period (\(T_0 \geq 1\))
detrend/detrendq: Requires at least 2 pre-treatment periods (\(T_0 \geq 2\))

The demean method subtracts the unit-specific pre-treatment mean, as described in Lee and Wooldridge (2026). The detrend method estimates unit-specific linear trends using pre-treatment data via regression \(Y_{it} = A_i + B_i t + \varepsilon_{it}\), which requires at least 2 observations to identify both intercept and slope (Lee and Wooldridge 2026).

Common Options

Small-Sample Exact Inference

For settings with small numbers of treated or control units, exact t-based inference is available under classical linear model assumptions:

results = lwdid(
    data, 'lcigsale', 'd', 'state', 'year', 'post', 'detrend',
    vce=None  # Default: exact inference under CLM assumptions
)

Large-Sample Inference

For large cross-sectional samples, heteroskedasticity-robust standard errors provide valid asymptotic inference:

results = lwdid(
    data, 'lcigsale', 'd', 'state', 'year', 'post', 'detrend',
    vce='hc3'  # HC3 robust standard errors for large samples
)

Doubly Robust Estimation

The IPWRA estimator combines regression adjustment with propensity score weighting, providing consistent estimates when either the outcome model or propensity score model is correctly specified:

results = lwdid(
    data, 'outcome', 'd', 'unit', 'year', 'post', 'demean',
    estimator='ipwra',
    controls=['x1', 'x2'],
    vce='hc3'
)

Note

IPWRA is particularly recommended for large-sample settings (N >= 50) where functional form assumptions are uncertain. See User Guide for detailed estimator selection guidelines.

Randomization Inference

results = lwdid(
    data, 'lcigsale', 'd', 'state', 'year', 'post', 'detrend',
    ri=True,        # Enable randomization inference
    rireps=1000,    # Number of permutations
    seed=42         # Random seed
)
print(f"RI p-value: {results.ri_pvalue:.4f}")

Adding Control Variables

# Note: Controls must be time-invariant (constant within each unit)
# For time-varying variables, create time-invariant versions first

# Example: Create pre-treatment mean for time-varying controls
data_prep = data.copy()
for var in ['retprice']:
    pre_mean = data[data['post']==0].groupby('state')[var].mean()
    data_prep[f'{var}_pre'] = data_prep['state'].map(pre_mean)

results = lwdid(
    data_prep, 'lcigsale', 'd', 'state', 'year', 'post', 'detrend',
    controls=['retprice_pre']  # Time-invariant controls
)

Quarterly Data

results = lwdid(
    data_q, 'outcome', 'd', 'unit',
    tvar=['year', 'quarter'],  # Composite time variable
    post='post',
    rolling='detrendq'         # Quarterly detrending
)

Results Object

lwdid() returns a LWDIDResults object containing:

Core Attributes

results.att           # ATT estimate
results.se_att        # Standard error
results.t_stat        # t-statistic
results.pvalue        # p-value
results.ci_lower      # 95% confidence interval lower bound
results.ci_upper      # 95% confidence interval upper bound
results.df_inference  # Degrees of freedom used for t-based inference

Sample Information

results.nobs          # Number of observations in the regression sample
results.n_treated     # Number of treated units in the regression sample
results.n_control     # Number of control units in the regression sample

Period-Specific Effects

results.att_by_period  # DataFrame: treatment effects by period

Methods

results.summary()              # Print formatted results
results.plot()                 # Visualize time series
results.to_excel('output.xlsx')  # Export to Excel
results.to_latex('table.tex')    # Export to LaTeX

Data Format Requirements

Panel Data Structure

Data must be in long format (one row per observation):

ivar    tvar    y       d    post
     2000    10.5    1    0
     2001    10.8    1    0
     2002    11.2    1    1    <- Unit 1 is treated (d=1), receives treatment in 2002 (post=1)
     2000    9.3     0    0
     2001    9.5     0    0
     2002    9.7     0    0    <- Unit 2 is control (d=0), never treated

Note

Important: The d column must be a time-invariant treatment group indicator (\(D_i\)):

d = 1 indicates the unit belongs to the treatment group (constant across all periods)
d = 0 indicates the unit belongs to the control group (constant across all periods)
d must NOT vary over time. Do not pass a time-varying treatment indicator \(W_{it} = D_i \times post_t\) as the d parameter

Key Requirements

Panel structure:
- Data must be in long format: one row per unit-period observation
- Each (unit, period) combination must be unique (no duplicate rows)
- After constructing the internal time index (tindex) from tvar, the set of time indices must form a continuous sequence with no gaps
- Panels may be balanced or unbalanced across units; some units may have fewer periods, but each unit must satisfy the pre-treatment requirements for the chosen rolling method (see above)
Common treatment timing: All treated units must start treatment in the same period (post is a pure function of time)
Treatment persistence: Once units enter the post-treatment regime (post switches from 0 to 1), they remain in it for all subsequent periods (no policy reversals)

Staggered Adoption

When units are treated at different times, use the gvar parameter:

import pandas as pd
from lwdid import lwdid

# Load staggered adoption data
data = pd.read_csv('castle.csv')

# Estimate with staggered design
# Create gvar (first treatment year, NaN -> 0 for never-treated)
data['gvar'] = data['effyear'].fillna(0).astype(int)

results = lwdid(
    data,
    y='lhomicide',            # outcome variable
    ivar='sid',               # unit identifier (state ID)
    tvar='year',              # time variable
    gvar='gvar',              # first treatment period (0 = never treated)
    rolling='demean',         # transformation method
    aggregate='overall',      # aggregate to overall effect
    control_group='not_yet_treated',  # control group strategy
)

# View results
print(results.summary())

# Event study visualization
fig, ax = results.plot_event_study()

Staggered Results

# Overall weighted effect
results.att_overall       # ATT estimate
results.se_overall        # Standard error

# Cohort-specific effects
results.att_by_cohort     # DataFrame with cohort-level effects

# Cohort-time specific effects
results.att_by_cohort_time  # DataFrame with (g,r)-specific effects

Key Staggered Parameters

gvar: Column indicating first treatment period (0, inf, or NaN for never-treated)
aggregate: 'none', 'cohort', or 'overall'
control_group: 'not_yet_treated' or 'never_treated'
estimator: 'ra', 'ipw', 'ipwra', or 'psm'

Next Steps

See User Guide for detailed usage including staggered designs
Browse Examples for complete examples
Read API Reference for API details
Read Staggered DiD Module (staggered) for staggered module details