Examples

This page provides complete, runnable examples demonstrating various features of the lwdid package.

Basic DiD Estimation 

California Smoking Restriction 

This classic example estimates the effect of California’s Proposition 99 (1988 tobacco tax increase) on cigarette sales.

Data structure:

39 states (1 treated: California, 38 controls)
31 years (1970-2000)
Treatment starts in 1989 (post = 1 for years \(\geq\) 1989)

Code:

import pandas as pd
from lwdid import lwdid

# Load data
data = pd.read_csv('smoking.csv')

# Basic estimation with demean
results = lwdid(
    data,
    y='lcigsale',      # Log per-capita cigarette sales
    d='d',             # Treatment indicator (1 for CA, 0 for others)
    ivar='state',      # State identifier
    tvar='year',       # Year
    post='post',       # Post-1988 indicator
    rolling='demean'   # Standard DiD transformation
)

# View results
print(results.summary())
results.plot()

# Export
results.to_excel('california_smoking_results.xlsx')

Robustness Checks 

Check sensitivity to different specifications:

# 1. Detrend instead of demean (control for state-specific trends)
results_detrend = lwdid(
    data, 'lcigsale', 'd', 'state', 'year', 'post', 'detrend'
)

# 2. Robust standard errors
results_robust = lwdid(
    data, 'lcigsale', 'd', 'state', 'year', 'post', 'demean',
    vce='hc3'
)

# 3. Randomization inference
results_ri = lwdid(
    data, 'lcigsale', 'd', 'state', 'year', 'post', 'demean',
    ri=True, rireps=2000, seed=42
)

# Compare results
print(f"Demean:    ATT = {results.att:.4f}, p = {results.pvalue:.4f}")
print(f"Detrend:   ATT = {results_detrend.att:.4f}, p = {results_detrend.pvalue:.4f}")
print(f"HC3:       ATT = {results_robust.att:.4f}, p = {results_robust.pvalue:.4f}")
print(f"RI:        ATT = {results_ri.att:.4f}, RI p = {results_ri.ri_pvalue:.4f}")

Quarterly Data 

Retail Sales with Seasonal Patterns 

Estimating the effect of a policy change on quarterly retail sales.

Data structure:

50 stores (10 treated, 40 controls)
20 quarters (5 years)
Treatment starts in Q1 2023

Code:

import pandas as pd
from lwdid import lwdid

# Load quarterly data
data_q = pd.read_csv('retail_sales_quarterly.csv')

# Data has columns: store, year, quarter, sales, treated, post

# Use detrendq for quarterly data with trends and seasonality
results = lwdid(
    data_q,
    y='sales',
    d='treated',
    ivar='store',
    tvar=['year', 'quarter'],  # Composite time variable
    post='post',
    rolling='detrendq',  # Detrend + quarterly fixed effects
    vce='hc3'
)

print(results.summary())

# Examine period-specific effects
print("\nQuarterly treatment effects:")
print(results.att_by_period)

Note: For quarterly data, always use demeanq or detrendq to account for seasonal patterns.

Monthly Data 

Monthly Sales with Seasonal Adjustment (Q=12)

Estimating treatment effects on monthly data with 12-month seasonal patterns.

Data structure:

Panel data with a single time index and a month indicator (1-12)
Pre-treatment sample must have at least 13 observations per unit for demeanq (14 for detrendq)

Code:

from lwdid import lwdid

# Estimate with monthly seasonal adjustment (Q=12)
results = lwdid(
    data_m,
    y='sales',
    d='treated',
    ivar='store',
    tvar='t',                 # Single time index
    post='post',
    rolling='demeanq',        # Seasonal demeaning
    Q=12,                     # 12 seasons per year
    season_var='month',       # Month indicator (1-12)
    vce='hc3'
)

print(results.summary())

Note: For monthly data, use Q=12 and provide a season_var column containing month indicators (1-12).

Weekly Data 

Weekly Sales with Seasonal Adjustment (Q=52)

Estimating treatment effects on weekly data with 52-week seasonal patterns.

Data structure:

Panel data with a single time index and a week-of-year indicator (1-52)
Pre-treatment sample must have at least 53 observations per unit for demeanq (54 for detrendq) to estimate all 51 seasonal dummy coefficients

Code:

from lwdid import lwdid

# Estimate with weekly seasonal adjustment (Q=52)
results = lwdid(
    data_w,
    y='sales',
    d='treated',
    ivar='store',
    tvar='t',                 # Single time index
    post='post',
    rolling='demeanq',        # Seasonal demeaning
    Q=52,                     # 52 seasons per year
    season_var='week',        # Week indicator (1-52)
    vce='hc3'
)

print(results.summary())

Note: For weekly data, use Q=52 and provide a season_var column containing week-of-year indicators (1-52).

Control Variables 

Including Baseline Characteristics 

Control for time-invariant unit characteristics:

import pandas as pd
from lwdid import lwdid

# Load data with baseline controls
data = pd.read_csv('policy_evaluation.csv')

# Estimate with controls
results = lwdid(
    data,
    y='outcome',
    d='treated',
    ivar='unit',
    tvar='year',
    post='post_policy',
    rolling='detrend',
    controls=['baseline_income', 'baseline_population', 'urban'],
    vce='hc3'
)

print(results.summary())

Important: Controls must be time-invariant (constant within each unit). The package will raise an error if controls vary over time.

Cluster-Robust Inference 

Multi-Level Data Structure 

When units are nested within clusters (e.g., schools within districts):

import pandas as pd
from lwdid import lwdid

# Load data: students nested in schools nested in districts
data = pd.read_csv('education_intervention.csv')

# Cluster by district
results = lwdid(
    data,
    y='test_score',
    d='treated_school',
    ivar='student',
    tvar='year',
    post='post_intervention',
    rolling='demean',
    vce='cluster',
    cluster_var='district'  # Cluster at district level
)

print(results.summary())
print(f"Number of clusters: {results.n_clusters}")
print(f"Cluster-robust df: {results.df_inference}")

Note: Cluster-robust standard errors use \(df = G - 1\), where \(G\) is the number of clusters. Need at least \(G \geq 10\) for reliable inference.

Randomization Inference 

Non-Parametric Testing 

Use randomization inference when normality is questionable:

import pandas as pd
from lwdid import lwdid

data = pd.read_csv('small_sample.csv')

# Run with both t-based and RI inference
results = lwdid(
    data,
    y='outcome',
    d='treated',
    ivar='unit',
    tvar='year',
    post='post',
    rolling='demean',
    vce='hc3',
    ri=True,
    rireps=5000,  # More permutations for precise p-value
    ri_method='permutation',  # Recommended
    seed=12345
)

# Compare inference methods
print(f"t-based p-value: {results.pvalue:.4f}")
print(f"RI p-value: {results.ri_pvalue:.4f}")

# lwdid stores summary RI statistics (ri_pvalue, ri_method, rireps, ri_valid, ri_failed).
# If you need the full permutation distribution, you can construct it manually by
# repeatedly calling the randomization_inference() helper on the firstpost sample.

Comparing RI Methods 

# Permutation (recommended)
results_perm = lwdid(
    data, 'y', 'd', 'unit', 'year', 'post', 'demean',
    ri=True, ri_method='permutation', rireps=1000, seed=42
)

# Bootstrap (for comparison)
results_boot = lwdid(
    data, 'y', 'd', 'unit', 'year', 'post', 'demean',
    ri=True, ri_method='bootstrap', rireps=1000, seed=42
)

print(f"Permutation RI p-value: {results_perm.ri_pvalue:.4f}")
print(f"Bootstrap RI p-value: {results_boot.ri_pvalue:.4f}")

Diagnostic Checks 

Testing Parallel Trends 

Examine pre-treatment period-specific effects:

import pandas as pd
from lwdid import lwdid
import matplotlib.pyplot as plt

data = pd.read_csv('data.csv')

results = lwdid(
    data, 'outcome', 'treated', 'unit', 'year', 'post', 'demean'
)

# Extract period-specific effects
period_effects = results.att_by_period

# Identify pre-treatment periods
pre_treatment = period_effects[period_effects['tindex'] < results.tpost1]
post_treatment = period_effects[period_effects['tindex'] >= results.tpost1]

# Check for significant pre-treatment effects
sig_pre = pre_treatment[pre_treatment['pval'] < 0.05]
if len(sig_pre) > 0:
    print("WARNING: Significant pre-treatment effects detected!")
    print(sig_pre[['period', 'beta', 'pval']])
else:
    print("No significant pre-treatment effects (parallel trends supported)")

# Visualize
plt.figure(figsize=(12, 6))
plt.errorbar(pre_treatment['tindex'], pre_treatment['beta'],
             yerr=1.96*pre_treatment['se'], fmt='o-', label='Pre-treatment',
             capsize=5)
plt.errorbar(post_treatment['tindex'], post_treatment['beta'],
             yerr=1.96*post_treatment['se'], fmt='s-', label='Post-treatment',
             capsize=5, color='red')
plt.axhline(0, color='black', linestyle='--', alpha=0.5)
plt.axvline(results.tpost1 - 0.5, color='gray', linestyle='--', alpha=0.5,
            label='Treatment start')
plt.xlabel('Time Index')
plt.ylabel('Treatment Effect')
plt.title('Period-Specific Treatment Effects (Parallel Trends Test)')
plt.legend()
plt.grid(alpha=0.3)
plt.show()

Sensitivity Analysis 

Test sensitivity to different specifications:

import pandas as pd
from lwdid import lwdid

data = pd.read_csv('data.csv')

# Define specifications to test
specs = [
    {'rolling': 'demean', 'vce': None, 'label': 'Demean, OLS'},
    {'rolling': 'demean', 'vce': 'hc3', 'label': 'Demean, HC3'},
    {'rolling': 'detrend', 'vce': None, 'label': 'Detrend, OLS'},
    {'rolling': 'detrend', 'vce': 'hc3', 'label': 'Detrend, HC3'},
]

# Run all specifications
results_list = []
for spec in specs:
    res = lwdid(
        data, 'outcome', 'treated', 'unit', 'year', 'post',
        rolling=spec['rolling'], vce=spec['vce']
    )
    results_list.append({
        'Specification': spec['label'],
        'ATT': res.att,
        'SE': res.se_att,
        'p-value': res.pvalue,
        'CI Lower': res.ci_lower,
        'CI Upper': res.ci_upper
    })

# Create comparison table
comparison = pd.DataFrame(results_list)
print(comparison.to_string(index=False))

# Visualize
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6))
y_pos = range(len(comparison))
ax.errorbar(comparison['ATT'], y_pos,
            xerr=1.96*comparison['SE'], fmt='o', capsize=5)
ax.set_yticks(y_pos)
ax.set_yticklabels(comparison['Specification'])
ax.axvline(0, color='black', linestyle='--', alpha=0.5)
ax.set_xlabel('ATT Estimate')
ax.set_title('Sensitivity Analysis: ATT Across Specifications')
ax.grid(alpha=0.3, axis='x')
plt.tight_layout()
plt.show()

Export and Reporting 

Creating Publication Tables 

import pandas as pd
from lwdid import lwdid

data = pd.read_csv('data.csv')

# Run main specification
results = lwdid(
    data, 'outcome', 'treated', 'unit', 'year', 'post', 'detrend',
    vce='hc3', ri=True, rireps=2000, seed=42
)

# Export to Excel (multiple sheets)
results.to_excel('publication_results.xlsx')

# Export to LaTeX
results.to_latex('table_main_results.tex')

# Export period-specific effects to CSV
results.to_csv('period_effects.csv')

# Create custom summary table
summary_data = {
    'Estimate': [results.att],
    'Std. Error': [results.se_att],
    't-statistic': [results.t_stat],
    'p-value': [results.pvalue],
    'RI p-value': [results.ri_pvalue],
    '95% CI Lower': [results.ci_lower],
    '95% CI Upper': [results.ci_upper],
    'N': [results.nobs],
    'Treated Units': [results.n_treated],
    'Control Units': [results.n_control]
}
summary_df = pd.DataFrame(summary_data)
summary_df.to_latex('summary_table.tex', index=False, float_format='%.4f')

This example demonstrates staggered difference-in-differences estimation using the Castle Doctrine data from Cheng and Hoekstra (2013). States adopted “Stand Your Ground” laws at different times between 2005-2009.

Data structure:

50 states observed from 2000-2010
21 treated states (adopted Castle Doctrine)
29 never-treated states (control group)
5 treatment cohorts: 2005 (1 state), 2006 (13 states), 2007 (4 states), 2008 (2 states), 2009 (1 state)

Overall Effect:

import pandas as pd
from lwdid import lwdid

# Load data
data = pd.read_csv('castle.csv')

# Create gvar (first treatment year, 0 = never treated)
data['gvar'] = data['effyear'].fillna(0).astype(int)

# Overall weighted effect
results = lwdid(
    data=data,
    y='lhomicide',           # Log homicide rate
    ivar='sid',              # State ID (must be integer)
    tvar='year',             # Year
    gvar='gvar',             # First treatment year
    rolling='demean',        # Demeaning transformation
    control_group='never_treated',
    aggregate='overall',     # Weighted average effect
    vce='hc3'
)

print(f"Overall ATT: {results.att_overall:.4f}")
print(f"SE: {results.se_overall:.4f}")
print(f"95% CI: [{results.ci_overall_lower:.4f}, {results.ci_overall_upper:.4f}]")

Cohort-Specific Effects:

# Cohort-specific effects
results_cohort = lwdid(
    data=data,
    y='lhomicide',
    ivar='sid',
    tvar='year',
    gvar='gvar',
    aggregate='cohort',      # Aggregate within cohorts
    vce='hc3'
)

print(results_cohort.att_by_cohort)
# Shows ATT for each adoption cohort (2005, 2006, 2007, 2008, 2009)

Event Study:

# All (g, r) specific effects
results_gr = lwdid(
    data=data,
    y='lhomicide',
    ivar='sid',
    tvar='year',
    gvar='gvar',
    aggregate='none',        # No aggregation
    vce='hc3'
)

# Plot event study
results_gr.plot_event_study(
    title='Castle Doctrine Effect',
    ylabel='Effect on Log Homicide Rate'
)

See the Jupyter notebook examples/castle_law.ipynb for the complete analysis.

Complete Workflow Example 

End-to-End Analysis 

A complete analysis from data loading to reporting:

import pandas as pd
import matplotlib.pyplot as plt
from lwdid import lwdid

# 1. Load and prepare data
data = pd.read_csv('policy_data.csv')

# 2. Descriptive statistics
print("Sample composition:")
print(data.groupby(['treated', 'post']).size().unstack())

# 3. Main estimation
results_main = lwdid(
    data,
    y='outcome',
    d='treated',
    ivar='unit',
    tvar='year',
    post='post_policy',
    rolling='detrend',
    controls=['baseline_x1', 'baseline_x2'],
    vce='hc3',
    ri=True,
    rireps=2000,
    seed=42
)

# 4. Print results
print("\n" + "="*80)
print("MAIN RESULTS")
print("="*80)
results_main.summary()

# 5. Robustness checks
print("\n" + "="*80)
print("ROBUSTNESS CHECKS")
print("="*80)

# Without controls
results_no_controls = lwdid(
    data, 'outcome', 'treated', 'unit', 'year', 'post_policy',
    'detrend', vce='hc3'
)

# Demean instead of detrend
results_demean = lwdid(
    data, 'outcome', 'treated', 'unit', 'year', 'post_policy',
    'demean', controls=['baseline_x1', 'baseline_x2'], vce='hc3'
)

# Compare
robustness = pd.DataFrame({
    'Specification': ['Main', 'No Controls', 'Demean'],
    'ATT': [results_main.att, results_no_controls.att, results_demean.att],
    'SE': [results_main.se_att, results_no_controls.se_att, results_demean.se_att],
    'p-value': [results_main.pvalue, results_no_controls.pvalue, results_demean.pvalue]
})
print(robustness.to_string(index=False))

# 6. Parallel trends test
print("\n" + "="*80)
print("PARALLEL TRENDS TEST")
print("="*80)

period_effects = results_main.att_by_period
pre_treatment = period_effects[period_effects['tindex'] < results_main.tpost1]
sig_pre = pre_treatment[pre_treatment['pval'] < 0.05]

if len(sig_pre) == 0:
    print("✓ No significant pre-treatment effects detected")
else:
    print(f"✗ {len(sig_pre)} significant pre-treatment effects detected:")
    print(sig_pre[['period', 'beta', 'pval']])

# 7. Visualizations
# Plot treatment effects over time
results_main.plot()

# Save a separate figure if needed, for example:
# results_main.plot(graph_options={'savefig': 'analysis_results.png'})

# 8. Export results
results_main.to_excel('main_results.xlsx')
results_main.to_latex('main_results.tex')
robustness.to_csv('robustness_checks.csv', index=False)

print("\n" + "="*80)
print("Analysis complete. Results saved to:")
print("  - main_results.xlsx")
print("  - main_results.tex")
print("  - robustness_checks.csv")
print("  - analysis_results.png")
print("="*80)

Examples

Basic DiD Estimation 

California Smoking Restriction 

Robustness Checks 

Quarterly Data 

Retail Sales with Seasonal Patterns 

Monthly Data 

Monthly Sales with Seasonal Adjustment (Q=12)

Weekly Data 

Weekly Sales with Seasonal Adjustment (Q=52)

Control Variables 

Including Baseline Characteristics 

Cluster-Robust Inference 

Multi-Level Data Structure 

Randomization Inference 

Non-Parametric Testing 

Comparing RI Methods 

Diagnostic Checks 

Testing Parallel Trends 

Sensitivity Analysis 

Export and Reporting 

Creating Publication Tables 

Staggered DiD: Castle Law 

Castle Doctrine Analysis 

Complete Workflow Example 

End-to-End Analysis 

See Also 