Examples
This page provides complete, runnable examples demonstrating various features
of the lwdid package.
Basic DiD Estimation
California Smoking Restriction
This classic example estimates the effect of California’s Proposition 99 (1988 tobacco tax increase) on cigarette sales.
Data structure:
39 states (1 treated: California, 38 controls)
31 years (1970-2000)
Treatment starts in 1989 (post = 1 for years \(\geq\) 1989)
Code:
import pandas as pd
from lwdid import lwdid
# Load data
data = pd.read_csv('smoking.csv')
# Basic estimation with demean
results = lwdid(
data,
y='lcigsale', # Log per-capita cigarette sales
d='d', # Treatment indicator (1 for CA, 0 for others)
ivar='state', # State identifier
tvar='year', # Year
post='post', # Post-1988 indicator
rolling='demean' # Standard DiD transformation
)
# View results
print(results.summary())
results.plot()
# Export
results.to_excel('california_smoking_results.xlsx')
Robustness Checks
Check sensitivity to different specifications:
# 1. Detrend instead of demean (control for state-specific trends)
results_detrend = lwdid(
data, 'lcigsale', 'd', 'state', 'year', 'post', 'detrend'
)
# 2. Robust standard errors
results_robust = lwdid(
data, 'lcigsale', 'd', 'state', 'year', 'post', 'demean',
vce='hc3'
)
# 3. Randomization inference
results_ri = lwdid(
data, 'lcigsale', 'd', 'state', 'year', 'post', 'demean',
ri=True, rireps=2000, seed=42
)
# Compare results
print(f"Demean: ATT = {results.att:.4f}, p = {results.pvalue:.4f}")
print(f"Detrend: ATT = {results_detrend.att:.4f}, p = {results_detrend.pvalue:.4f}")
print(f"HC3: ATT = {results_robust.att:.4f}, p = {results_robust.pvalue:.4f}")
print(f"RI: ATT = {results_ri.att:.4f}, RI p = {results_ri.ri_pvalue:.4f}")
Quarterly Data
Retail Sales with Seasonal Patterns
Estimating the effect of a policy change on quarterly retail sales.
Data structure:
50 stores (10 treated, 40 controls)
20 quarters (5 years)
Treatment starts in Q1 2023
Code:
import pandas as pd
from lwdid import lwdid
# Load quarterly data
data_q = pd.read_csv('retail_sales_quarterly.csv')
# Data has columns: store, year, quarter, sales, treated, post
# Use detrendq for quarterly data with trends and seasonality
results = lwdid(
data_q,
y='sales',
d='treated',
ivar='store',
tvar=['year', 'quarter'], # Composite time variable
post='post',
rolling='detrendq', # Detrend + quarterly fixed effects
vce='hc3'
)
print(results.summary())
# Examine period-specific effects
print("\nQuarterly treatment effects:")
print(results.att_by_period)
Note: For quarterly data, always use demeanq or detrendq to account
for seasonal patterns.
Monthly Data
Monthly Sales with Seasonal Adjustment (Q=12)
Estimating treatment effects on monthly data with 12-month seasonal patterns.
Data structure:
Panel data with a single time index and a month indicator (1-12)
Pre-treatment sample must have at least 13 observations per unit for
demeanq(14 fordetrendq)
Code:
from lwdid import lwdid
# Estimate with monthly seasonal adjustment (Q=12)
results = lwdid(
data_m,
y='sales',
d='treated',
ivar='store',
tvar='t', # Single time index
post='post',
rolling='demeanq', # Seasonal demeaning
Q=12, # 12 seasons per year
season_var='month', # Month indicator (1-12)
vce='hc3'
)
print(results.summary())
Note: For monthly data, use Q=12 and provide a season_var column
containing month indicators (1-12).
Weekly Data
Weekly Sales with Seasonal Adjustment (Q=52)
Estimating treatment effects on weekly data with 52-week seasonal patterns.
Data structure:
Panel data with a single time index and a week-of-year indicator (1-52)
Pre-treatment sample must have at least 53 observations per unit for
demeanq(54 fordetrendq) to estimate all 51 seasonal dummy coefficients
Code:
from lwdid import lwdid
# Estimate with weekly seasonal adjustment (Q=52)
results = lwdid(
data_w,
y='sales',
d='treated',
ivar='store',
tvar='t', # Single time index
post='post',
rolling='demeanq', # Seasonal demeaning
Q=52, # 52 seasons per year
season_var='week', # Week indicator (1-52)
vce='hc3'
)
print(results.summary())
Note: For weekly data, use Q=52 and provide a season_var column
containing week-of-year indicators (1-52).
Control Variables
Including Baseline Characteristics
Control for time-invariant unit characteristics:
import pandas as pd
from lwdid import lwdid
# Load data with baseline controls
data = pd.read_csv('policy_evaluation.csv')
# Estimate with controls
results = lwdid(
data,
y='outcome',
d='treated',
ivar='unit',
tvar='year',
post='post_policy',
rolling='detrend',
controls=['baseline_income', 'baseline_population', 'urban'],
vce='hc3'
)
print(results.summary())
Important: Controls must be time-invariant (constant within each unit). The package will raise an error if controls vary over time.
Cluster-Robust Inference
Multi-Level Data Structure
When units are nested within clusters (e.g., schools within districts):
import pandas as pd
from lwdid import lwdid
# Load data: students nested in schools nested in districts
data = pd.read_csv('education_intervention.csv')
# Cluster by district
results = lwdid(
data,
y='test_score',
d='treated_school',
ivar='student',
tvar='year',
post='post_intervention',
rolling='demean',
vce='cluster',
cluster_var='district' # Cluster at district level
)
print(results.summary())
print(f"Number of clusters: {results.n_clusters}")
print(f"Cluster-robust df: {results.df_inference}")
Note: Cluster-robust standard errors use \(df = G - 1\), where \(G\) is the number of clusters. Need at least \(G \geq 10\) for reliable inference.
Randomization Inference
Non-Parametric Testing
Use randomization inference when normality is questionable:
import pandas as pd
from lwdid import lwdid
data = pd.read_csv('small_sample.csv')
# Run with both t-based and RI inference
results = lwdid(
data,
y='outcome',
d='treated',
ivar='unit',
tvar='year',
post='post',
rolling='demean',
vce='hc3',
ri=True,
rireps=5000, # More permutations for precise p-value
ri_method='permutation', # Recommended
seed=12345
)
# Compare inference methods
print(f"t-based p-value: {results.pvalue:.4f}")
print(f"RI p-value: {results.ri_pvalue:.4f}")
# lwdid stores summary RI statistics (ri_pvalue, ri_method, rireps, ri_valid, ri_failed).
# If you need the full permutation distribution, you can construct it manually by
# repeatedly calling the randomization_inference() helper on the firstpost sample.
Comparing RI Methods
# Permutation (recommended)
results_perm = lwdid(
data, 'y', 'd', 'unit', 'year', 'post', 'demean',
ri=True, ri_method='permutation', rireps=1000, seed=42
)
# Bootstrap (for comparison)
results_boot = lwdid(
data, 'y', 'd', 'unit', 'year', 'post', 'demean',
ri=True, ri_method='bootstrap', rireps=1000, seed=42
)
print(f"Permutation RI p-value: {results_perm.ri_pvalue:.4f}")
print(f"Bootstrap RI p-value: {results_boot.ri_pvalue:.4f}")
Diagnostic Checks
Testing Parallel Trends
Examine pre-treatment period-specific effects:
import pandas as pd
from lwdid import lwdid
import matplotlib.pyplot as plt
data = pd.read_csv('data.csv')
results = lwdid(
data, 'outcome', 'treated', 'unit', 'year', 'post', 'demean'
)
# Extract period-specific effects
period_effects = results.att_by_period
# Identify pre-treatment periods
pre_treatment = period_effects[period_effects['tindex'] < results.tpost1]
post_treatment = period_effects[period_effects['tindex'] >= results.tpost1]
# Check for significant pre-treatment effects
sig_pre = pre_treatment[pre_treatment['pval'] < 0.05]
if len(sig_pre) > 0:
print("WARNING: Significant pre-treatment effects detected!")
print(sig_pre[['period', 'beta', 'pval']])
else:
print("No significant pre-treatment effects (parallel trends supported)")
# Visualize
plt.figure(figsize=(12, 6))
plt.errorbar(pre_treatment['tindex'], pre_treatment['beta'],
yerr=1.96*pre_treatment['se'], fmt='o-', label='Pre-treatment',
capsize=5)
plt.errorbar(post_treatment['tindex'], post_treatment['beta'],
yerr=1.96*post_treatment['se'], fmt='s-', label='Post-treatment',
capsize=5, color='red')
plt.axhline(0, color='black', linestyle='--', alpha=0.5)
plt.axvline(results.tpost1 - 0.5, color='gray', linestyle='--', alpha=0.5,
label='Treatment start')
plt.xlabel('Time Index')
plt.ylabel('Treatment Effect')
plt.title('Period-Specific Treatment Effects (Parallel Trends Test)')
plt.legend()
plt.grid(alpha=0.3)
plt.show()
Sensitivity Analysis
Test sensitivity to different specifications:
import pandas as pd
from lwdid import lwdid
data = pd.read_csv('data.csv')
# Define specifications to test
specs = [
{'rolling': 'demean', 'vce': None, 'label': 'Demean, OLS'},
{'rolling': 'demean', 'vce': 'hc3', 'label': 'Demean, HC3'},
{'rolling': 'detrend', 'vce': None, 'label': 'Detrend, OLS'},
{'rolling': 'detrend', 'vce': 'hc3', 'label': 'Detrend, HC3'},
]
# Run all specifications
results_list = []
for spec in specs:
res = lwdid(
data, 'outcome', 'treated', 'unit', 'year', 'post',
rolling=spec['rolling'], vce=spec['vce']
)
results_list.append({
'Specification': spec['label'],
'ATT': res.att,
'SE': res.se_att,
'p-value': res.pvalue,
'CI Lower': res.ci_lower,
'CI Upper': res.ci_upper
})
# Create comparison table
comparison = pd.DataFrame(results_list)
print(comparison.to_string(index=False))
# Visualize
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10, 6))
y_pos = range(len(comparison))
ax.errorbar(comparison['ATT'], y_pos,
xerr=1.96*comparison['SE'], fmt='o', capsize=5)
ax.set_yticks(y_pos)
ax.set_yticklabels(comparison['Specification'])
ax.axvline(0, color='black', linestyle='--', alpha=0.5)
ax.set_xlabel('ATT Estimate')
ax.set_title('Sensitivity Analysis: ATT Across Specifications')
ax.grid(alpha=0.3, axis='x')
plt.tight_layout()
plt.show()
Export and Reporting
Creating Publication Tables
import pandas as pd
from lwdid import lwdid
data = pd.read_csv('data.csv')
# Run main specification
results = lwdid(
data, 'outcome', 'treated', 'unit', 'year', 'post', 'detrend',
vce='hc3', ri=True, rireps=2000, seed=42
)
# Export to Excel (multiple sheets)
results.to_excel('publication_results.xlsx')
# Export to LaTeX
results.to_latex('table_main_results.tex')
# Export period-specific effects to CSV
results.to_csv('period_effects.csv')
# Create custom summary table
summary_data = {
'Estimate': [results.att],
'Std. Error': [results.se_att],
't-statistic': [results.t_stat],
'p-value': [results.pvalue],
'RI p-value': [results.ri_pvalue],
'95% CI Lower': [results.ci_lower],
'95% CI Upper': [results.ci_upper],
'N': [results.nobs],
'Treated Units': [results.n_treated],
'Control Units': [results.n_control]
}
summary_df = pd.DataFrame(summary_data)
summary_df.to_latex('summary_table.tex', index=False, float_format='%.4f')
Staggered DiD: Castle Law
Castle Doctrine Analysis
This example demonstrates staggered difference-in-differences estimation using the Castle Doctrine data from Cheng and Hoekstra (2013). States adopted “Stand Your Ground” laws at different times between 2005-2009.
Data structure:
50 states observed from 2000-2010
21 treated states (adopted Castle Doctrine)
29 never-treated states (control group)
5 treatment cohorts: 2005 (1 state), 2006 (13 states), 2007 (4 states), 2008 (2 states), 2009 (1 state)
Overall Effect:
import pandas as pd
from lwdid import lwdid
# Load data
data = pd.read_csv('castle.csv')
# Create gvar (first treatment year, 0 = never treated)
data['gvar'] = data['effyear'].fillna(0).astype(int)
# Overall weighted effect
results = lwdid(
data=data,
y='lhomicide', # Log homicide rate
ivar='sid', # State ID (must be integer)
tvar='year', # Year
gvar='gvar', # First treatment year
rolling='demean', # Demeaning transformation
control_group='never_treated',
aggregate='overall', # Weighted average effect
vce='hc3'
)
print(f"Overall ATT: {results.att_overall:.4f}")
print(f"SE: {results.se_overall:.4f}")
print(f"95% CI: [{results.ci_overall_lower:.4f}, {results.ci_overall_upper:.4f}]")
Cohort-Specific Effects:
# Cohort-specific effects
results_cohort = lwdid(
data=data,
y='lhomicide',
ivar='sid',
tvar='year',
gvar='gvar',
aggregate='cohort', # Aggregate within cohorts
vce='hc3'
)
print(results_cohort.att_by_cohort)
# Shows ATT for each adoption cohort (2005, 2006, 2007, 2008, 2009)
Event Study:
# All (g, r) specific effects
results_gr = lwdid(
data=data,
y='lhomicide',
ivar='sid',
tvar='year',
gvar='gvar',
aggregate='none', # No aggregation
vce='hc3'
)
# Plot event study
results_gr.plot_event_study(
title='Castle Doctrine Effect',
ylabel='Effect on Log Homicide Rate'
)
See the Jupyter notebook examples/castle_law.ipynb for the complete analysis.
Complete Workflow Example
End-to-End Analysis
A complete analysis from data loading to reporting:
import pandas as pd
import matplotlib.pyplot as plt
from lwdid import lwdid
# 1. Load and prepare data
data = pd.read_csv('policy_data.csv')
# 2. Descriptive statistics
print("Sample composition:")
print(data.groupby(['treated', 'post']).size().unstack())
# 3. Main estimation
results_main = lwdid(
data,
y='outcome',
d='treated',
ivar='unit',
tvar='year',
post='post_policy',
rolling='detrend',
controls=['baseline_x1', 'baseline_x2'],
vce='hc3',
ri=True,
rireps=2000,
seed=42
)
# 4. Print results
print("\n" + "="*80)
print("MAIN RESULTS")
print("="*80)
results_main.summary()
# 5. Robustness checks
print("\n" + "="*80)
print("ROBUSTNESS CHECKS")
print("="*80)
# Without controls
results_no_controls = lwdid(
data, 'outcome', 'treated', 'unit', 'year', 'post_policy',
'detrend', vce='hc3'
)
# Demean instead of detrend
results_demean = lwdid(
data, 'outcome', 'treated', 'unit', 'year', 'post_policy',
'demean', controls=['baseline_x1', 'baseline_x2'], vce='hc3'
)
# Compare
robustness = pd.DataFrame({
'Specification': ['Main', 'No Controls', 'Demean'],
'ATT': [results_main.att, results_no_controls.att, results_demean.att],
'SE': [results_main.se_att, results_no_controls.se_att, results_demean.se_att],
'p-value': [results_main.pvalue, results_no_controls.pvalue, results_demean.pvalue]
})
print(robustness.to_string(index=False))
# 6. Parallel trends test
print("\n" + "="*80)
print("PARALLEL TRENDS TEST")
print("="*80)
period_effects = results_main.att_by_period
pre_treatment = period_effects[period_effects['tindex'] < results_main.tpost1]
sig_pre = pre_treatment[pre_treatment['pval'] < 0.05]
if len(sig_pre) == 0:
print("✓ No significant pre-treatment effects detected")
else:
print(f"✗ {len(sig_pre)} significant pre-treatment effects detected:")
print(sig_pre[['period', 'beta', 'pval']])
# 7. Visualizations
# Plot treatment effects over time
results_main.plot()
# Save a separate figure if needed, for example:
# results_main.plot(graph_options={'savefig': 'analysis_results.png'})
# 8. Export results
results_main.to_excel('main_results.xlsx')
results_main.to_latex('main_results.tex')
robustness.to_csv('robustness_checks.csv', index=False)
print("\n" + "="*80)
print("Analysis complete. Results saved to:")
print(" - main_results.xlsx")
print(" - main_results.tex")
print(" - robustness_checks.csv")
print(" - analysis_results.png")
print("="*80)
See Also
User Guide - Comprehensive usage guide
Quick Start - Quick start tutorial
API Reference - Complete API reference
Methodological Notes - Theoretical background