Examples

This page provides complete, runnable examples demonstrating various features of the lwdid package.

Basic DiD Estimation

California Smoking Restriction

This classic example estimates the effect of California’s Proposition 99 (1988 tobacco tax increase) on cigarette sales.

Data structure:

  • 39 states (1 treated: California, 38 controls)

  • 31 years (1970-2000)

  • Treatment starts in 1989 (post = 1 for years \(\geq\) 1989)

Code:

import pandas as pd
from lwdid import lwdid

# Load data
data = pd.read_csv('smoking.csv')

# Basic estimation with demean
results = lwdid(
    data,
    y='lcigsale',      # Log per-capita cigarette sales
    d='d',             # Treatment indicator (1 for CA, 0 for others)
    ivar='state',      # State identifier
    tvar='year',       # Year
    post='post',       # Post-1988 indicator
    rolling='demean'   # Standard DiD transformation
)

# View results
print(results.summary())
results.plot()

# Export
results.to_excel('california_smoking_results.xlsx')

Robustness Checks

Check sensitivity to different specifications:

# 1. Detrend instead of demean (control for state-specific trends)
results_detrend = lwdid(
    data, 'lcigsale', 'd', 'state', 'year', 'post', 'detrend'
)

# 2. Robust standard errors
results_robust = lwdid(
    data, 'lcigsale', 'd', 'state', 'year', 'post', 'demean',
    vce='hc3'
)

# 3. Randomization inference
results_ri = lwdid(
    data, 'lcigsale', 'd', 'state', 'year', 'post', 'demean',
    ri=True, rireps=2000, seed=42
)

# Compare results
print(f"Demean:    ATT = {results.att:.4f}, p = {results.pvalue:.4f}")
print(f"Detrend:   ATT = {results_detrend.att:.4f}, p = {results_detrend.pvalue:.4f}")
print(f"HC3:       ATT = {results_robust.att:.4f}, p = {results_robust.pvalue:.4f}")
print(f"RI:        ATT = {results_ri.att:.4f}, RI p = {results_ri.ri_pvalue:.4f}")

Quarterly Data

Retail Sales with Seasonal Patterns

Estimating the effect of a policy change on quarterly retail sales.

Data structure:

  • 50 stores (10 treated, 40 controls)

  • 20 quarters (5 years)

  • Treatment starts in Q1 2023

Code:

import pandas as pd
from lwdid import lwdid

# Load quarterly data
data_q = pd.read_csv('retail_sales_quarterly.csv')

# Data has columns: store, year, quarter, sales, treated, post

# Use detrendq for quarterly data with trends and seasonality
results = lwdid(
    data_q,
    y='sales',
    d='treated',
    ivar='store',
    tvar=['year', 'quarter'],  # Composite time variable
    post='post',
    rolling='detrendq',  # Detrend + quarterly fixed effects
    vce='hc3'
)

print(results.summary())

# Examine period-specific effects
print("\nQuarterly treatment effects:")
print(results.att_by_period)

Note: For quarterly data, always use demeanq or detrendq to account for seasonal patterns.

Monthly Data

Monthly Sales with Seasonal Adjustment (Q=12)

Estimating treatment effects on monthly data with 12-month seasonal patterns.

Data structure:

  • Panel data with a single time index and a month indicator (1-12)

  • Pre-treatment sample must have at least 13 observations per unit for demeanq (14 for detrendq)

Code:

from lwdid import lwdid

# Estimate with monthly seasonal adjustment (Q=12)
results = lwdid(
    data_m,
    y='sales',
    d='treated',
    ivar='store',
    tvar='t',                 # Single time index
    post='post',
    rolling='demeanq',        # Seasonal demeaning
    Q=12,                     # 12 seasons per year
    season_var='month',       # Month indicator (1-12)
    vce='hc3'
)

print(results.summary())

Note: For monthly data, use Q=12 and provide a season_var column containing month indicators (1-12).

Weekly Data

Weekly Sales with Seasonal Adjustment (Q=52)

Estimating treatment effects on weekly data with 52-week seasonal patterns.

Data structure:

  • Panel data with a single time index and a week-of-year indicator (1-52)

  • Pre-treatment sample must have at least 53 observations per unit for demeanq (54 for detrendq) to estimate all 51 seasonal dummy coefficients

Code:

from lwdid import lwdid

# Estimate with weekly seasonal adjustment (Q=52)
results = lwdid(
    data_w,
    y='sales',
    d='treated',
    ivar='store',
    tvar='t',                 # Single time index
    post='post',
    rolling='demeanq',        # Seasonal demeaning
    Q=52,                     # 52 seasons per year
    season_var='week',        # Week indicator (1-52)
    vce='hc3'
)

print(results.summary())

Note: For weekly data, use Q=52 and provide a season_var column containing week-of-year indicators (1-52).

Control Variables

Including Baseline Characteristics

Control for time-invariant unit characteristics:

import pandas as pd
from lwdid import lwdid

# Load data with baseline controls
data = pd.read_csv('policy_evaluation.csv')

# Estimate with controls
results = lwdid(
    data,
    y='outcome',
    d='treated',
    ivar='unit',
    tvar='year',
    post='post_policy',
    rolling='detrend',
    controls=['baseline_income', 'baseline_population', 'urban'],
    vce='hc3'
)

print(results.summary())

Important: Controls must be time-invariant (constant within each unit). The package will raise an error if controls vary over time.

Cluster-Robust Inference

Multi-Level Data Structure

When units are nested within clusters (e.g., schools within districts):

import pandas as pd
from lwdid import lwdid

# Load data: students nested in schools nested in districts
data = pd.read_csv('education_intervention.csv')

# Cluster by district
results = lwdid(
    data,
    y='test_score',
    d='treated_school',
    ivar='student',
    tvar='year',
    post='post_intervention',
    rolling='demean',
    vce='cluster',
    cluster_var='district'  # Cluster at district level
)

print(results.summary())
print(f"Number of clusters: {results.n_clusters}")
print(f"Cluster-robust df: {results.df_inference}")

Note: Cluster-robust standard errors use \(df = G - 1\), where \(G\) is the number of clusters. Need at least \(G \geq 10\) for reliable inference.

Randomization Inference

Non-Parametric Testing

Use randomization inference when normality is questionable:

import pandas as pd
from lwdid import lwdid

data = pd.read_csv('small_sample.csv')

# Run with both t-based and RI inference
results = lwdid(
    data,
    y='outcome',
    d='treated',
    ivar='unit',
    tvar='year',
    post='post',
    rolling='demean',
    vce='hc3',
    ri=True,
    rireps=5000,  # More permutations for precise p-value
    ri_method='permutation',  # Recommended
    seed=12345
)

# Compare inference methods
print(f"t-based p-value: {results.pvalue:.4f}")
print(f"RI p-value: {results.ri_pvalue:.4f}")

# lwdid stores summary RI statistics (ri_pvalue, ri_method, rireps, ri_valid, ri_failed).
# If you need the full permutation distribution, you can construct it manually by
# repeatedly calling the randomization_inference() helper on the firstpost sample.

Comparing RI Methods

# Permutation (recommended)
results_perm = lwdid(
    data, 'y', 'd', 'unit', 'year', 'post', 'demean',
    ri=True, ri_method='permutation', rireps=1000, seed=42
)

# Bootstrap (for comparison)
results_boot = lwdid(
    data, 'y', 'd', 'unit', 'year', 'post', 'demean',
    ri=True, ri_method='bootstrap', rireps=1000, seed=42
)

print(f"Permutation RI p-value: {results_perm.ri_pvalue:.4f}")
print(f"Bootstrap RI p-value: {results_boot.ri_pvalue:.4f}")

Diagnostic Checks

Sensitivity Analysis

Test sensitivity to different specifications:

import pandas as pd
from lwdid import lwdid

data = pd.read_csv('data.csv')

# Define specifications to test
specs = [
    {'rolling': 'demean', 'vce': None, 'label': 'Demean, OLS'},
    {'rolling': 'demean', 'vce': 'hc3', 'label': 'Demean, HC3'},
    {'rolling': 'detrend', 'vce': None, 'label': 'Detrend, OLS'},
    {'rolling': 'detrend', 'vce': 'hc3', 'label': 'Detrend, HC3'},
]

# Run all specifications
results_list = []
for spec in specs:
    res = lwdid(
        data, 'outcome', 'treated', 'unit', 'year', 'post',
        rolling=spec['rolling'], vce=spec['vce']
    )
    results_list.append({
        'Specification': spec['label'],
        'ATT': res.att,
        'SE': res.se_att,
        'p-value': res.pvalue,
        'CI Lower': res.ci_lower,
        'CI Upper': res.ci_upper
    })

# Create comparison table
comparison = pd.DataFrame(results_list)
print(comparison.to_string(index=False))

# Visualize
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6))
y_pos = range(len(comparison))
ax.errorbar(comparison['ATT'], y_pos,
            xerr=1.96*comparison['SE'], fmt='o', capsize=5)
ax.set_yticks(y_pos)
ax.set_yticklabels(comparison['Specification'])
ax.axvline(0, color='black', linestyle='--', alpha=0.5)
ax.set_xlabel('ATT Estimate')
ax.set_title('Sensitivity Analysis: ATT Across Specifications')
ax.grid(alpha=0.3, axis='x')
plt.tight_layout()
plt.show()

Export and Reporting

Creating Publication Tables

import pandas as pd
from lwdid import lwdid

data = pd.read_csv('data.csv')

# Run main specification
results = lwdid(
    data, 'outcome', 'treated', 'unit', 'year', 'post', 'detrend',
    vce='hc3', ri=True, rireps=2000, seed=42
)

# Export to Excel (multiple sheets)
results.to_excel('publication_results.xlsx')

# Export to LaTeX
results.to_latex('table_main_results.tex')

# Export period-specific effects to CSV
results.to_csv('period_effects.csv')

# Create custom summary table
summary_data = {
    'Estimate': [results.att],
    'Std. Error': [results.se_att],
    't-statistic': [results.t_stat],
    'p-value': [results.pvalue],
    'RI p-value': [results.ri_pvalue],
    '95% CI Lower': [results.ci_lower],
    '95% CI Upper': [results.ci_upper],
    'N': [results.nobs],
    'Treated Units': [results.n_treated],
    'Control Units': [results.n_control]
}
summary_df = pd.DataFrame(summary_data)
summary_df.to_latex('summary_table.tex', index=False, float_format='%.4f')

Staggered DiD: Castle Law

Castle Doctrine Analysis

This example demonstrates staggered difference-in-differences estimation using the Castle Doctrine data from Cheng and Hoekstra (2013). States adopted “Stand Your Ground” laws at different times between 2005-2009.

Data structure:

  • 50 states observed from 2000-2010

  • 21 treated states (adopted Castle Doctrine)

  • 29 never-treated states (control group)

  • 5 treatment cohorts: 2005 (1 state), 2006 (13 states), 2007 (4 states), 2008 (2 states), 2009 (1 state)

Overall Effect:

import pandas as pd
from lwdid import lwdid

# Load data
data = pd.read_csv('castle.csv')

# Create gvar (first treatment year, 0 = never treated)
data['gvar'] = data['effyear'].fillna(0).astype(int)

# Overall weighted effect
results = lwdid(
    data=data,
    y='lhomicide',           # Log homicide rate
    ivar='sid',              # State ID (must be integer)
    tvar='year',             # Year
    gvar='gvar',             # First treatment year
    rolling='demean',        # Demeaning transformation
    control_group='never_treated',
    aggregate='overall',     # Weighted average effect
    vce='hc3'
)

print(f"Overall ATT: {results.att_overall:.4f}")
print(f"SE: {results.se_overall:.4f}")
print(f"95% CI: [{results.ci_overall_lower:.4f}, {results.ci_overall_upper:.4f}]")

Cohort-Specific Effects:

# Cohort-specific effects
results_cohort = lwdid(
    data=data,
    y='lhomicide',
    ivar='sid',
    tvar='year',
    gvar='gvar',
    aggregate='cohort',      # Aggregate within cohorts
    vce='hc3'
)

print(results_cohort.att_by_cohort)
# Shows ATT for each adoption cohort (2005, 2006, 2007, 2008, 2009)

Event Study:

# All (g, r) specific effects
results_gr = lwdid(
    data=data,
    y='lhomicide',
    ivar='sid',
    tvar='year',
    gvar='gvar',
    aggregate='none',        # No aggregation
    vce='hc3'
)

# Plot event study
results_gr.plot_event_study(
    title='Castle Doctrine Effect',
    ylabel='Effect on Log Homicide Rate'
)

See the Jupyter notebook examples/castle_law.ipynb for the complete analysis.

Complete Workflow Example

End-to-End Analysis

A complete analysis from data loading to reporting:

import pandas as pd
import matplotlib.pyplot as plt
from lwdid import lwdid

# 1. Load and prepare data
data = pd.read_csv('policy_data.csv')

# 2. Descriptive statistics
print("Sample composition:")
print(data.groupby(['treated', 'post']).size().unstack())

# 3. Main estimation
results_main = lwdid(
    data,
    y='outcome',
    d='treated',
    ivar='unit',
    tvar='year',
    post='post_policy',
    rolling='detrend',
    controls=['baseline_x1', 'baseline_x2'],
    vce='hc3',
    ri=True,
    rireps=2000,
    seed=42
)

# 4. Print results
print("\n" + "="*80)
print("MAIN RESULTS")
print("="*80)
results_main.summary()

# 5. Robustness checks
print("\n" + "="*80)
print("ROBUSTNESS CHECKS")
print("="*80)

# Without controls
results_no_controls = lwdid(
    data, 'outcome', 'treated', 'unit', 'year', 'post_policy',
    'detrend', vce='hc3'
)

# Demean instead of detrend
results_demean = lwdid(
    data, 'outcome', 'treated', 'unit', 'year', 'post_policy',
    'demean', controls=['baseline_x1', 'baseline_x2'], vce='hc3'
)

# Compare
robustness = pd.DataFrame({
    'Specification': ['Main', 'No Controls', 'Demean'],
    'ATT': [results_main.att, results_no_controls.att, results_demean.att],
    'SE': [results_main.se_att, results_no_controls.se_att, results_demean.se_att],
    'p-value': [results_main.pvalue, results_no_controls.pvalue, results_demean.pvalue]
})
print(robustness.to_string(index=False))

# 6. Parallel trends test
print("\n" + "="*80)
print("PARALLEL TRENDS TEST")
print("="*80)

period_effects = results_main.att_by_period
pre_treatment = period_effects[period_effects['tindex'] < results_main.tpost1]
sig_pre = pre_treatment[pre_treatment['pval'] < 0.05]

if len(sig_pre) == 0:
    print("✓ No significant pre-treatment effects detected")
else:
    print(f"✗ {len(sig_pre)} significant pre-treatment effects detected:")
    print(sig_pre[['period', 'beta', 'pval']])

# 7. Visualizations
# Plot treatment effects over time
results_main.plot()

# Save a separate figure if needed, for example:
# results_main.plot(graph_options={'savefig': 'analysis_results.png'})

# 8. Export results
results_main.to_excel('main_results.xlsx')
results_main.to_latex('main_results.tex')
robustness.to_csv('robustness_checks.csv', index=False)

print("\n" + "="*80)
print("Analysis complete. Results saved to:")
print("  - main_results.xlsx")
print("  - main_results.tex")
print("  - robustness_checks.csv")
print("  - analysis_results.png")
print("="*80)

See Also