Examples ======== This page provides complete, runnable examples demonstrating various features of the ``lwdid`` package. .. contents:: Table of Contents :local: :depth: 2 Basic DiD Estimation -------------------- California Smoking Restriction ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This classic example estimates the effect of California's Proposition 99 (1988 tobacco tax increase) on cigarette sales. **Data structure:** - 39 states (1 treated: California, 38 controls) - 31 years (1970-2000) - Treatment starts in 1989 (post = 1 for years :math:`\geq` 1989) **Code:** .. code-block:: python import pandas as pd from lwdid import lwdid # Load data data = pd.read_csv('smoking.csv') # Basic estimation with demean results = lwdid( data, y='lcigsale', # Log per-capita cigarette sales d='d', # Treatment indicator (1 for CA, 0 for others) ivar='state', # State identifier tvar='year', # Year post='post', # Post-1988 indicator rolling='demean' # Standard DiD transformation ) # View results print(results.summary()) results.plot() # Export results.to_excel('california_smoking_results.xlsx') Robustness Checks ~~~~~~~~~~~~~~~~~ Check sensitivity to different specifications: .. code-block:: python # 1. Detrend instead of demean (control for state-specific trends) results_detrend = lwdid( data, 'lcigsale', 'd', 'state', 'year', 'post', 'detrend' ) # 2. Robust standard errors results_robust = lwdid( data, 'lcigsale', 'd', 'state', 'year', 'post', 'demean', vce='hc3' ) # 3. Randomization inference results_ri = lwdid( data, 'lcigsale', 'd', 'state', 'year', 'post', 'demean', ri=True, rireps=2000, seed=42 ) # Compare results print(f"Demean: ATT = {results.att:.4f}, p = {results.pvalue:.4f}") print(f"Detrend: ATT = {results_detrend.att:.4f}, p = {results_detrend.pvalue:.4f}") print(f"HC3: ATT = {results_robust.att:.4f}, p = {results_robust.pvalue:.4f}") print(f"RI: ATT = {results_ri.att:.4f}, RI p = {results_ri.ri_pvalue:.4f}") Quarterly Data -------------- Retail Sales with Seasonal Patterns ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Estimating the effect of a policy change on quarterly retail sales. **Data structure:** - 50 stores (10 treated, 40 controls) - 20 quarters (5 years) - Treatment starts in Q1 2023 **Code:** .. code-block:: python import pandas as pd from lwdid import lwdid # Load quarterly data data_q = pd.read_csv('retail_sales_quarterly.csv') # Data has columns: store, year, quarter, sales, treated, post # Use detrendq for quarterly data with trends and seasonality results = lwdid( data_q, y='sales', d='treated', ivar='store', tvar=['year', 'quarter'], # Composite time variable post='post', rolling='detrendq', # Detrend + quarterly fixed effects vce='hc3' ) print(results.summary()) # Examine period-specific effects print("\nQuarterly treatment effects:") print(results.att_by_period) **Note:** For quarterly data, always use ``demeanq`` or ``detrendq`` to account for seasonal patterns. Monthly Data ------------ Monthly Sales with Seasonal Adjustment (Q=12) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Estimating treatment effects on monthly data with 12-month seasonal patterns. **Data structure:** - Panel data with a single time index and a month indicator (1-12) - Pre-treatment sample must have at least 13 observations per unit for ``demeanq`` (14 for ``detrendq``) **Code:** .. code-block:: python from lwdid import lwdid # Estimate with monthly seasonal adjustment (Q=12) results = lwdid( data_m, y='sales', d='treated', ivar='store', tvar='t', # Single time index post='post', rolling='demeanq', # Seasonal demeaning Q=12, # 12 seasons per year season_var='month', # Month indicator (1-12) vce='hc3' ) print(results.summary()) **Note:** For monthly data, use ``Q=12`` and provide a ``season_var`` column containing month indicators (1-12). Weekly Data ----------- Weekly Sales with Seasonal Adjustment (Q=52) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Estimating treatment effects on weekly data with 52-week seasonal patterns. **Data structure:** - Panel data with a single time index and a week-of-year indicator (1-52) - Pre-treatment sample must have at least 53 observations per unit for ``demeanq`` (54 for ``detrendq``) to estimate all 51 seasonal dummy coefficients **Code:** .. code-block:: python from lwdid import lwdid # Estimate with weekly seasonal adjustment (Q=52) results = lwdid( data_w, y='sales', d='treated', ivar='store', tvar='t', # Single time index post='post', rolling='demeanq', # Seasonal demeaning Q=52, # 52 seasons per year season_var='week', # Week indicator (1-52) vce='hc3' ) print(results.summary()) **Note:** For weekly data, use ``Q=52`` and provide a ``season_var`` column containing week-of-year indicators (1-52). Control Variables ----------------- Including Baseline Characteristics ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Control for time-invariant unit characteristics: .. code-block:: python import pandas as pd from lwdid import lwdid # Load data with baseline controls data = pd.read_csv('policy_evaluation.csv') # Estimate with controls results = lwdid( data, y='outcome', d='treated', ivar='unit', tvar='year', post='post_policy', rolling='detrend', controls=['baseline_income', 'baseline_population', 'urban'], vce='hc3' ) print(results.summary()) **Important:** Controls must be time-invariant (constant within each unit). The package will raise an error if controls vary over time. Cluster-Robust Inference ------------------------- Multi-Level Data Structure ~~~~~~~~~~~~~~~~~~~~~~~~~~~ When units are nested within clusters (e.g., schools within districts): .. code-block:: python import pandas as pd from lwdid import lwdid # Load data: students nested in schools nested in districts data = pd.read_csv('education_intervention.csv') # Cluster by district results = lwdid( data, y='test_score', d='treated_school', ivar='student', tvar='year', post='post_intervention', rolling='demean', vce='cluster', cluster_var='district' # Cluster at district level ) print(results.summary()) print(f"Number of clusters: {results.n_clusters}") print(f"Cluster-robust df: {results.df_inference}") **Note:** Cluster-robust standard errors use :math:`df = G - 1`, where :math:`G` is the number of clusters. Need at least :math:`G \geq 10` for reliable inference. Randomization Inference ----------------------- Non-Parametric Testing ~~~~~~~~~~~~~~~~~~~~~~ Use randomization inference when normality is questionable: .. code-block:: python import pandas as pd from lwdid import lwdid data = pd.read_csv('small_sample.csv') # Run with both t-based and RI inference results = lwdid( data, y='outcome', d='treated', ivar='unit', tvar='year', post='post', rolling='demean', vce='hc3', ri=True, rireps=5000, # More permutations for precise p-value ri_method='permutation', # Recommended seed=12345 ) # Compare inference methods print(f"t-based p-value: {results.pvalue:.4f}") print(f"RI p-value: {results.ri_pvalue:.4f}") # lwdid stores summary RI statistics (ri_pvalue, ri_method, rireps, ri_valid, ri_failed). # If you need the full permutation distribution, you can construct it manually by # repeatedly calling the randomization_inference() helper on the firstpost sample. Comparing RI Methods ~~~~~~~~~~~~~~~~~~~~ .. code-block:: python # Permutation (recommended) results_perm = lwdid( data, 'y', 'd', 'unit', 'year', 'post', 'demean', ri=True, ri_method='permutation', rireps=1000, seed=42 ) # Bootstrap (for comparison) results_boot = lwdid( data, 'y', 'd', 'unit', 'year', 'post', 'demean', ri=True, ri_method='bootstrap', rireps=1000, seed=42 ) print(f"Permutation RI p-value: {results_perm.ri_pvalue:.4f}") print(f"Bootstrap RI p-value: {results_boot.ri_pvalue:.4f}") Diagnostic Checks ----------------- Testing Parallel Trends ~~~~~~~~~~~~~~~~~~~~~~~~ Examine pre-treatment period-specific effects: .. code-block:: python import pandas as pd from lwdid import lwdid import matplotlib.pyplot as plt data = pd.read_csv('data.csv') results = lwdid( data, 'outcome', 'treated', 'unit', 'year', 'post', 'demean' ) # Extract period-specific effects period_effects = results.att_by_period # Identify pre-treatment periods pre_treatment = period_effects[period_effects['tindex'] < results.tpost1] post_treatment = period_effects[period_effects['tindex'] >= results.tpost1] # Check for significant pre-treatment effects sig_pre = pre_treatment[pre_treatment['pval'] < 0.05] if len(sig_pre) > 0: print("WARNING: Significant pre-treatment effects detected!") print(sig_pre[['period', 'beta', 'pval']]) else: print("No significant pre-treatment effects (parallel trends supported)") # Visualize plt.figure(figsize=(12, 6)) plt.errorbar(pre_treatment['tindex'], pre_treatment['beta'], yerr=1.96*pre_treatment['se'], fmt='o-', label='Pre-treatment', capsize=5) plt.errorbar(post_treatment['tindex'], post_treatment['beta'], yerr=1.96*post_treatment['se'], fmt='s-', label='Post-treatment', capsize=5, color='red') plt.axhline(0, color='black', linestyle='--', alpha=0.5) plt.axvline(results.tpost1 - 0.5, color='gray', linestyle='--', alpha=0.5, label='Treatment start') plt.xlabel('Time Index') plt.ylabel('Treatment Effect') plt.title('Period-Specific Treatment Effects (Parallel Trends Test)') plt.legend() plt.grid(alpha=0.3) plt.show() Sensitivity Analysis ~~~~~~~~~~~~~~~~~~~~ Test sensitivity to different specifications: .. code-block:: python import pandas as pd from lwdid import lwdid data = pd.read_csv('data.csv') # Define specifications to test specs = [ {'rolling': 'demean', 'vce': None, 'label': 'Demean, OLS'}, {'rolling': 'demean', 'vce': 'hc3', 'label': 'Demean, HC3'}, {'rolling': 'detrend', 'vce': None, 'label': 'Detrend, OLS'}, {'rolling': 'detrend', 'vce': 'hc3', 'label': 'Detrend, HC3'}, ] # Run all specifications results_list = [] for spec in specs: res = lwdid( data, 'outcome', 'treated', 'unit', 'year', 'post', rolling=spec['rolling'], vce=spec['vce'] ) results_list.append({ 'Specification': spec['label'], 'ATT': res.att, 'SE': res.se_att, 'p-value': res.pvalue, 'CI Lower': res.ci_lower, 'CI Upper': res.ci_upper }) # Create comparison table comparison = pd.DataFrame(results_list) print(comparison.to_string(index=False)) # Visualize import matplotlib.pyplot as plt fig, ax = plt.subplots(figsize=(10, 6)) y_pos = range(len(comparison)) ax.errorbar(comparison['ATT'], y_pos, xerr=1.96*comparison['SE'], fmt='o', capsize=5) ax.set_yticks(y_pos) ax.set_yticklabels(comparison['Specification']) ax.axvline(0, color='black', linestyle='--', alpha=0.5) ax.set_xlabel('ATT Estimate') ax.set_title('Sensitivity Analysis: ATT Across Specifications') ax.grid(alpha=0.3, axis='x') plt.tight_layout() plt.show() Export and Reporting -------------------- Creating Publication Tables ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python import pandas as pd from lwdid import lwdid data = pd.read_csv('data.csv') # Run main specification results = lwdid( data, 'outcome', 'treated', 'unit', 'year', 'post', 'detrend', vce='hc3', ri=True, rireps=2000, seed=42 ) # Export to Excel (multiple sheets) results.to_excel('publication_results.xlsx') # Export to LaTeX results.to_latex('table_main_results.tex') # Export period-specific effects to CSV results.to_csv('period_effects.csv') # Create custom summary table summary_data = { 'Estimate': [results.att], 'Std. Error': [results.se_att], 't-statistic': [results.t_stat], 'p-value': [results.pvalue], 'RI p-value': [results.ri_pvalue], '95% CI Lower': [results.ci_lower], '95% CI Upper': [results.ci_upper], 'N': [results.nobs], 'Treated Units': [results.n_treated], 'Control Units': [results.n_control] } summary_df = pd.DataFrame(summary_data) summary_df.to_latex('summary_table.tex', index=False, float_format='%.4f') Staggered DiD: Castle Law ------------------------- Castle Doctrine Analysis ~~~~~~~~~~~~~~~~~~~~~~~~ This example demonstrates staggered difference-in-differences estimation using the Castle Doctrine data from Cheng and Hoekstra (2013). States adopted "Stand Your Ground" laws at different times between 2005-2009. **Data structure:** - 50 states observed from 2000-2010 - 21 treated states (adopted Castle Doctrine) - 29 never-treated states (control group) - 5 treatment cohorts: 2005 (1 state), 2006 (13 states), 2007 (4 states), 2008 (2 states), 2009 (1 state) **Overall Effect:** .. code-block:: python import pandas as pd from lwdid import lwdid # Load data data = pd.read_csv('castle.csv') # Create gvar (first treatment year, 0 = never treated) data['gvar'] = data['effyear'].fillna(0).astype(int) # Overall weighted effect results = lwdid( data=data, y='lhomicide', # Log homicide rate ivar='sid', # State ID (must be integer) tvar='year', # Year gvar='gvar', # First treatment year rolling='demean', # Demeaning transformation control_group='never_treated', aggregate='overall', # Weighted average effect vce='hc3' ) print(f"Overall ATT: {results.att_overall:.4f}") print(f"SE: {results.se_overall:.4f}") print(f"95% CI: [{results.ci_overall_lower:.4f}, {results.ci_overall_upper:.4f}]") **Cohort-Specific Effects:** .. code-block:: python # Cohort-specific effects results_cohort = lwdid( data=data, y='lhomicide', ivar='sid', tvar='year', gvar='gvar', aggregate='cohort', # Aggregate within cohorts vce='hc3' ) print(results_cohort.att_by_cohort) # Shows ATT for each adoption cohort (2005, 2006, 2007, 2008, 2009) **Event Study:** .. code-block:: python # All (g, r) specific effects results_gr = lwdid( data=data, y='lhomicide', ivar='sid', tvar='year', gvar='gvar', aggregate='none', # No aggregation vce='hc3' ) # Plot event study results_gr.plot_event_study( title='Castle Doctrine Effect', ylabel='Effect on Log Homicide Rate' ) See the Jupyter notebook ``examples/castle_law.ipynb`` for the complete analysis. Complete Workflow Example -------------------------- End-to-End Analysis ~~~~~~~~~~~~~~~~~~~ A complete analysis from data loading to reporting: .. code-block:: python import pandas as pd import matplotlib.pyplot as plt from lwdid import lwdid # 1. Load and prepare data data = pd.read_csv('policy_data.csv') # 2. Descriptive statistics print("Sample composition:") print(data.groupby(['treated', 'post']).size().unstack()) # 3. Main estimation results_main = lwdid( data, y='outcome', d='treated', ivar='unit', tvar='year', post='post_policy', rolling='detrend', controls=['baseline_x1', 'baseline_x2'], vce='hc3', ri=True, rireps=2000, seed=42 ) # 4. Print results print("\n" + "="*80) print("MAIN RESULTS") print("="*80) results_main.summary() # 5. Robustness checks print("\n" + "="*80) print("ROBUSTNESS CHECKS") print("="*80) # Without controls results_no_controls = lwdid( data, 'outcome', 'treated', 'unit', 'year', 'post_policy', 'detrend', vce='hc3' ) # Demean instead of detrend results_demean = lwdid( data, 'outcome', 'treated', 'unit', 'year', 'post_policy', 'demean', controls=['baseline_x1', 'baseline_x2'], vce='hc3' ) # Compare robustness = pd.DataFrame({ 'Specification': ['Main', 'No Controls', 'Demean'], 'ATT': [results_main.att, results_no_controls.att, results_demean.att], 'SE': [results_main.se_att, results_no_controls.se_att, results_demean.se_att], 'p-value': [results_main.pvalue, results_no_controls.pvalue, results_demean.pvalue] }) print(robustness.to_string(index=False)) # 6. Parallel trends test print("\n" + "="*80) print("PARALLEL TRENDS TEST") print("="*80) period_effects = results_main.att_by_period pre_treatment = period_effects[period_effects['tindex'] < results_main.tpost1] sig_pre = pre_treatment[pre_treatment['pval'] < 0.05] if len(sig_pre) == 0: print("✓ No significant pre-treatment effects detected") else: print(f"✗ {len(sig_pre)} significant pre-treatment effects detected:") print(sig_pre[['period', 'beta', 'pval']]) # 7. Visualizations # Plot treatment effects over time results_main.plot() # Save a separate figure if needed, for example: # results_main.plot(graph_options={'savefig': 'analysis_results.png'}) # 8. Export results results_main.to_excel('main_results.xlsx') results_main.to_latex('main_results.tex') robustness.to_csv('robustness_checks.csv', index=False) print("\n" + "="*80) print("Analysis complete. Results saved to:") print(" - main_results.xlsx") print(" - main_results.tex") print(" - robustness_checks.csv") print(" - analysis_results.png") print("="*80) See Also -------- - :doc:`../user_guide` - Comprehensive usage guide - :doc:`../quickstart` - Quick start tutorial - :doc:`../api/index` - Complete API reference - :doc:`../methodological_notes` - Theoretical background