Clustering Diagnostics Module (clustering_diagnostics) ======================================================= Clustering diagnostics and recommendations for difference-in-differences. This module provides tools for analyzing clustering structure in panel data and recommending appropriate clustering levels for standard error estimation. When treatment varies at a level higher than the observation unit, standard errors should be clustered at the policy variation level. Overview -------- Proper clustering is essential for valid inference in DiD settings. This module helps: - **Analyze hierarchical relationships**: Between potential clustering variables - **Detect treatment variation level**: Identify where treatment assignment varies - **Recommend clustering variables**: With sufficient cluster counts - **Check consistency**: Between clustering choice and treatment variation For reliable cluster-robust inference, a minimum of 20-30 clusters is generally recommended. When clusters are fewer, wild cluster bootstrap methods provide more accurate inference. Enums ----- .. autoclass:: lwdid.clustering_diagnostics.ClusteringLevel :members: :undoc-members: :no-index: .. autoclass:: lwdid.clustering_diagnostics.ClusteringWarningLevel :members: :undoc-members: :no-index: Data Classes ------------ .. autoclass:: lwdid.clustering_diagnostics.ClusterVarStats :members: :no-index: .. autoclass:: lwdid.clustering_diagnostics.ClusteringDiagnostics :members: :no-index: .. autoclass:: lwdid.clustering_diagnostics.ClusteringRecommendation :members: :no-index: .. autoclass:: lwdid.clustering_diagnostics.ClusteringConsistencyResult :members: :no-index: Main Functions -------------- .. autofunction:: lwdid.clustering_diagnostics.diagnose_clustering .. autofunction:: lwdid.clustering_diagnostics.recommend_clustering_level .. autofunction:: lwdid.clustering_diagnostics.check_clustering_consistency Example Usage ------------- Diagnosing Clustering Structure ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python from lwdid import diagnose_clustering # Analyze potential clustering variables diagnostics = diagnose_clustering( data=panel_data, ivar='county', potential_cluster_vars=['state', 'region'], treatment_var='treated' ) # Review cluster counts for var_stats in diagnostics.cluster_var_stats: print(f"{var_stats.var_name}: {var_stats.n_clusters} clusters") # Check warnings for warning in diagnostics.warnings: print(f"[{warning.level}] {warning.message}") Getting Clustering Recommendations ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python from lwdid import recommend_clustering_level # Get recommendation with minimum cluster count recommendation = recommend_clustering_level( data=panel_data, ivar='county', potential_cluster_vars=['state', 'region'], treatment_var='treated', min_clusters=20 ) print(f"Recommended: {recommendation.recommended_var}") print(f"Reason: {recommendation.reason}") # Use in estimation results = lwdid( data, y='outcome', d='treated', ivar='county', tvar='year', post='post', rolling='demean', vce='cluster', cluster_var=recommendation.recommended_var ) Checking Consistency ~~~~~~~~~~~~~~~~~~~~ .. code-block:: python from lwdid import check_clustering_consistency # Verify clustering choice is appropriate consistency = check_clustering_consistency( data=panel_data, ivar='county', cluster_var='state', treatment_var='treated' ) print(f"Consistent: {consistency.is_consistent}") if not consistency.is_consistent: print(f"Issue: {consistency.message}") Wild Cluster Bootstrap ~~~~~~~~~~~~~~~~~~~~~~ When cluster counts are small (< 20), use wild cluster bootstrap: .. code-block:: python from lwdid import wild_cluster_bootstrap # Run wild cluster bootstrap bootstrap_result = wild_cluster_bootstrap( data=transformed_data, y_transformed='y_dot', d='treated', cluster_var='state', n_bootstrap=999, weight_type='rademacher' ) print(f"Bootstrap SE: {bootstrap_result.se_bootstrap:.4f}") print(f"Bootstrap p-value: {bootstrap_result.pvalue:.4f}") Guidelines ---------- **Minimum Cluster Counts:** - **20-30 clusters**: Generally sufficient for cluster-robust standard errors - **10-20 clusters**: Use wild cluster bootstrap for improved inference - **< 10 clusters**: Wild cluster bootstrap essential; consider randomization inference **Clustering Level Selection:** 1. Cluster at the level where treatment varies 2. If treatment varies at multiple levels, cluster at the highest level 3. Never cluster below the unit level See Also -------- - :doc:`inference` - Wild cluster bootstrap implementation - :doc:`../methodological_notes` - Theoretical foundations of clustering - :func:`lwdid.lwdid` - Main estimation with ``vce='cluster'``