Randomization Inference Module

Hypothesis testing via randomization inference for small and large samples.

This module implements randomization inference (RI) for testing the sharp null hypothesis of no treatment effect in difference-in-differences settings. The approach provides valid p-values without relying on asymptotic distributional assumptions.

Applicability: RI is applicable to both small-sample and large-sample scenarios. While it is particularly valuable for small samples where t-based inference may be unreliable, it also serves as a robust alternative for large samples when normality assumptions are questionable. See Lee and Wooldridge (2026) for discussion of RI in the small-sample context.

The implementation supports two resampling methods: permutation (classical Fisher randomization inference with fixed treatment group size) and bootstrap (resampling with replacement). Both methods compute a Monte Carlo p-value as the proportion of resampled test statistics at least as extreme as the observed statistic.

Notes

Randomization inference tests the sharp null hypothesis that all unit-level treatment effects are exactly zero. Under this null, treatment assignment is uninformative about potential outcomes, and permuting treatment labels generates the null distribution of the test statistic.
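The permutation logic can be sketched in a few lines. This is a minimal illustration, assuming a simple difference in means as the test statistic rather than the OLS-based ATT the module estimates. The function name `permutation_pvalue` and the toy data are hypothetical and not part of lwdid.

```python
import numpy as np

def permutation_pvalue(y, d, reps=1000, seed=None):
    """Two-sided Monte Carlo p-value for the sharp null of no effect.

    Illustrative only: the statistic is a difference in means, not the
    OLS-based ATT used by the module.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=int)

    def diff_in_means(labels):
        return y[labels == 1].mean() - y[labels == 0].mean()

    stat_obs = diff_in_means(d)
    # Shuffling the labels keeps the number of treated units fixed.
    stats = np.array([diff_in_means(rng.permutation(d)) for _ in range(reps)])
    # Share of replications at least as extreme as the observed statistic.
    return float(np.mean(np.abs(stats) >= np.abs(stat_obs)))

# Toy data (hypothetical): 4 treated and 8 control units.
y = np.array([2.1, 1.8, 2.5, 2.2, 0.3, -0.1, 0.4, 0.0, 0.2, -0.3, 0.1, 0.5])
d = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
p = permutation_pvalue(y, d, reps=2000, seed=0)
print(f"Permutation p-value: {p:.4f}")
```

Because the outcomes are held fixed and only the labels move, the spread of `stats` reflects the assignment mechanism alone, which is what the sharp null licenses.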

The permutation method preserves the original number of treated units in each replication, while the bootstrap method may produce varying treatment group sizes. Bootstrap can yield degenerate draws (all treated or all control) which are excluded from p-value computation.
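The difference between the two schemes shows up in the treated-group size of each draw. The sketch below assumes the bootstrap draws N labels with replacement from the observed label vector; the variable names and the toy label vector are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(123)
d = np.array([1, 0, 0, 0, 0, 0])  # hypothetical: 1 treated, 5 control units
reps = 5000

perm_sizes = np.empty(reps, dtype=int)
boot_sizes = np.empty(reps, dtype=int)
for r in range(reps):
    # Permutation: same labels in a new order, so the treated count is fixed.
    perm_sizes[r] = rng.permutation(d).sum()
    # Bootstrap: labels drawn with replacement, so the treated count varies.
    boot_sizes[r] = rng.choice(d, size=d.size, replace=True).sum()

# A draw is degenerate when it has no treated units or no control units.
degenerate = (boot_sizes == 0) | (boot_sizes == d.size)
n_failed = int(degenerate.sum())
n_valid = reps - n_failed

print("Permutation treated counts:", np.unique(perm_sizes))
print("Bootstrap treated counts:  ", np.unique(boot_sizes))
print(f"Degenerate bootstrap draws: {n_failed} of {reps}")
```

With a single treated unit, a large share of bootstrap draws contains no treated unit at all, so only `n_valid` draws would contribute to the p-value.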

When control variables are specified, the model maintains a fixed centering point (mean of controls for the original treated group) across all replications. This ensures that the randomization distribution reflects variation in treatment assignment rather than in covariate adjustment.
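The fixed centering point can be illustrated as follows. This sketch assumes a specification of the form y ~ 1 + d + X + d*(X - center), which is one plausible reading of the description above and not necessarily the module's exact model. The helper `att_with_controls` and the simulated data are hypothetical.

```python
import numpy as np

def att_with_controls(y, d, X, center):
    """OLS coefficient on d in y ~ 1 + d + X + d*(X - center)."""
    d = d.astype(float)
    design = np.column_stack(
        [np.ones_like(d), d, X, d[:, None] * (X - center)]
    )
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

rng = np.random.default_rng(7)
n = 40
X = rng.normal(size=(n, 1))              # one control variable
d = (np.arange(n) < 15).astype(int)      # first 15 units treated
y = 0.5 * X[:, 0] + rng.normal(size=n)   # simulated with no treatment effect

# Computed once from the ORIGINAL treated group, then reused unchanged.
center = X[d == 1].mean(axis=0)

att_obs = att_with_controls(y, d, X, center)
att_reps = np.array([
    att_with_controls(y, rng.permutation(d), X, center)  # same center each time
    for _ in range(500)
])
p_value = float(np.mean(np.abs(att_reps) >= np.abs(att_obs)))
print(f"ATT: {att_obs:.3f}, RI p-value: {p_value:.3f}")
```

Recomputing `center` from each permuted treated group would change the meaning of the coefficient on `d` from draw to draw; holding it fixed keeps every replication estimating the same quantity.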

Main Function

lwdid.randomization.randomization_inference(firstpost_df, y_col='ydot_postavg', d_col='d_', ivar='ivar', rireps=1000, seed=None, att_obs=None, ri_method='bootstrap', controls=None)

Perform randomization inference for the null hypothesis of zero treatment effect.

Tests the sharp null hypothesis that all unit-level treatment effects equal zero by resampling treatment labels and computing a Monte Carlo p-value. Supports both classical Fisher permutation inference and bootstrap resampling.

Parameters:
  • firstpost_df (pd.DataFrame) – Cross-sectional data for the first post-treatment period, containing one observation per unit. Must include the transformed outcome, treatment indicator, and unit identifier columns.

  • y_col (str, default 'ydot_postavg') – Name of the column containing the transformed outcome variable (after unit-specific demeaning or detrending).

  • d_col (str, default 'd_') – Name of the column containing the binary treatment indicator (0 or 1).

  • ivar (str, default 'ivar') – Name of the column containing the unit identifier.

  • rireps (int, default 1000) – Number of randomization replications for computing the p-value. Higher values provide more precise p-values but increase computation time.

  • seed (int | None, optional) – Random seed for reproducibility of the randomization distribution. If None, results will vary across runs.

  • att_obs (float | None, optional) – Observed ATT estimate from the original regression. If None, the ATT is re-estimated from the data using OLS.

  • ri_method ({'bootstrap', 'permutation'}, default 'bootstrap') –

    Resampling method for generating the null distribution:

    • 'permutation': Classical Fisher randomization inference. Permutes treatment labels without replacement, preserving the original number of treated and control units in each replication.

    • 'bootstrap': Resamples treatment labels with replacement. May produce degenerate draws (all treated or all control), which are excluded.

  • controls (Sequence[str] | None, optional) – List of control variable names to include in the regression model. Controls are included only if sample size conditions are satisfied (N_treated > K+1 and N_control > K+1 where K is the number of controls). The centering point for interactions is fixed at the mean of controls for the original treatment group.
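The sample-size condition on `controls` amounts to a simple count check. The helper below is a hypothetical illustration of that rule, not the module's internal code.

```python
import numpy as np

def controls_allowed(d, n_controls):
    """Return True when N_treated > K+1 and N_control > K+1 (K = n_controls)."""
    d = np.asarray(d)
    n_treated = int((d == 1).sum())
    n_control = int((d == 0).sum())
    return n_treated > n_controls + 1 and n_control > n_controls + 1

# Hypothetical sample: 3 treated and 10 control units.
d = np.array([1] * 3 + [0] * 10)
print(controls_allowed(d, n_controls=1))  # 3 > 2 and 10 > 2
print(controls_allowed(d, n_controls=2))  # 3 > 3 fails
```

With three treated units, a single control passes the check while two controls do not, since the treated group must have more than K+1 observations.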

Returns:

Dictionary containing randomization inference results:

  • 'p_value' (float)

    Two-sided p-value computed as the proportion of valid replications with abs(ATT_perm) >= abs(ATT_obs).

  • 'ri_method' (str)

    The resampling method used ('bootstrap' or 'permutation').

  • 'ri_reps' (int)

    Total number of replications requested.

  • 'ri_valid' (int)

    Number of valid (non-degenerate) replications used for the p-value.

  • 'ri_failed' (int)

    Number of failed or degenerate replications.

  • 'ri_failure_rate' (float)

    Proportion of replications that failed (ri_failed / ri_reps).

Return type:

dict
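The fields of the returned dictionary relate to one another as in the sketch below. It assumes that failed or degenerate replications are recorded as NaN, which is a convention chosen for this illustration and not confirmed by the source. The helper `summarize_ri` is hypothetical.

```python
import numpy as np

def summarize_ri(att_obs, att_reps, ri_method):
    """Assemble an RI summary; NaN entries mark failed or degenerate draws."""
    att_reps = np.asarray(att_reps, dtype=float)
    valid = att_reps[~np.isnan(att_reps)]
    ri_reps = int(att_reps.size)
    ri_valid = int(valid.size)
    ri_failed = ri_reps - ri_valid
    # The p-value uses valid replications only.
    p_value = (
        float(np.mean(np.abs(valid) >= abs(att_obs))) if ri_valid else float("nan")
    )
    return {
        "p_value": p_value,
        "ri_method": ri_method,
        "ri_reps": ri_reps,
        "ri_valid": ri_valid,
        "ri_failed": ri_failed,
        "ri_failure_rate": ri_failed / ri_reps,
    }

# Hypothetical replication results: two of eight draws failed.
reps = [0.2, -1.5, np.nan, 0.9, 1.0, np.nan, -0.4, 0.1]
result = summarize_ri(att_obs=1.0, att_reps=reps, ri_method="bootstrap")
print(result)
```

Here two of the six valid draws are at least as extreme as the observed estimate, so the p-value is 2/6 while the failure rate is 2/8.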

Raises:

RandomizationError – Raised in the following cases:

  • rireps is not positive.

  • Required columns (y_col, d_col) are missing from the data.

  • ri_method is not 'bootstrap' or 'permutation'.

  • Sample size is too small (N < 3).

  • Insufficient valid replications for reliable inference (fewer than 10% of requested replications or minimum of 100 for bootstrap, 10 for permutation).

Warns:

UserWarning – When bootstrap failure rate exceeds 5%, indicating potential issues with extreme treatment proportions or data quality.

See also

estimate_att

Estimates ATT via cross-sectional OLS regression.

lwdid

Main estimation function that can invoke randomization inference.

Notes

The randomization distribution is computed using plain OLS (homoskedastic standard errors) for computational efficiency. The model specification matches the main regression, including control variables if sample size conditions are met.

For the permutation method, every replication is valid since treatment group sizes are preserved. For the bootstrap method, replications that produce a degenerate treatment assignment (N_1 = 0 or N_1 = N, i.e. no treated units or no control units) are excluded from the p-value computation, and a warning is issued if the failure rate is substantial.

The theoretical failure rate for bootstrap with treatment proportion p and sample size N is approximately (1-p)^N + p^N, which can be substantial when either treatment or control groups are small.
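The failure-rate formula is easy to evaluate and to check by simulation. The sketch below assumes each bootstrap label is an independent draw that is treated with probability p, so the treated count in a draw is Binomial(N, p). The function name `bootstrap_failure_rate` is illustrative.

```python
import numpy as np

def bootstrap_failure_rate(p, n):
    """Probability that n independent Bernoulli(p) labels are all 0 or all 1."""
    return (1 - p) ** n + p ** n

rate_small = bootstrap_failure_rate(0.1, 10)    # e.g. 1 treated unit out of 10
rate_large = bootstrap_failure_rate(0.1, 100)   # e.g. 10 treated units out of 100

# Monte Carlo check for the small-sample case.
rng = np.random.default_rng(0)
n_treated = rng.binomial(n=10, p=0.1, size=200_000)
simulated = float(np.mean((n_treated == 0) | (n_treated == 10)))

print(f"N=10,  p=0.1: theoretical {rate_small:.4f}, simulated {simulated:.4f}")
print(f"N=100, p=0.1: theoretical {rate_large:.2e}")
```

With one treated unit in ten, roughly a third of bootstrap draws are degenerate, far above the 5% threshold at which the function warns. At N = 100 with the same proportion the rate is negligible.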

Resampling Methods

The implementation supports two resampling methods:

Permutation (Classical Fisher Randomization Inference)

Permutes treatment labels without replacement, preserving the original number of treated and control units in each replication. This is the classical Fisher randomization approach and is generally recommended for design-based randomization inference.

Bootstrap (Resampling with Replacement)

Resamples treatment labels with replacement. May produce degenerate draws (all treated or all control) which are excluded from p-value computation. This method is the default for backward compatibility.

Parameters

firstpost_df : pd.DataFrame

Cross-sectional data for the first post-treatment period, containing one observation per unit.

y_col : str, default 'ydot_postavg'

Column name of the transformed outcome variable.

d_col : str, default 'd_'

Column name of the binary treatment indicator.

ivar : str, default 'ivar'

Column name of the unit identifier.

rireps : int, default 1000

Number of randomization replications. Higher values provide more precise p-values but increase computation time.

seed : int or None, optional

Random seed for reproducibility.

ri_method : {'bootstrap', 'permutation'}, default 'bootstrap'

Resampling method for generating the null distribution.

controls : list of str or None, optional

Control variables to include in the regression model.
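The precision gained from a larger `rireps` can be quantified. Since the Monte Carlo p-value is a proportion over independent replications, its standard error is approximately sqrt(p(1 - p) / R) for a true p-value p and R valid replications. The helper below is an illustrative calculation, not part of lwdid.

```python
import math

def mc_standard_error(p, reps):
    """Binomial standard error of a Monte Carlo p-value estimate."""
    return math.sqrt(p * (1 - p) / reps)

se_1000 = mc_standard_error(0.05, 1000)
se_10000 = mc_standard_error(0.05, 10000)
print(f"rireps=1000:  SE of p-value near 0.05 is about {se_1000:.4f}")
print(f"rireps=10000: SE of p-value near 0.05 is about {se_10000:.4f}")
```

Ten times as many replications shrinks the standard error by a factor of about 3.2, which is the usual square-root trade-off between precision and computation time.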

Returns

dict

Dictionary containing:

  • p_value: Two-sided p-value (proportion with |ATT_perm| >= |ATT_obs|)

  • ri_method: Resampling method used

  • ri_reps: Total replications requested

  • ri_valid: Number of valid replications

  • ri_failed: Number of failed replications

  • ri_failure_rate: Proportion of failed replications

Example Usage

from lwdid import lwdid

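# Estimation with randomization inference.
# `data` is assumed to be a long-format panel DataFrame containing the
# columns named in the call below.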
results = lwdid(
    data,
    y='outcome',
    d='treated',
    ivar='unit',
    tvar='year',
    post='post',
    rolling='detrend',
    ri=True,
    rireps=1000,
    ri_method='permutation',
    seed=42
)

# Access RI results
print(f"RI p-value: {results.ri_pvalue:.4f}")
print(f"Method: {results.ri_method}")
print(f"Valid replications: {results.ri_valid}/{results.rireps}")

Methodological Notes

Advantages

  • Does not rely on normality or homoskedasticity assumptions, so it accommodates heteroskedastic and non-normal errors

  • With the permutation method, provides p-values that are exact under the randomization model, up to Monte Carlo error from the finite number of replications

Limitations

  • Computationally intensive for large numbers of replications

  • Tests only the sharp null hypothesis (zero effect for all units)

  • Bootstrap method may have high failure rates with extreme treatment proportions

See Also

lwdid.lwdid() : Main estimation function with ri=True option.

Methodological Notes : Theoretical foundations.