Randomization Inference Module

Hypothesis testing via randomization inference for small and large samples.

This module implements randomization inference (RI) for testing the sharp null hypothesis of no treatment effect in difference-in-differences settings. The approach provides valid p-values without relying on asymptotic distributional assumptions.

Applicability: RI is applicable to both small-sample and large-sample scenarios. While it is particularly valuable for small samples where t-based inference may be unreliable, it also serves as a robust alternative for large samples when normality assumptions are questionable. See Lee and Wooldridge (2026) for discussion of RI in the small-sample context.

The implementation supports two resampling methods: permutation (classical Fisher randomization inference with fixed treatment group size) and bootstrap (resampling with replacement). Both methods compute a Monte Carlo p-value as the proportion of resampled test statistics at least as extreme as the observed statistic.
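Because the p-value is a proportion over a finite number of replications, it carries Monte Carlo error. The sketch below uses the standard binomial approximation to gauge that error; the helper name `mc_standard_error` is hypothetical and not part of the module.

```python
import math

def mc_standard_error(p_value, reps):
    """Approximate Monte Carlo standard error of a p-value estimated from `reps` draws."""
    return math.sqrt(p_value * (1.0 - p_value) / reps)

# Around p = 0.05, going from 1,000 to 10,000 replications shrinks the error
# from roughly 0.007 to roughly 0.002.
se_1000 = mc_standard_error(0.05, 1000)
se_10000 = mc_standard_error(0.05, 10000)
print(f"{se_1000:.4f} {se_10000:.4f}")
```

When the p-value is close to a decision threshold, a larger `rireps` is what narrows this error band.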

Notes

Randomization inference tests the sharp null hypothesis that all unit-level treatment effects are exactly zero. Under this null, treatment assignment is uninformative about potential outcomes, and permuting treatment labels generates the null distribution of the test statistic.
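The logic of the sharp null can be sketched in a few lines. This is a minimal illustration, assuming a simple difference in means as the test statistic rather than the regression-based ATT the module estimates; the function name `permutation_pvalue` is hypothetical.

```python
import numpy as np

def permutation_pvalue(y, d, reps=1000, seed=None):
    """Two-sided Monte Carlo p-value under the sharp null (difference in means)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=int)

    def diff_in_means(labels):
        return y[labels == 1].mean() - y[labels == 0].mean()

    stat_obs = diff_in_means(d)
    # Under the sharp null the outcomes are fixed, so only the labels are shuffled.
    # Permuting keeps the number of treated units constant in every replication.
    stats = np.array([diff_in_means(rng.permutation(d)) for _ in range(reps)])
    return float(np.mean(np.abs(stats) >= np.abs(stat_obs)))

# Toy data: three treated units with clearly higher outcomes.
y = np.array([5.1, 4.8, 5.3, 0.2, -0.1, 0.3, 0.0, 0.1])
d = np.array([1, 1, 1, 0, 0, 0, 0, 0])
p = permutation_pvalue(y, d, reps=2000, seed=0)
print(f"p-value: {p:.3f}")
```

In this toy data only the observed assignment, out of 56 possible ones, produces a statistic as extreme as the observed one, so the estimated p-value is small.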

The permutation method preserves the original number of treated units in each replication, while the bootstrap method may produce varying treatment group sizes. Bootstrap can yield degenerate draws (all treated or all control) which are excluded from p-value computation.
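The difference between the two label-drawing schemes can be illustrated as follows. This is a sketch under the assumption that bootstrap labels are drawn independently from the observed label vector; `draw_labels` and `is_degenerate` are hypothetical helpers, not functions of the module.

```python
import numpy as np

def draw_labels(d, method, rng):
    """Draw one replication of treatment labels."""
    d = np.asarray(d)
    if method == "permutation":
        # Shuffle without replacement: treated count is unchanged.
        return rng.permutation(d)
    if method == "bootstrap":
        # Sample with replacement: treated count varies across draws.
        return rng.choice(d, size=d.size, replace=True)
    raise ValueError(f"unknown method: {method!r}")

def is_degenerate(labels):
    """A draw is degenerate when every unit is treated or every unit is control."""
    n_treated = int(labels.sum())
    return n_treated == 0 or n_treated == labels.size

rng = np.random.default_rng(123)
d = np.array([1, 0, 0, 0, 0, 0])  # one treated unit out of six

perm_sizes = {int(draw_labels(d, "permutation", rng).sum()) for _ in range(500)}
boot_draws = [draw_labels(d, "bootstrap", rng) for _ in range(500)]
n_degenerate = sum(is_degenerate(b) for b in boot_draws)
print(perm_sizes, n_degenerate)
```

With a single treated unit, a sizeable share of bootstrap draws contains no treated unit at all, while every permutation draw keeps exactly one.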

When control variables are specified, the model maintains a fixed centering point (mean of controls for the original treated group) across all replications. This ensures that the randomization distribution reflects variation in treatment assignment rather than in covariate adjustment.
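A minimal sketch of the fixed centering point, assuming a regression of the outcome on the treatment indicator, the controls, and treatment-by-centered-control interactions. The exact specification used by the module is not shown here, so the design matrix and the helper `att_with_controls` are assumptions for illustration.

```python
import numpy as np

def att_with_controls(y, d, X, center):
    """OLS coefficient on d, with interactions d * (X - center)."""
    Xc = X - center
    design = np.column_stack([np.ones(len(y)), d, X, d[:, None] * Xc])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

rng = np.random.default_rng(0)
n = 40
X = rng.normal(size=(n, 1))
d = (np.arange(n) < 15).astype(float)
y = 2.0 * d + 0.5 * X[:, 0] + rng.normal(scale=0.1, size=n)

# Computed once from the ORIGINAL treated group, then reused in every replication.
center = X[d == 1].mean(axis=0)

att_obs = att_with_controls(y, d, X, center)
att_reps = np.array(
    [att_with_controls(y, rng.permutation(d), X, center) for _ in range(200)]
)
p_value = float(np.mean(np.abs(att_reps) >= abs(att_obs)))
```

Recomputing `center` inside the loop would let the covariate adjustment move with each reshuffled treated group, which is the extra source of variation the fixed centering point avoids.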

Main Function

lwdid.randomization.randomization_inference(firstpost_df, y_col='ydot_postavg', d_col='d_', ivar='ivar', rireps=1000, seed=None, att_obs=None, ri_method='bootstrap', controls=None)

Perform randomization inference for the null hypothesis of zero treatment effect.

Tests the sharp null hypothesis that all unit-level treatment effects equal zero by resampling treatment labels and computing a Monte Carlo p-value. Supports both classical Fisher permutation inference and bootstrap resampling.

Parameters:

firstpost_df (pd.DataFrame) – Cross-sectional data for the first post-treatment period, containing one observation per unit. Must include the transformed outcome, treatment indicator, and unit identifier columns.
y_col (str, default 'ydot_postavg') – Name of the column containing the transformed outcome variable (after unit-specific demeaning or detrending).
d_col (str, default 'd_') – Name of the column containing the binary treatment indicator (0 or 1).
ivar (str, default 'ivar') – Name of the column containing the unit identifier.
rireps (int, default 1000) – Number of randomization replications for computing the p-value. Higher values provide more precise p-values but increase computation time.
seed (int | None, optional) – Random seed for reproducibility of the randomization distribution. If None, results will vary across runs.
att_obs (float | None, optional) – Observed ATT estimate from the original regression. If None, the ATT is re-estimated from the data using OLS.
ri_method ({'bootstrap', 'permutation'}, default 'bootstrap') –
Resampling method for generating the null distribution:
- 'permutation': Classical Fisher randomization inference. Permutes treatment labels without replacement, preserving the original number of treated and control units in each replication.
- 'bootstrap': Resamples treatment labels with replacement. May produce degenerate draws (all treated or all control), which are excluded.
controls (Sequence[str] | None, optional) – List of control variable names to include in the regression model. Controls are included only if sample size conditions are satisfied (N_treated > K+1 and N_control > K+1 where K is the number of controls). The centering point for interactions is fixed at the mean of controls for the original treatment group.
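The sample-size condition for controls can be written as a one-line check. This is a sketch only; `controls_usable` is a hypothetical helper name, and how the function behaves when the condition fails is not covered here.

```python
def controls_usable(n_treated, n_control, n_controls):
    """True when both groups have more than K + 1 units, where K = n_controls."""
    threshold = n_controls + 1
    return n_treated > threshold and n_control > threshold

# With K = 2 controls, each group needs at least 4 units.
usable = controls_usable(n_treated=5, n_control=12, n_controls=2)
too_few = controls_usable(n_treated=3, n_control=12, n_controls=2)
print(usable, too_few)
```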

Returns:

Dictionary containing randomization inference results:

'p_value' (float) – Two-sided p-value computed as the proportion of replications with abs(ATT_perm) >= abs(ATT_obs).
'ri_method' (str) – The resampling method used ('bootstrap' or 'permutation').
'ri_reps' (int) – Total number of replications requested.
'ri_valid' (int) – Number of valid (non-degenerate) replications used for the p-value.
'ri_failed' (int) – Number of failed or degenerate replications.
'ri_failure_rate' (float) – Proportion of replications that failed (ri_failed / ri_reps).

Return type:

dict
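The relationship between the returned fields can be sketched as below. This assumes failed or degenerate replications are marked as NaN, which is a convention chosen for the illustration and not necessarily how the module tracks them; `summarize_ri` is a hypothetical helper.

```python
import numpy as np

def summarize_ri(att_obs, att_reps, ri_method):
    """Build a result dictionary from replicate ATTs (NaN marks a failed draw)."""
    att_reps = np.asarray(att_reps, dtype=float)
    valid = att_reps[~np.isnan(att_reps)]
    ri_reps = int(att_reps.size)
    ri_valid = int(valid.size)
    ri_failed = ri_reps - ri_valid
    return {
        # Only valid replications enter the p-value.
        "p_value": float(np.mean(np.abs(valid) >= abs(att_obs))),
        "ri_method": ri_method,
        "ri_reps": ri_reps,
        "ri_valid": ri_valid,
        "ri_failed": ri_failed,
        "ri_failure_rate": ri_failed / ri_reps,
    }

res = summarize_ri(
    att_obs=1.5,
    att_reps=[0.2, -1.7, np.nan, 0.9, np.nan, 1.5, -0.3, 0.1],
    ri_method="bootstrap",
)
print(res)
```

Here six of eight replications are valid and two of those are at least as extreme as the observed estimate, so the p-value is 2/6 and the failure rate is 2/8.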

Raises:

RandomizationError – Raised in the following cases:
- rireps is not positive.
- Required columns (y_col, d_col) are missing from the data.
- ri_method is not 'bootstrap' or 'permutation'.
- Sample size is too small (N < 3).
- Insufficient valid replications for reliable inference (fewer than 10% of requested replications, or below the minimum of 100 for bootstrap or 10 for permutation).
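The input checks listed above might look roughly like the following. This is an illustrative sketch: the `RandomizationError` class defined here is a local stand-in for the library's exception, `validate_ri_inputs` is a hypothetical helper, and the check on the number of valid replications is left out because its exact threshold logic is not spelled out here.

```python
class RandomizationError(Exception):
    """Local stand-in for the library's exception class (illustration only)."""

def validate_ri_inputs(columns, n_units, rireps, ri_method,
                       y_col="ydot_postavg", d_col="d_"):
    """Raise RandomizationError for the input problems listed in the docs."""
    if rireps <= 0:
        raise RandomizationError("rireps must be positive")
    missing = [c for c in (y_col, d_col) if c not in columns]
    if missing:
        raise RandomizationError(f"missing required columns: {missing}")
    if ri_method not in ("bootstrap", "permutation"):
        raise RandomizationError(f"unknown ri_method: {ri_method!r}")
    if n_units < 3:
        raise RandomizationError("sample size too small (N < 3)")

def fails(**kwargs):
    """Return True when validation raises, False otherwise."""
    try:
        validate_ri_inputs(**kwargs)
    except RandomizationError:
        return True
    return False

ok = dict(columns=["ydot_postavg", "d_", "ivar"], n_units=10,
          rireps=1000, ri_method="permutation")
checks = {
    "valid": fails(**ok),
    "bad_reps": fails(**{**ok, "rireps": 0}),
    "missing_col": fails(**{**ok, "columns": ["ivar"]}),
    "bad_method": fails(**{**ok, "ri_method": "jackknife"}),
    "tiny_sample": fails(**{**ok, "n_units": 2}),
}
print(checks)
```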

Warns:

UserWarning – When bootstrap failure rate exceeds 5%, indicating potential issues with extreme treatment proportions or data quality.
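A sketch of how such a warning could be raised and observed with the standard `warnings` module. The helper `warn_if_high_failure` and the message text are hypothetical; only the 5% threshold and the `UserWarning` category come from the description above.

```python
import warnings

def warn_if_high_failure(ri_failed, ri_reps, threshold=0.05):
    """Emit a UserWarning when the share of failed replications exceeds the threshold."""
    rate = ri_failed / ri_reps
    if rate > threshold:
        warnings.warn(
            f"Bootstrap failure rate is {rate:.1%}; check treatment proportions.",
            UserWarning,
            stacklevel=2,
        )
    return rate

# Capture the warning so a calling script can react to it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    rate = warn_if_high_failure(ri_failed=80, ri_reps=1000)

print(rate, [str(w.message) for w in caught])
```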

Resampling Methods

The implementation supports two resampling methods:

Permutation (Classical Fisher Randomization Inference)

Permutes treatment labels without replacement, preserving the original number of treated and control units in each replication. This is the classical Fisher randomization approach and is generally recommended for design-based randomization inference.

Bootstrap (Resampling with Replacement)

Resamples treatment labels with replacement. May produce degenerate draws (all treated or all control) which are excluded from p-value computation. This method is the default for backward compatibility.

Parameters

firstpost_df (pd.DataFrame): Cross-sectional data for the first post-treatment period, containing one observation per unit.
y_col (str, default 'ydot_postavg'): Column name of the transformed outcome variable.
d_col (str, default 'd_'): Column name of the binary treatment indicator.
ivar (str, default 'ivar'): Column name of the unit identifier.
rireps (int, default 1000): Number of randomization replications. Higher values provide more precise p-values but increase computation time.
seed (int or None, optional): Random seed for reproducibility.
ri_method ({'bootstrap', 'permutation'}, default 'bootstrap'): Resampling method for generating the null distribution.
controls (list of str or None, optional): Control variables to include in the regression model.

Returns

dict

Dictionary containing:

p_value: Two-sided p-value (proportion with |ATT_perm| >= |ATT_obs|)
ri_method: Resampling method used
ri_reps: Total replications requested
ri_valid: Number of valid replications
ri_failed: Number of failed replications
ri_failure_rate: Proportion of failed replications

Example Usage

from lwdid import lwdid

# `data` is assumed to be a long-format panel DataFrame (one row per unit and period)
# Estimation with randomization inference
results = lwdid(
    data,
    y='outcome',
    d='treated',
    ivar='unit',
    tvar='year',
    post='post',
    rolling='detrend',
    ri=True,
    rireps=1000,
    ri_method='permutation',
    seed=42
)

# Access RI results
print(f"RI p-value: {results.ri_pvalue:.4f}")
print(f"Method: {results.ri_method}")
print(f"Valid replications: {results.ri_valid}/{results.rireps}")

Methodological Notes

Advantages

Does not rely on normality or homoskedasticity assumptions, so heteroskedastic and non-normal errors are accommodated naturally
Provides exact p-values under the randomization model, up to the Monte Carlo error from using a finite number of replications

Limitations

Computationally intensive for large numbers of replications
Tests only the sharp null hypothesis (zero effect for all units)
Bootstrap method may have high failure rates with extreme treatment proportions
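The last limitation can be quantified. Assuming bootstrap labels are drawn independently with replacement from the observed labels, the chance that a single draw is degenerate is p^N + (1 - p)^N, where p is the treated share and N the number of units. The helper name below is hypothetical.

```python
def degenerate_probability(n_treated, n_total):
    """Probability that one bootstrap draw of labels is all treated or all control."""
    p = n_treated / n_total
    return p ** n_total + (1 - p) ** n_total

# One treated unit out of six: about one draw in three is degenerate.
lopsided = degenerate_probability(1, 6)
# A balanced design with twenty units: degenerate draws are vanishingly rare.
balanced = degenerate_probability(10, 20)
print(f"{lopsided:.3f} {balanced:.2e}")
```

When this probability is high, the permutation method sidesteps the problem because it never changes the treated count.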