How FastPval Computes Extremely Low P‑Values (down to 1e‑9) — Method & Performance
FastPval is a method and implementation designed to estimate extremely small empirical p-values efficiently when using permutation, bootstrap, or other resampling approaches. Instead of storing or sorting enormous resampling distributions, FastPval combines stratified tail sampling, compact summaries, and tail extrapolation to produce accurate tail probabilities (down to ~1e‑9) at far lower time and memory cost. This article outlines the core ideas, algorithmic steps, accuracy considerations, and real-world performance characteristics.
Problem overview
- Resampling-based p-values estimate the probability that a test statistic T* drawn from the null resampling distribution is at least as extreme as the observed statistic Tobs.
- Estimating p-values as small as 1e‑9 directly by brute force requires on the order of 1e9 resamples just to expect a single exceedance — infeasible in time and memory for many workflows.
- FastPval addresses this by strategically sampling and summarizing the tail of the null distribution rather than enumerating every resample.
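The brute-force cost above follows from the binomial variance of a plain Monte Carlo tail estimate. A small sketch (the function name is illustrative, not part of FastPval):

```python
import math

def mc_draws_needed(p, rel_se):
    """Plain Monte Carlo draws needed to estimate a tail probability p
    with relative standard error rel_se.
    Var(p_hat) = p(1 - p)/N  =>  N = (1 - p) / (p * rel_se**2)."""
    return math.ceil((1 - p) / (p * rel_se ** 2))

# Merely expecting one exceedance at p = 1e-9 takes 1/p = 1e9 draws;
# a 20% relative standard error takes substantially more:
print(f"{mc_draws_needed(1e-9, 0.2):.1e}")  # ~2.5e+10
```

This is why even "1e9 resamples" understates the cost once a useful error bound is required.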
Key ideas behind FastPval
- Stratified tail sampling: Focus computational effort on the upper tail where extreme values occur. Most resamples fall far from Tobs and provide little information about tiny p-values; FastPval oversamples regions near and above Tobs.
- Histogram/quantile sketching: Build compact summaries (e.g., binned counts or quantile sketches) of the resampling distribution so you can estimate counts above thresholds without storing all samples.
- Importance sampling / weighted resampling: If available for the problem, draw resamples from an importance distribution that yields more tail samples, and correct via weights to retain unbiasedness.
- Extrapolation via tail modeling: Fit a parametric tail (e.g., generalized Pareto distribution, exponential, or other heavy‑tail models) to the extreme portion of sampled data and extrapolate beyond observed resamples to estimate p-values smaller than 1/Nsample.
- Two‑stage procedure: Use a fast preliminary pass to identify a threshold region, then run an intensified sampling or tail fit in that region to refine the p-value estimate with controlled error.
Algorithmic outline (practical version)
- Run an initial modest number of resamples (e.g., 1e5–1e6) under the null to produce a sample of T.
- Compute a high quantile q0 (e.g., 99.9% or 99.99%) from these samples to locate the extreme region.
- From the initial samples, collect all T ≥ q0 and fit a tail model (GPD or exponential) to those exceedances.
- Option A — extrapolation: Use the fitted tail to extrapolate the survival function S(t) = P(T* ≥ t) at Tobs. This yields p̂ and an uncertainty estimate from the fit.
- Optionally combine with the empirical survival below the fit threshold for a full distribution.
- Option B — targeted resampling: Conduct additional resampling focused on the tail (importance sampling or conditional resampling above q0). Use importance weights or conditional probability formulas to merge new samples with the initial set and compute a refined p̂.
- Report p̂ with confidence interval or standard error derived from bootstrap of the fitted tail parameters or from importance weight variance formulas.
Accuracy and error control
- Extrapolation accuracy depends on: (a) the validity of the tail model, (b) the number of exceedances used for fitting, and (c) the distance between q0 and Tobs. Use diagnostic plots (QQ, PP, mean residual life) to check fit.
- For target p-values around 1e‑9, you typically need:
- a good parametric tail fit, and
- enough exceedances (hundreds to thousands) to estimate tail parameters reliably.
- Provide uncertainty quantification: profile likelihood, parametric bootstrap of fitted tail, or delta‑method on tail parameter estimates. Report p̂ and a confidence interval or upper bound when extrapolation uncertainty is large.
- If importance sampling is used, monitor effective sample size and variance of importance weights; resampling inefficiency inflates estimator variance.
Performance considerations
- Memory: FastPval stores only the initial sample, exceedances, and compact summaries rather than billions of resamples — reducing memory by orders of magnitude.
- Time: Most cost is in generating resamples; targeted tail sampling or importance sampling concentrates draws where they matter, often reducing total required draws by factors of 10–100 or more versus plain Monte Carlo for rare events.
- Parallelism: Resampling generation and tail fitting scale well across CPU cores or distributed workers.
- Implementation choices affect speed:
- Lightweight summaries (histograms, streaming quantile algorithms) are faster and use less RAM.
- Complex importance sampling schemes require careful tuning but may drastically improve rare‑event efficiency.
Example workflows
- Quick estimate (moderate reliability):
- 1e6 initial resamples → compute 99.99% quantile → fit exponential tail to exceedances → extrapolate p̂.
- High‑confidence estimate (stronger reliability for p ≤ 1e‑9):
- 1e6 initial resamples → identify q0 at 99.9% → run targeted importance resampling focused above q0 to gather ~1e4 exceedances → fit GPD → estimate p̂ and CI via bootstrap.
- Pure importance sampling (when an effective importance distribution is known):
- Design importance distribution biased to produce extremes, sample with weights, compute weighted tail probability; repeat tuning for acceptable effective sample size.
Practical diagnostics (what to check)
- Stability of p̂ as you vary q0 and the number of exceedances used for fitting.
- Goodness of fit of the tail model (QQ plot of exceedances vs. fitted distribution; Kolmogorov–Smirnov or likelihood ratio tests for nested tail models).
- Effective sample size and weight variance if using importance sampling.
- Sensitivity of p̂ to plausible alternative tail models (e.g., exponential vs. GPD).
When not to extrapolate
- If tail behavior is irregular or multimodal near Tobs, parametric tail extrapolation may be unreliable. In such cases, prefer targeted resampling until the empirical tail contains enough observations to estimate the desired p-value directly.
- If the test statistic distribution under the null is known analytically, use that analytic null instead of resampling.
Example performance numbers (typical)
- Brute‑force Monte Carlo targeting p ≈ 1e‑9: at least ~1e9 resamples just to expect one exceedance, and roughly 2.5e10 for ~20% relative error — often impossible.
- FastPval-style: 1e6 initial resamples + targeted tail sampling (1e4–1e5 effective tail samples) + GPD fit → practical p̂ ~1e‑9 with quantified uncertainty, running in minutes to hours depending on per‑sample cost.
- Memory: storing 1e6 samples (8 bytes each) ≈ 8 MB — trivial compared to storing 1e9 samples (~8 GB).
Summary
FastPval achieves estimates of extremely small p-values by concentrating computation on the distribution tail through stratified sampling, compact summaries, importance sampling, and parametric tail fitting. Proper diagnostics and uncertainty quantification are essential: extrapolation enables reaching p-values like 1e‑9 with feasible computation, but the reliability depends on tail model validity and sufficient extreme observations. For the most robust results, combine initial sampling, targeted tail draws, and bootstrap diagnostics to produce a p-value estimate with an honest confidence interval.