2D equal-probability binning for statistical data analysis.
Partitions a 2D dataset into bins that each contain approximately the same number of points. At each step the dimension with the highest variance is split at its median, recursively, until the target number of bins is reached. The resulting bins are axis-aligned rectangles that adapt to the local density of the data.
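The recursive median split described above can be illustrated with a minimal NumPy sketch. This is not equibin's actual implementation: the function name `median_split_bins` is hypothetical, the sketch assumes `n_bins` is a power of two, and it takes the outer bounds from the data range.

```python
import numpy as np

def median_split_bins(x, y, n_bins):
    """Illustrative sketch only (hypothetical helper, not part of equibin).

    Assumes n_bins is a power of two. Returns a list of
    ((xmin, xmax, ymin, ymax), count) tuples.
    """
    if n_bins < 1 or n_bins & (n_bins - 1):
        raise ValueError("this sketch requires n_bins to be a power of two")
    pts = np.column_stack([x, y]).astype(float)
    bounds = (pts[:, 0].min(), pts[:, 0].max(),
              pts[:, 1].min(), pts[:, 1].max())
    leaves = [(bounds, pts)]
    # Each pass splits every current bin in two, doubling the bin count.
    while len(leaves) < n_bins:
        next_leaves = []
        for (x0, x1, y0, y1), p in leaves:
            # Split along the dimension with the larger variance in this bin.
            axis = int(np.argmax(p.var(axis=0)))
            cut = np.median(p[:, axis])
            low = p[p[:, axis] <= cut]
            high = p[p[:, axis] > cut]
            if axis == 0:
                next_leaves.append(((x0, cut, y0, y1), low))
                next_leaves.append(((cut, x1, y0, y1), high))
            else:
                next_leaves.append(((x0, x1, y0, cut), low))
                next_leaves.append(((x0, x1, cut, y1), high))
        leaves = next_leaves
    return [(b, len(p)) for b, p in leaves]

rng = np.random.default_rng(0)
x = rng.normal(size=4096)
y = rng.normal(size=4096)
bins = median_split_bins(x, y, n_bins=16)
counts = [c for _, c in bins]
print(len(bins), sum(counts))  # 16 4096
```

Because each split is made at the median of the points inside a bin, dense regions end up with small rectangles and sparse regions with large ones, while the per-bin counts stay nearly equal.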
This implements the multivariate probability binning algorithm described in:
Roederer, M., Moore, W., Treister, A., Hardy, R. R. & Herzenberg, L. A. (2001). Probability binning comparison: a metric for quantitating multivariate distribution differences. Cytometry 45(1):47–55. https://doi.org/10.1002/1097-0320(20010901)45:1<47::AID-CYTO1143>3.0.CO;2-A
pip install equibin
import numpy as np
from equibin import bin_2d, plot_bins, save_bins
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 5000)
y = rng.uniform(0, 10, 5000)
result = bin_2d(x, y, n_bins=128)
print(len(result)) # 128
print(result.counts.sum()) # 5000
print(result.bins[0]) # (xmin, xmax, ymin, ymax)
Restrict binning to a region of interest:
result = bin_2d(x, y, n_bins=256, xmin=2.5, xmax=20, ymin=0.25, ymax=10)
Plot the bins overlaid on the data:
plot_bins(result, x, y, title="Equal-probability bins", xlim=(0, 10), ylim=(0, 10))
Save bin boundaries to a text file (one line per bin: xlo xhi ylo yhi label):
save_bins(result, "bins.txt", label_prefix="run_")
- Amy Roberts (amy.roberts@ucdenver.edu)
- Lekhraj Pandey (lekhraj.pandey@coyotes.usd.edu)
- Anthony Villano (anthony.villano@ucdenver.edu)
GPL-2.0-only