Python library for computing and analyzing data-complexity metrics on classification datasets. Wraps PyCol (Python Class Overlap Library) and adds an experiment framework for studying how complexity metrics correlate with ML classifier performance.
- Computes 33+ complexity metrics across four categories: Feature Overlap, Instance Overlap, Structural Overlap, and Multiresolution Overlap
- Provides a modular ML evaluation layer (8 classifiers, 17 evaluation metrics, cross-validation and train/test evaluators)
- Includes a configurable experiment framework for parameter sweeps over synthetic datasets (Gaussian, Moons, Circles, Blobs)
- Supports parallel execution, result saving/loading, and a range of visualizations
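
The library's quick-start example below expects datasets as a plain `{"X": ..., "y": ...}` dict. As a minimal sketch (assuming that dict format; `make_moons` comes from scikit-learn and corresponds to the "Moons" synthetic dataset mentioned above), such an input can be built from a standard generator:

```python
import numpy as np
from sklearn.datasets import make_moons

# Generate a two-class "moons" dataset: 200 points in 2D with label noise.
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

# Pack it into the {"X", "y"} dict format the library consumes.
dataset = {"X": X, "y": y}
print(dataset["X"].shape, np.bincount(dataset["y"]))
```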
## Installation

```bash
pdm install
```

## Quick Start

```python
from data_complexity.metrics import complexity_metrics
import numpy as np

dataset = {"X": np.random.randn(200, 2), "y": np.array([0] * 100 + [1] * 100)}
complexity = complexity_metrics(dataset=dataset)
print(complexity.get_all_metrics_scalar())
```

Run a pre-defined experiment:

```python
from data_complexity.experiments.pipeline import run_experiment

exp = run_experiment("moons_noise")  # runs, saves plots and CSVs
```

## Documentation

- `CLAUDE.md` — full API reference for contributors and AI assistants
- `data_complexity/experiments/pipeline/README.md` — detailed experiment framework docs