
data_complexity_analysis

Python library for analyzing data complexity metrics in classification datasets. Wraps PyCol (Python Class Overlap Library) and adds an experiment framework for studying how complexity metrics correlate with ML classifier performance.

What it does

  • Computes 33+ complexity metrics across four categories: Feature Overlap, Instance Overlap, Structural Overlap, and Multiresolution Overlap
  • Provides a modular ML evaluation module (8 classifiers, 17 performance metrics, cross-validation and train/test evaluators)
  • Includes a configurable experiment framework for parameter sweeps over synthetic datasets (Gaussian, Moons, Circles, Blobs)
  • Supports parallel execution, result saving/loading, and a range of visualizations
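To give a feel for what a feature-overlap metric measures, here is a minimal NumPy sketch of Fisher's discriminant ratio, one of the classic measures in that family. This is a simplified reimplementation for intuition only, not the library's (or PyCol's) own code:

```python
import numpy as np

def fisher_ratio(X, y):
    """Per-feature Fisher discriminant ratio for a binary problem:
    f_i = (mu0_i - mu1_i)^2 / (var0_i + var1_i).
    Higher values mean the classes separate more easily along that feature,
    i.e. lower feature overlap."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0)
    return num / den

# Two classes well separated on feature 0, identical on feature 1
rng = np.random.default_rng(0)
X = np.column_stack([
    np.concatenate([rng.normal(0, 1, 100), rng.normal(5, 1, 100)]),
    rng.normal(0, 1, 200),
])
y = np.array([0] * 100 + [1] * 100)
f = fisher_ratio(X, y)
print(f)  # feature 0 scores far higher than feature 1
```

The library's own metrics report aggregated versions of quantities like this across all features and categories.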

Installation

pdm install

Quick start

from data_complexity.metrics import complexity_metrics
import numpy as np

dataset = {"X": np.random.randn(200, 2), "y": np.array([0] * 100 + [1] * 100)}
complexity = complexity_metrics(dataset=dataset)

print(complexity.get_all_metrics_scalar())
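Any data in the same `{"X", "y"}` dict format can be analyzed, so you can build your own synthetic datasets too. Below is an illustrative two-moons generator in plain NumPy (the generator itself is a sketch for this README, not one of the library's built-in generators):

```python
import numpy as np

def make_moons_dataset(n_per_class=100, noise=0.1, seed=0):
    """Illustrative two-moons generator returning the {"X", "y"} dict format."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0, np.pi, n_per_class)
    upper = np.column_stack([np.cos(t), np.sin(t)])            # class 0 arc
    lower = np.column_stack([1 - np.cos(t), 0.5 - np.sin(t)])  # class 1 arc
    X = np.vstack([upper, lower])
    X += rng.normal(scale=noise, size=X.shape)                 # add label noise
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return {"X": X, "y": y}

dataset = make_moons_dataset()
print(dataset["X"].shape, dataset["y"].shape)  # (200, 2) (200,)
# dataset can then be passed to complexity_metrics(dataset=dataset)
```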

Run a pre-defined experiment:

from data_complexity.experiments.pipeline import run_experiment

exp = run_experiment("moons_noise")   # runs, saves plots and CSVs

Further reading

  • CLAUDE.md — full API reference for contributors and AI assistants
  • data_complexity/experiments/pipeline/README.md — detailed experiment framework docs
