Thiggel/FOMO


Balancing Representations by Identifying and Generating Underrepresented Data

Anonymous code release for the paper:

BRIDGE: Balancing Representations by Identifying and Generating Underrepresented Data

This repository contains the training code, experiment configurations, job scripts, and paper assets used for the BRIDGE experiments.

Overview

BRIDGE is a self-supervised learning pipeline that alternates between:

  1. SSL pretraining on the current source dataset
  2. sparsity scoring in representation space with mean $k$NN distance
  3. targeted image-to-image augmentation of underrepresented regions
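The sparsity scoring in step 2 can be sketched with plain NumPy. This is an illustrative, self-contained version (the function name and the value of $k$ are assumptions, not the repository's actual implementation):

```python
import numpy as np

def knn_sparsity_scores(embeddings: np.ndarray, k: int = 5) -> np.ndarray:
    """Mean Euclidean distance from each embedding to its k nearest neighbors.

    Higher scores indicate sparser regions of representation space,
    i.e. likely underrepresented data.
    """
    # Pairwise Euclidean distances (N x N).
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude self-distance
    # Mean of the k smallest distances per row.
    knn = np.sort(dists, axis=1)[:, :k]
    return knn.mean(axis=1)

# Example selection rule: take the sparsest samples for augmentation,
# e.g. the top 500 per cycle as in the main reported configuration.
# selected = np.argsort(-knn_sparsity_scores(embeddings))[:500]
```

A brute-force pairwise distance matrix is fine for small sets; a real run over a large dataset would use an approximate or batched nearest-neighbor search instead.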

The repository supports:

  • baseline runs on balanced and imbalanced sources
  • BRIDGE runs with SimCLR, TS, and SDCLR
  • ablations over SSL objective, generation method, selection rule, cycle count, and architecture
  • source-regime sweeps on ImageNet-100-LT, CIFAR-10-LT, CIFAR-100-LT, PASS, and DiffusionDB

Repository Structure

experiment/                  Core training, datasets, models, and evaluation code
experiment/conf/             Hydra configuration files
jobs/                        SLURM job scripts for baselines, ablations, and source-regime sweeps
scripts/                     Utility scripts for log checking, result export, and paper table generation
paper_work/                  Manuscript sources, generated tables, and figures
job_logs/                    Collected SLURM logs from completed runs
outputs/                     Experiment outputs and checkpoints
visualizations/              Auxiliary visualizations

Environment

The project expects a Python environment with the packages listed in:

  • requirements.txt
  • environment.yml

The codebase uses Hydra for configuration management and is designed to run both locally and through SLURM job arrays.
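For orientation, a Hydra root config for this kind of setup might look as follows. This fragment is purely illustrative (the group names and defaults are assumptions); the actual configuration files live under experiment/conf/:

```yaml
# Hypothetical layout of experiment/conf/config.yaml -- see the real files.
defaults:
  - dataset: ???        # selected on the command line, e.g. dataset=...
  - _self_

model_name: ResNet50
ssl_method: SimCLR
pretrain: true
finetune: true
logger: true
num_runs: 3
seed: 0
```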

Running Experiments

The main entry point is:

python -m experiment

Typical arguments include:

model_name={ResNet18,ResNet50,ViTSmall,ViTBase}
ssl_method={SimCLR,SDCLR,MoCo,DINO}
dataset=...
pretrain=true
finetune=true
logger=true
num_runs=3
seed=...

For SLURM arrays, each array task runs one seed, selected via the SLURM_ARRAY_TASK_ID environment variable. After all seed jobs finish, aggregate the results with:

python -m experiment aggregate_only=true ...

using the same experiment name and configuration.
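The per-task seed selection can be sketched as follows. The helper name and base-seed convention are hypothetical; the repository's actual logic lives in the experiment package:

```python
import os

def seed_for_array_task(base_seed: int = 0, default_task: int = 0) -> int:
    """Derive the run seed from SLURM_ARRAY_TASK_ID.

    Each SLURM array task gets a distinct seed (base_seed + task index);
    outside SLURM the default task index is used, so local runs remain
    reproducible without any environment setup.
    """
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", default_task))
    return base_seed + task_id
```

Because the seed is derived deterministically from the array index, re-running a single failed array task reproduces exactly that seed's run.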

Reproducing Paper Results

The repository includes:

  • experiment configs under experiment/conf/
  • job scripts under jobs/
  • log summaries under job_logs/
  • result-export and paper-table utilities under scripts/

The manuscript in paper_work/neurips_bridge_paper/ is generated from the completed three-seed runs reported in the paper.

Notes

  • PASS and DiffusionDB source-regime comparisons use aligned 10k source subsets.
  • The main reported BRIDGE configuration uses ResNet-50, 5 cycles, 500 selected samples per cycle, and 5 generated images per selected sample.
  • The DINO ablation uses four GPUs and two local crops.

About

Self-Supervised Pre-Training on Imbalanced Datasets using OOD Detection and Diffusion-Based Augmentation
