Skip to content

shrutij01/compositional-sparse-ood

Repository files navigation

Compositional Sparse OOD

Code for "Stop Probing, Start Coding: Why Linear Probes and Sparse Autoencoders Fail at Compositional Generalisation".

arXiv Docs License: MIT Python 3.10+

Vitoria Barin Pacela*, Shruti Joshi*, Isabela Camacho, Simon Lacoste-Julien, David Klindt

What This Does

Under superposition, concepts are linearly represented in neural network activations but not linearly accessible. We compare sparse coding, SAEs, and linear probes on compositional OOD generalisation.

Method OOD generalisation Why
FISTA (oracle dictionary) Near-perfect at all scales Per-sample sparse inference solves the right problem
SAEs (ReLU, TopK, JumpReLU) Fail Learned dictionaries point in wrong directions
Linear probes Degrade sharply MCC < 0.1 at d_z = 10,000 under superposition
DL-FISTA Beats probes, fails at scale Dictionary learning is the bottleneck

Install

uv venv && source .venv/bin/activate
uv pip install -e .

Or with pip:

pip install -e .

Requires Python >= 3.10 and PyTorch.

Reproduce Paper Figures

All results are pre-computed in results/. To regenerate:

# Main text + appendix figures
python experiments/plotting/plot_paper_figures.py --only v2

# Controlled experiment figures
python experiments/plotting/plot_controlled.py

Figures are saved to paper_figures/.

Run Experiments

Sensitivity — vary one parameter, hold others fixed:

python experiments/sensitivity/exp_vary_latents.py
python experiments/sensitivity/exp_vary_samples.py
python experiments/sensitivity/exp_vary_sparsity.py

Controlled — decompose SAE failure:

python experiments/controlled/exp_dict_quality.py       # FISTA on SAE-learned dictionary
python experiments/controlled/exp_warmstart_decoder.py   # SAE decoder as DL-FISTA init
python experiments/controlled/exp_support_recovery.py    # Sparsity pattern recovery
python experiments/controlled/exp_learning_dynamics.py   # Dictionary learning over time
python experiments/controlled/exp_lambda_sensitivity.py  # Regularisation sweep

Results are saved incrementally to results/.

Structure

src/data.py              # Synthetic data generation (sparse codes + linear mixing)
models/
├── saes.py              # SAE variants (ReLU, TopK, JumpReLU, MP)
├── sparse_coding.py     # FISTA, DL-FISTA, Softplus-Adam, LISTA
└── linear_probe.py      # Supervised linear probe baseline
utils/metrics.py         # MCC, accuracy, AUC, support recovery metrics
experiments/
├── _common.py           # Shared training/evaluation helpers
├── sensitivity/         # Vary latents, samples, sparsity
├── controlled/          # Frozen decoder, warmstart, dict quality, etc.
└── plotting/            # Figure generation scripts
results/                 # Pre-computed experiment results (JSON)
paper_figures/           # Generated figures

Citation

@misc{pacela2026stopprobingstartcoding,
  title={Stop Probing, Start Coding: Why Linear Probes and Sparse Autoencoders Fail at Compositional Generalisation},
  author={Vitória Barin Pacela and Shruti Joshi and Isabela Camacho and Simon Lacoste-Julien and David Klindt},
  year={2026},
  eprint={2603.28744},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2603.28744},
}

License

MIT

About

Code for "Stop Probing, Start Coding: Why Linear Probes and Sparse Autoencoders Fail at Compositional Generalisation"

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors