ML-based particle flow (MLPF) focuses on developing full event reconstruction for particle detectors using computationally scalable and flexible machine learning models. The project aims to improve particle flow reconstruction across various detector environments, including CMS, as well as future detectors via Key4HEP. We build on existing, open-source simulation software by the experimental collaborations.
You can use uv to set up the repo and test that everything works:
```shell
git clone --recurse-submodules https://github.com/jpata/particleflow.git
uv sync
uv run ./scripts/local_test_cld.sh
uv run ./scripts/local_test_cms.sh
```
Alternatively, you can use a prepared container:
```shell
apptainer exec --nv https://jpata.web.cern.ch/jpata/pytorch-20260305-08d6950.sif ./scripts/local_test_cld.sh
apptainer exec --nv https://jpata.web.cern.ch/jpata/pytorch-20260305-08d6950.sif ./scripts/local_test_cms.sh
```
If you wish to train on pre-made datasets, you can download them from the Hugging Face Hub. To download a specific dataset and split (e.g., CLD, PF setup, configuration split 1):
```shell
uv run hf download jpata/particleflow \
    --include "tensorflow_datasets/cld/cld_edm_*_pf/1/*" \
    --local-dir data/tfds \
    --repo-type dataset
```

This will download the requested files into data/tfds/tensorflow_datasets/cld/cld_edm_*_pf/1/.
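The --include argument is a glob over file paths inside the dataset repository, matched with fnmatch-style semantics. As a minimal sketch of which paths such a pattern selects (the example paths below are hypothetical, for illustration only):

```python
from fnmatch import fnmatch

# Hypothetical repository paths, for illustration only.
paths = [
    "tensorflow_datasets/cld/cld_edm_ttbar_pf/1/dataset_info.json",
    "tensorflow_datasets/cld/cld_edm_ttbar_pf/2/dataset_info.json",
    "tensorflow_datasets/clic/clic_edm_ttbar_pf/1/dataset_info.json",
]
pattern = "tensorflow_datasets/cld/cld_edm_*_pf/1/*"

# Only files under the cld detector, any cld_edm_*_pf dataset, config split 1.
selected = [p for p in paths if fnmatch(p, pattern)]
```

Here only the first path is selected: the other two fail on the config split (2) or the detector directory (clic).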
Run the training on the downloaded data configuration split:

```shell
uv run \
    python mlpf/pipeline.py \
    --spec-file particleflow_spec.yaml \
    --production cld \
    --model-name pyg-cld-v1 \
    --data-dir data/tfds/tensorflow_datasets/cld \
    train \
    --data_config 1 \
    --gpu_batch_multiplier 4 \
    --gpus 1
```
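Before launching a long training run, it can help to confirm that the directory passed via --data-dir actually contains the expected dataset and configuration split. A minimal, hypothetical helper (not part of the repo) using pathlib:

```python
from pathlib import Path

def find_config_splits(data_dir: str, pattern: str = "*/1") -> list:
    """Return dataset config-split directories (e.g. 'cld_edm_..._pf/1')
    found under data_dir. Hypothetical helper, for illustration only."""
    root = Path(data_dir)
    return sorted(str(p.relative_to(root)) for p in root.glob(pattern) if p.is_dir())
```

For example, find_config_splits("data/tfds/tensorflow_datasets/cld") should return a non-empty list once the download above has completed.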
The full data generation, model training, and validation workflow is managed using Pixi for the environment and Snakemake for job orchestration. Apptainer images provide the software for the steps for the different detectors.
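The workflow below selects the detector production with a PROD environment variable. As a minimal sketch (a hypothetical guard, not part of the repo) of how a wrapper script might validate that selection before launching thousands of jobs:

```shell
#!/bin/sh
# Hypothetical guard: fail fast if PROD is not one of the supported productions.
PROD=${PROD:-cms_run3}  # default assumed here for illustration only
case "$PROD" in
    cms_run3|clic|cld) echo "selected production: $PROD" ;;
    *) echo "unsupported PROD: $PROD" >&2; exit 1 ;;
esac
```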
```shell
# ensure all gen configs are downloaded
git submodule update --init --recursive

# install pixi; restart your shell or source your .bashrc after this. only do once.
curl -fsSL https://pixi.sh/install.sh | bash

# copy the configuration for your site. only do once.
ln -s configs/{local,tallinn,lxplus}/pixi.toml pixi.toml

# initialize the orchestrator python environment. only do this once.
pixi run init

# generate the snakefile (will overwrite the defaults)
PROD={cms_run3,clic,cld} pixi run snakefile

# run the steps (this will take many days and thousands of jobs), so run inside screen or tmux
PROD={cms_run3,clic,cld} pixi run gen
PROD={cms_run3,clic,cld} pixi run post
PROD={cms_run3,clic,cld} pixi run tfds
PROD={cms_run3,clic,cld} pixi run train
```

The following publications trace the development of MLPF from early proofs of concept to full detector simulations and fine-tuning studies across detectors.
- [2021] First full-event GNN demonstration of MLPF: Paper Code Dataset
- [2021] First demonstration in CMS Run 3: Paper CMS-DP
- [2022] Improved performance in CMS Run 3: CMS-DP
- [2024] Improved performance with full simulation for future colliders: Paper Code Results
- [2025] Fine-tuning across detectors: Paper Code
- [2026] CMS Run 3 full results: Paper CMS-DP Code
You are welcome to reuse the code in accordance with the LICENSE.
How to Cite
- Academic Work: Please cite the specific papers listed in the Publications section above relevant to the method you are using (e.g., initial GNN idea, fine-tuning, or specific detector studies).
- Code Usage: If you use the code significantly for research, please cite the specific tagged version from Zenodo.
- Dataset Usage: Cite the appropriate dataset via the Zenodo link and the corresponding paper.
Contact
For collaboration ideas that do not fit into the categories above, please get in touch via GitHub Discussions.