ML-based particle flow (MLPF) focuses on developing full event reconstruction for particle detectors using computationally scalable and flexible machine learning models. The project aims to improve particle flow reconstruction across various detector environments, including CMS, as well as future detectors via Key4HEP. We build on existing, open-source simulation software by the experimental collaborations.
You can use uv to set up the repo and test that everything works:
```shell
git clone --recurse-submodules https://github.com/jpata/particleflow.git
uv sync
uv run ./scripts/local_test_cld.sh
uv run ./scripts/local_test_cms.sh
```
Alternatively, you can use a prepared container:
```shell
apptainer exec --nv https://jpata.web.cern.ch/jpata/pytorch-20260305-08d6950.sif ./scripts/local_test_cld.sh
apptainer exec --nv https://jpata.web.cern.ch/jpata/pytorch-20260305-08d6950.sif ./scripts/local_test_cms.sh
```
If you wish to train on pre-made datasets, you can download them from the Hugging Face Hub. To download a specific dataset and split (e.g., CLD, PF setup, configuration split 1):
```shell
uv run hf download jpata/particleflow \
    --include "tensorflow_datasets/cld/cld_edm_*_pf/1/*" \
    --local-dir data/tfds \
    --repo-type dataset
```

This will download the requested files into data/tfds/tensorflow_datasets/cld/cld_edm_*_pf/1/.
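The --include argument is a glob over file paths inside the dataset repository, matched with fnmatch-style semantics. As a minimal sketch of which paths such a pattern selects (the example paths below are hypothetical, for illustration only):

```python
from fnmatch import fnmatch

# Hypothetical repository paths, for illustration only.
paths = [
    "tensorflow_datasets/cld/cld_edm_ttbar_pf/1/dataset_info.json",
    "tensorflow_datasets/cld/cld_edm_ttbar_pf/2/dataset_info.json",
    "tensorflow_datasets/clic/clic_edm_ttbar_pf/1/dataset_info.json",
]
pattern = "tensorflow_datasets/cld/cld_edm_*_pf/1/*"

# Only files under the cld detector, any cld_edm_*_pf dataset, config split 1.
selected = [p for p in paths if fnmatch(p, pattern)]
```

Here only the first path is selected: the other two fail on the config split (2) or the detector directory (clic).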
Run the training on the downloaded data configuration split:

```shell
uv run \
    python mlpf/pipeline.py \
    --spec-file particleflow_spec.yaml \
    --production cld \
    --model-name pyg-cld-v1 \
    --data-dir data/tfds/tensorflow_datasets/cld \
    train \
    --data_config 1 \
    --gpu_batch_multiplier 4 \
    --gpus 1
```
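Before launching a long training run, it can help to confirm that the directory passed via --data-dir actually contains the expected dataset and configuration split. A minimal, hypothetical helper (not part of the repo) using pathlib:

```python
from pathlib import Path

def find_config_splits(data_dir: str, pattern: str = "*/1") -> list:
    """Return dataset config-split directories (e.g. 'cld_edm_..._pf/1')
    found under data_dir. Hypothetical helper, for illustration only."""
    root = Path(data_dir)
    return sorted(str(p.relative_to(root)) for p in root.glob(pattern) if p.is_dir())
```

For example, find_config_splits("data/tfds/tensorflow_datasets/cld") should return a non-empty list once the download above has completed.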
The full data generation, model training, and validation workflow is managed using Pixi for the environment and Snakemake for job orchestration. Apptainer images provide the software for the steps for the different detectors.
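The workflow below selects the detector production with a PROD environment variable. As a minimal sketch (a hypothetical guard, not part of the repo) of how a wrapper script might validate that selection before launching thousands of jobs:

```shell
#!/bin/sh
# Hypothetical guard: fail fast if PROD is not one of the supported productions.
PROD=${PROD:-cms_run3}  # default assumed here for illustration only
case "$PROD" in
    cms_run3|clic|cld) echo "selected production: $PROD" ;;
    *) echo "unsupported PROD: $PROD" >&2; exit 1 ;;
esac
```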
```shell
# ensure all gen configs are downloaded
git submodule update --init --recursive

# install pixi; restart your shell or source your .bashrc after this. only do once.
curl -fsSL https://pixi.sh/install.sh | bash

# copy the configuration for your site. only do once.
ln -s configs/{local,tallinn,lxplus}/pixi.toml pixi.toml

# initialize the orchestrator python environment. only do this once.
pixi run init

# generate the snakefile (will overwrite the defaults)
PROD={cms_run3,clic,cld} pixi run snakefile

# run the steps (this will take many days and thousands of jobs), so run inside screen or tmux
PROD={cms_run3,clic,cld} pixi run gen
PROD={cms_run3,clic,cld} pixi run post
PROD={cms_run3,clic,cld} pixi run tfds
PROD={cms_run3,clic,cld} pixi run train
```

The following publications trace the development of MLPF from early proofs of concept to full detector simulations and fine-tuning studies across detectors.
- [2021] First full-event GNN demonstration of MLPF: Paper Code Dataset
- [2021] First demonstration in CMS Run 3: Paper CMS-DP
- [2022] Improved performance in CMS Run 3: CMS-DP
- [2024] Improved performance with full simulation for future colliders: Paper Code Results
- [2025] Fine-tuning across detectors: Paper Code
- [2026] CMS Run 3 full results: Paper CMS-DP Code
You are welcome to reuse the code in accordance with the LICENSE.
How to Cite
- Academic Work: Please cite the specific papers listed in the Publications section above relevant to the method you are using (e.g., initial GNN idea, fine-tuning, or specific detector studies).
- Code Usage: If you use the code significantly for research, please cite the specific tagged version from Zenodo.
- Dataset Usage: Cite the appropriate dataset via the Zenodo link and the corresponding paper.
Contact
For collaboration ideas that do not fit into the categories above, please get in touch via GitHub Discussions.