jpata/particleflow


Summary

ML-based particle flow (MLPF) focuses on developing full event reconstruction for particle detectors using computationally scalable and flexible machine learning models. The project aims to improve particle flow reconstruction across various detector environments, including CMS, as well as future detectors via Key4HEP. We build on existing, open-source simulation software by the experimental collaborations.

High-level overview


TL;DR: I just want to run the code

You can use uv to set up the repo and test that everything works:

git clone --recurse-submodules https://github.com/jpata/particleflow.git
uv sync
uv run ./scripts/local_test_cld.sh
uv run ./scripts/local_test_cms.sh

Alternatively, you can use a prepared container:

apptainer exec --nv https://jpata.web.cern.ch/jpata/pytorch-20260305-08d6950.sif ./scripts/local_test_cld.sh
apptainer exec --nv https://jpata.web.cern.ch/jpata/pytorch-20260305-08d6950.sif ./scripts/local_test_cms.sh

Datasets

If you wish to train on pre-made datasets, you can download them from the Hugging Face Hub. To download a specific dataset and split (e.g., CLD, PF setup, configuration split 1):

uv run hf download jpata/particleflow \
  --include "tensorflow_datasets/cld/cld_edm_*_pf/1/*" \
  --local-dir data/tfds \
  --repo-type dataset

This will download the requested files into data/tfds/tensorflow_datasets/cld/cld_edm_*_pf/1/.
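Once downloaded, a dataset in this layout can be opened with `tensorflow_datasets`. The following is a minimal sketch; the dataset name `cld_edm_ttbar_pf` is a hypothetical example matching the `cld_edm_*_pf` glob above, so substitute the directory name you actually downloaded.

```python
# Sketch: locate and load one downloaded TFDS dataset.
# The dataset name "cld_edm_ttbar_pf" is an assumed example; check the
# actual directory names under data/tfds/tensorflow_datasets/ after download.
from pathlib import Path

def builder_dir(base: str, detector: str, dataset: str, version: str) -> str:
    # Mirror the directory layout produced by the `hf download` command above.
    return str(Path(base) / "tensorflow_datasets" / detector / dataset / version)

path = builder_dir("data/tfds", "cld", "cld_edm_ttbar_pf", "1")

# With tensorflow_datasets installed, the split can then be read as:
# import tensorflow_datasets as tfds
# builder = tfds.builder_from_directory(path)
# ds = builder.as_dataset(split="train")
```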

Training

Run the training on the downloaded data configuration split:

uv run \
    python mlpf/pipeline.py \
    --spec-file particleflow_spec.yaml \
    --production cld \
    --model-name pyg-cld-v1 \
    --data-dir data/tfds/tensorflow_datasets/cld \
    train \
    --data_config 1 \
    --gpu_batch_multiplier 4 \
    --gpus 1

End-to-end workflow: dataset generation and model training

The full data generation, model training, and validation workflow is managed using Pixi for environment management and Snakemake for job orchestration. Apptainer images provide the software for the steps for different detectors.

# ensure all gen configs are downloaded
git submodule update --init --recursive

# install pixi; restart your shell or source your .bashrc afterwards. only do this once.
curl -fsSL https://pixi.sh/install.sh | bash

# link the configuration for your site (pick one of local, tallinn, lxplus). only do this once.
ln -s configs/{local,tallinn,lxplus}/pixi.toml pixi.toml

# initialize the orchestrator python environment. only do this once.
pixi run init

# generate the snakefile (will overwrite the defaults)
PROD={cms_run3,clic,cld} pixi run snakefile

# run the steps; this will take many days and thousands of jobs, so run inside screen or tmux
PROD={cms_run3,clic,cld} pixi run gen
PROD={cms_run3,clic,cld} pixi run post
PROD={cms_run3,clic,cld} pixi run tfds
PROD={cms_run3,clic,cld} pixi run train

Publications

The following publications trace the development of MLPF from early proofs of concept to full detector simulations and fine-tuning studies across detectors.


Citations and Reuse

You are welcome to reuse the code in accordance with the LICENSE.

How to Cite

  1. Academic Work: Please cite the specific papers listed in the Publications section above that are relevant to the method you are using (e.g., the initial GNN idea, fine-tuning, or specific detector studies).
  2. Code Usage: If you use the code significantly for research, please cite the specific tagged version from Zenodo.
  3. Dataset Usage: Cite the appropriate dataset via the Zenodo link and the corresponding paper.

Contact

For collaboration ideas that do not fit into the categories above, please get in touch via GitHub Discussions.

About

Machine-learned, GPU-accelerated particle flow reconstruction
