SigmaRL: A Sample-Efficient and Generalizable Multi-Agent Reinforcement Learning Framework for Motion Planning
Note
- Check out our recent work CBF-Informed MARL, which has been accepted for publication at IEEE ITSC 2026! We propose a Control Barrier Function (CBF)-informed reward design for Multi-Agent RL (MARL) that converts CBF constraint values under joint MARL actions into a reward signal that explicitly guides safe learning (see also Fig. 6).
- Check out our recent work CBF-Based Safety Filter, which has been nominated for the best paper award at IEEE ITSC 2025! It proposes a real-time CBF-based safety filter for safety verification of learning-based motion planning with road boundary constraints (see also Fig. 5).
This repository provides the full code of SigmaRL, a Sample-efficient and generalizable multi-agent Reinforcement Learning (MARL) framework for motion planning of Connected and Automated Vehicles (CAVs).
SigmaRL is a decentralized MARL framework designed for motion planning of CAVs. We use VMAS, a vectorized differentiable simulator designed for efficient MARL benchmarking, as our simulator and customize it for our RL environment. The first scenario in Fig. 1 mirrors the real-world conditions of our Cyber-Physical Mobility Lab (CPM Lab). We also support maps handcrafted in JOSM, an open-source editor for OpenStreetMap. Below you will find detailed guidance for creating your OWN maps.
(a) CPM scenario.
(b) Intersection scenario.
(c) On-ramp scenario.
(d) "Roundabout" scenario.
Figure 1: Demonstrating the generalization of SigmaRL (speed x2). Only the intersection part of the CPM scenario (the middle part in Fig. 1(a)) is used for training. All other scenarios are completely unseen. See our SigmaRL paper for more details.
Figure 2: We use an auxiliary MARL to learn dynamic priority assignments to address non-stationarity. Higher-priority agents communicate their actions (depicted by the colored lines) to lower-priority agents to stabilize the environment. See our XP-MARL paper for more details.
Figure 3: Demonstrating the safety and reduced conservatism of our MTV-based safety margin. In the overtaking scenario, while the traditional approach fails to overtake due to excessive conservatism (see (a)), ours succeeds (see (b)). Note that in the overtaking scenario, the slow-moving vehicle
(a) The standard HOCBF approach requires tuning two parameters (lambda_1 and lambda_2).
(b) Our TTCBF HOCBF approach requires tuning only one parameter (lambda_1).
Figure 4: Our TTCBF approach reduces the number of parameters to tune when handling constraints with high relative degrees. See our TTCBF paper for more details.
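For background, the two parameters in Fig. 4(a) can be traced to the usual HOCBF construction. For a safety constraint h(x) >= 0 of relative degree two and linear class-K functions, it is commonly written as below. This is the textbook form, given here as context and not copied from the TTCBF paper; the symbols h, psi_1, and psi_2 are notation introduced for this sketch. The TTCBF formulation itself is not reproduced here.

```latex
\begin{aligned}
\psi_1(x) &= \dot{h}(x) + \lambda_1 h(x), \\
\psi_2(x) &= \dot{\psi}_1(x) + \lambda_2 \psi_1(x)
           = \ddot{h}(x) + (\lambda_1 + \lambda_2)\,\dot{h}(x) + \lambda_1 \lambda_2\, h(x) \ge 0 .
\end{aligned}
```

Each additional degree of differentiation adds one more class-K function and therefore one more parameter to tune.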
(a) An undertrained RL policy without our safety filter often caused collisions with road boundaries.
(b) Our safety filter successfully avoided all collisions caused by the undertrained RL policy.
Figure 5: Demonstration of our safety filter for safety verification of an undertrained RL policy. See our CBF-Based Safety Filter paper for more details.
(a) Demo 1.
(b) Demo 2.
(c) Demo 3.
(d) Demo 4.
Figure 6: Demonstrating some representative interaction scenarios using the policy learned with our proposed CBF-informed method. See our CBF-Informed MARL paper for more details.
SigmaRL supports Python 3.10 to 3.12 and is OS-independent. We recommend using a virtual environment. For example, with conda:
conda create -n env_sigmarl python=3.12
conda activate env_sigmarl
Clone the repository and install SigmaRL in editable mode:
git clone https://github.com/bassamlab/SigmaRL.git
cd SigmaRL
pip install -r requirements.txt
pip install -e .
If you plan to contribute to the repository, install the development dependencies as well:
pip install -e ".[dev]"
pre-commit install
Launch Python from the terminal:
python
Then run:
import sigmarl
print(sigmarl.__version__)
If the installation succeeds, Python will print the installed sigmarl version.
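Before creating the environment, the interpreter version can be checked against the supported range stated above. This is a minimal sketch; is_supported_python is a hypothetical helper and not part of SigmaRL.

```python
import sys

# Supported range taken from the text above: Python 3.10 to 3.12 (inclusive).
MIN_SUPPORTED = (3, 10)
MAX_SUPPORTED = (3, 12)


def is_supported_python(version=None):
    """Return True if (major, minor) lies in the supported range.

    Hypothetical helper for illustration; not part of SigmaRL.
    """
    if version is None:
        version = sys.version_info
    major_minor = (version[0], version[1])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    status = "supported" if is_supported_python() else "outside the supported range"
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]} is {status}.")
```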
Run main_training.py. During training, any intermediate model that outperforms the currently saved model will be saved automatically. You can also retrain or refine a trained model by setting the parameter is_continue_train in sigmarl/config.json to true. The saved model will be loaded for a new training process.
sigmarl/scenarios/road_traffic.py defines the RL environment, including the observation and reward functions. It also provides an interactive interface that visualizes the environment. To open the interface, simply run this file. You can use the arrow keys to control agents and the Tab key to switch between agents. Adjust the parameter scenario_type to choose a scenario. All available scenarios are listed in the variable SCENARIOS in sigmarl/constants.py. Before training, we recommend using the interactive interface to check whether the environment behaves as expected.
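To see which values scenario_type can take, the keys of SCENARIOS can be listed. A minimal sketch, assuming that sigmarl is installed and that SCENARIOS is a dictionary keyed by scenario name; list_scenario_names is a hypothetical helper, not part of SigmaRL.

```python
def list_scenario_names():
    """Return the sorted keys of SCENARIOS, or [] if sigmarl cannot be imported."""
    try:
        # Module path taken from the text above (sigmarl/constants.py).
        from sigmarl.constants import SCENARIOS
    except ImportError:
        return []
    # Assumption: the keys are plain strings that can be sorted.
    return sorted(SCENARIOS)


if __name__ == "__main__":
    for name in list_scenario_names():
        print(name)
```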
After training, run main_testing.py to test your model. You may need to adjust the parameter path there to specify which folder contains the target model.
Note: If the path to a saved model changes, you need to update the value of where_to_save in the corresponding JSON file as well.
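Both JSON edits mentioned above (setting is_continue_train and updating where_to_save) can be scripted instead of done by hand. The sketch below assumes that these keys sit at the top level of the JSON file, which is not confirmed here; update_json_fields is a hypothetical helper, not part of SigmaRL.

```python
import json
from pathlib import Path


def update_json_fields(json_path, **updates):
    """Overwrite existing top-level keys in a JSON file and return the new contents.

    Assumes the keys live at the top level of the file; adapt if they are nested.
    """
    path = Path(json_path)
    data = json.loads(path.read_text(encoding="utf-8"))
    missing = [key for key in updates if key not in data]
    if missing:
        # Refuse to silently create keys that the file does not already define.
        raise KeyError(f"Keys not found in {path}: {missing}")
    data.update(updates)
    path.write_text(json.dumps(data, indent=4), encoding="utf-8")
    return data


# Example calls (paths in angle brackets are placeholders):
# update_json_fields("sigmarl/config.json", is_continue_train=True)
# update_json_fields("<model folder>/<file>.json", where_to_save="<new model folder>")
```

Raising on unknown keys is a deliberate choice in this sketch: a typo in a key name then fails loudly instead of adding an unused entry to the file.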
We support maps customized in JOSM, an open-source editor for OpenStreetMap. Follow these steps (video tutorial available here):
- Install JOSM from the website given above.
- To get an empty map that can be customized, do the following:
  - Open JOSM and click the green download button.
  - Zoom in and choose an arbitrary place on the map by drawing a rectangle. The area should be as empty as possible.
  - Clicking "Download" will open a new window. There should be a notification that no data could be found; otherwise, choose the area again.
- Customize the map by drawing lines. Note that all lanes you draw are considered center lines. You do not need to draw left and right boundaries, since they will be determined automatically later by our script with a given width. The distance between the nodes of a lane should be approximately 0.1 meters. You can find useful hints and commands for customizing the map at Actions and Tools.
- Give each lane the key "lanes" and a unique value.
- Save the resulting .osm file and store it at assets/maps. Give it a name.
- Go to utilities/constants.py and create a new entry in the dictionary "SCENARIOS" for it. The key of the entry is the name of the map and the value is a dictionary, for which you should at least provide values for the keys map_path, lane_width, and scale. You should also provide a list for reference_paths_ids (which paths exist?) and a dictionary for neighboring_lanelet_ids (which lanes are adjacent?).
- Go to utilities/parse_osm.py. Adjust the parameter scenario_type and run it.
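The new SCENARIOS entry described in the steps above could look roughly like the sketch below. Only the key names come from this README; the map name my_map, all values, and the inner structure of reference_paths_ids and neighboring_lanelet_ids are placeholders and assumptions, so compare with an existing entry before copying anything. The helper missing_keys is hypothetical.

```python
# Keys named in the steps above.
REQUIRED_KEYS = ("map_path", "lane_width", "scale")
RECOMMENDED_KEYS = ("reference_paths_ids", "neighboring_lanelet_ids")

# Hypothetical entry for a map saved as assets/maps/my_map.osm.
my_map_entry = {
    "map_path": "assets/maps/my_map.osm",
    "lane_width": 0.3,  # placeholder value; unit assumed to be meters
    "scale": 1.0,  # placeholder scaling factor
    # Which paths exist? (list; inner structure is an assumption)
    "reference_paths_ids": [["1", "2"], ["3"]],
    # Which lanes are adjacent? (dictionary; inner structure is an assumption)
    "neighboring_lanelet_ids": {"1": ["2"], "2": ["1"], "3": []},
}


def missing_keys(entry):
    """Return the required and recommended keys that an entry lacks."""
    return [k for k in REQUIRED_KEYS + RECOMMENDED_KEYS if k not in entry]


# Stand-in for the real dictionary; the key is the name of the map.
SCENARIOS = {"my_map": my_map_entry}

if __name__ == "__main__":
    print(missing_keys(SCENARIOS["my_map"]))
```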
Figure 7: Overview of currently available maps.
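For the step that asks for a "lanes" key with a unique value on every lane, a quick check of the saved file can catch missing or repeated values before parsing. The sketch below assumes that each drawn lane is stored as a way element in the OSM XML file and uses only the Python standard library; check_lanes_tags is a hypothetical helper and not part of SigmaRL.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_lanes_tags(osm_xml):
    """Return (ids of ways without a "lanes" tag, duplicated "lanes" values)."""
    root = ET.fromstring(osm_xml)
    values = {}
    untagged = []
    for way in root.iter("way"):
        way_id = way.get("id")
        # Collect the values of all <tag k="lanes" v="..."/> children of this way.
        lanes = [t.get("v") for t in way.findall("tag") if t.get("k") == "lanes"]
        if lanes:
            values[way_id] = lanes[0]
        else:
            untagged.append(way_id)
    counts = Counter(values.values())
    duplicated = sorted(v for v, n in counts.items() if n > 1)
    return untagged, duplicated
```

For a saved map, pass the file contents as a string; both returned lists should be empty.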
If you use this repository, we would greatly appreciate it if you would consider citing the relevant papers below.
- BibTeX
  @inproceedings{xu2024sigmarl,
    title = {SigmaRL: A Sample-Efficient and Generalizable Multi-Agent Reinforcement Learning Framework for Motion Planning},
    booktitle = {2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC)},
    author = {Xu, Jianye and Hu, Pan and Alrifaee, Bassam},
    year = {2024},
    pages = {768--775},
    issn = {2153-0017},
    doi = {10.1109/ITSC58415.2024.10919918}
  }
- Reproduce Experimental Results in the Paper:
  - Check out the corresponding tag using git checkout 1.2.0
  - Go to this page and download the zip file itsc24.zip. Unzip it, copy and paste the whole folder to the checkpoints folder at the root of this repository. The structure should be like this: root/checkpoints/itsc24/.
  - Run sigmarl/evaluation_itsc24.py.
  You can also run testing_mappo_cavs.py to intuitively evaluate the trained models. Adjust the parameter path there to specify which folder contains the target model. Note: The evaluation results you get may deviate from the paper since we have meticulously adjusted the performance metrics.
- BibTeX
  @article{xu2024xp,
    title = {{{XP-MARL}}: Auxiliary Prioritization in Multi-Agent Reinforcement Learning to Address Non-Stationarity},
    author = {Xu, Jianye and Sobhy, Omar and Alrifaee, Bassam},
    journal = {arXiv preprint arXiv:2409.11852},
    year = {2024}
  }
- Reproduce Experimental Results in the Paper:
  - Check out the corresponding tag using git checkout 1.2.0
  - Go to this page and download the zip file icra25.zip. Unzip it, copy and paste the whole folder to the checkpoints folder at the root of this repository. The structure should be like this: root/checkpoints/icra25/.
  - Run sigmarl/evaluation_icra25.py.
  You can also run testing_mappo_cavs.py to intuitively evaluate the trained models. Adjust the parameter path there to specify which folder contains the target model.
- BibTeX
  @inproceedings{xu2025learningbased,
    title = {A Learning-Based Control Barrier Function for Car-Like Robots: Toward Less Conservative Collision Avoidance},
    booktitle = {2025 European Control Conference (ECC)},
    author = {Xu, Jianye and Alrifaee, Bassam},
    year = {2025},
    pages = {988--995},
    doi = {10.23919/ECC65951.2025.11187043}
  }
- Reproduce Experimental Results in the Paper:
  - Go to this page and download the zip file ecc25.zip. Unzip it, copy and paste the whole folder to the checkpoints folder at the root of this repository. The structure should be like this: root/checkpoints/ecc25/.
  - Run sigmarl/evaluation_ecc25.py.
- BibTeX
  @article{xu2025highorder,
    title = {High-Order Control Barrier Functions: Insights and a Truncated Taylor-Based Formulation},
    author = {Xu, Jianye and Alrifaee, Bassam},
    journal = {arXiv preprint arXiv:2503.15014},
    year = {2025}
  }
- Reproduce Experimental Results in the Paper:
  - Check out the corresponding tag using git checkout 1.3.0
  - Run sigmarl/hocbf_taylor.py.
- BibTeX
  @inproceedings{xu2025realtime,
    title = {A Real-Time Control Barrier Function-Based Safety Filter for Motion Planning with Arbitrary Road Boundary Constraints},
    booktitle = {2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC)},
    author = {Xu, Jianye and Che, Chang and Alrifaee, Bassam},
    year = {2025},
    pages = {2818--2825},
    doi = {10.1109/ITSC60802.2025.11423203}
  }
- Reproduce Experimental Results in the Paper:
  - Check out the corresponding tag using git checkout 1.4.0
  - Go to this page and download the zip file itsc25.zip. Unzip it, copy and paste the whole folder to the checkpoints folder at the root of this repository. The structure should be like this: root/checkpoints/itsc25/.
  - Run sigmarl/evaluation_itsc25.py.
- BibTeX
  @article{beerwerth2026zeroshot,
    title = {Zero-Shot MARL Benchmark in the Cyber-Physical Mobility Lab},
    author = {Beerwerth, Julius and Xu, Jianye and Sch{\"a}fer, Simon and Belderink, Fynn and Alrifaee, Bassam},
    year = {2026},
    journal = {at - Automatisierungstechnik},
    volume = {74},
    number = {5},
    pages = {376--385},
    publisher = {De Gruyter},
    doi = {10.1515/auto-2025-0057},
    copyright = {De Gruyter expressly reserves the right to use all content for commercial text and data mining within the meaning of Section 44b of the German Copyright Act.}
  }
- Reproduce Experimental Results of the SigmaRL Simulation in the Paper:
  - Check out the corresponding tag using git checkout 1.5.0
  - Go to this page and download the zip file at25.zip. Unzip it, copy and paste the whole folder to the checkpoints folder at the root of this repository. The structure should be like this: root/checkpoints/at25/.
  - Run sigmarl/eva_at25/run_models_parallel.py to evaluate the downloaded models. The evaluation results will be saved automatically.
    - This script requires Python parallel workers.
    - Alternatively, you can run sigmarl/eva_at25/run_models.py if you do not want to use parallel workers.
  - After the evaluation, run sigmarl/eva_at25/marl_aggregated_evaluation.py to analyze the evaluation results and obtain the performance metrics.
- See additional documentation here.
- BibTeX
  @inproceedings{xu2026beyond,
    title = {Beyond Safety Filtering: Control Barrier Function-Informed Reinforcement Learning for Connected and Automated Vehicles},
    booktitle = {2026 IEEE 29th International Conference on Intelligent Transportation Systems (ITSC), in press},
    author = {Xu, Jianye and Alrifaee, Bassam},
    year = {2026}
  }
- Reproduce Experimental Results in the Paper:
  - Check out the corresponding tag using git checkout 1.6.0
  - Go to this page and download the zip file itsc26.zip. Unzip it, copy and paste the whole folder to the checkpoints folder at the root of this repository. The structure should be like this: root/checkpoints/itsc26/.
  - Run sigmarl/evaluation_itsc26.py.
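The reproduction steps for the papers above share one pattern: check out a tag, place the downloaded checkpoints under checkpoints/, and run a script. The sketch below collects the tags, archive names, and scripts from those steps in one table and shows one possible way to script the unzip step. The dictionary keys (such as "ttcbf"), the helper install_checkpoints, and the assumption that each archive contains a single top-level folder named after the archive are illustrative and not confirmed by the repository.

```python
import zipfile
from pathlib import Path

# Tag, checkpoint archive, and entry script per paper, copied from the steps above.
# None means the steps above do not name one. For at25, the steps above list
# follow-up scripts in addition to the entry script shown here.
REPRODUCTION = {
    "itsc24": {"tag": "1.2.0", "zip": "itsc24.zip", "script": "sigmarl/evaluation_itsc24.py"},
    "icra25": {"tag": "1.2.0", "zip": "icra25.zip", "script": "sigmarl/evaluation_icra25.py"},
    "ecc25": {"tag": None, "zip": "ecc25.zip", "script": "sigmarl/evaluation_ecc25.py"},
    "ttcbf": {"tag": "1.3.0", "zip": None, "script": "sigmarl/hocbf_taylor.py"},
    "itsc25": {"tag": "1.4.0", "zip": "itsc25.zip", "script": "sigmarl/evaluation_itsc25.py"},
    "at25": {"tag": "1.5.0", "zip": "at25.zip", "script": "sigmarl/eva_at25/run_models_parallel.py"},
    "itsc26": {"tag": "1.6.0", "zip": "itsc26.zip", "script": "sigmarl/evaluation_itsc26.py"},
}


def install_checkpoints(zip_path, repo_root):
    """Extract a downloaded archive into <repo_root>/checkpoints.

    Assumes the archive contains one top-level folder named after the archive
    (for example itsc24.zip -> itsc24/); this layout is an assumption.
    """
    zip_path = Path(zip_path)
    target = Path(repo_root) / "checkpoints"
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    expected = target / zip_path.stem
    if not expected.is_dir():
        raise FileNotFoundError(f"Expected folder {expected} after extraction")
    return expected
```

If the archive layout differs from the assumption, the function raises instead of leaving the files in an unexpected place, so the folder structure can be corrected by hand.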
- Improve safety
  - Integrating Control Barrier Functions (CBFs)
    - Proof of concept with two agents (see the MTV-Based CBF paper here)
    - High-Order CBFs (see the TTCBF paper here)
    - CBF-certified safe RL for collision avoidance with road boundaries (see the CBF-Based Safety Filter paper here)
    - CBF-informed reward design for safe MARL at an intersection (see the CBF-Informed MARL paper here)
  - Integrating Model Predictive Control (MPC)
- Address non-stationarity
  - Integrating prioritization (see the XP-MARL paper here)
- Misc
  - OpenStreetMap support (see guidance here)
  - Contribute our CPM scenario as a MARL benchmark scenario in VMAS (see news here)
  - Update to the latest versions of Torch, TorchRL, and VMAS
  - Support Python 3.11+
  - Deploy in the real world (see the CPM Lab Benchmark paper here)
  - Consider heterogeneous agents
This research was supported by the Bundesministerium für Digitales und Verkehr (German Federal Ministry for Digital and Transport) within the project "Harmonizing Mobility" (grant number 19FS2035A).

















