The official implementation for the ACL 2025 paper MPO: Multilingual Safety Alignment via Reward Gap Optimization.
This repository is based on LLaMA-Factory and follows the same requirements and installation procedure.
| Mandatory | Minimum | Recommended |
|---|---|---|
| python | 3.9 | 3.10 |
| torch | 2.0.0 | 2.6.0 |
| torchvision | 0.15.0 | 0.21.0 |
| transformers | 4.45.0 | 4.50.0 |
| datasets | 2.16.0 | 3.2.0 |
| accelerate | 0.34.0 | 1.2.1 |
| peft | 0.14.0 | 0.15.1 |
| trl | 0.8.6 | 0.9.6 |
| Optional | Minimum | Recommended |
|---|---|---|
| CUDA | 11.6 | 12.2 |
| deepspeed | 0.10.0 | 0.16.4 |
| bitsandbytes | 0.39.0 | 0.43.1 |
| vllm | 0.4.3 | 0.8.2 |
| flash-attn | 2.5.6 | 2.7.2 |
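The version requirements above can be sanity-checked with a short script; this is a minimal sketch using the standard library's `importlib.metadata` (package names taken from the table, pre-release suffixes ignored by the naive parser):

```python
from importlib.metadata import version, PackageNotFoundError

# Minimum versions from the mandatory-requirements table above.
minimums = {
    "torch": "2.0.0",
    "transformers": "4.45.0",
    "datasets": "2.16.0",
    "accelerate": "0.34.0",
    "peft": "0.14.0",
    "trl": "0.8.6",
}

def parse(v):
    # Naive numeric parse; drops local/pre-release suffixes like "+cu121".
    return tuple(int(p) for p in v.split("+")[0].split(".")[:3] if p.isdigit())

for name, minimum in minimums.items():
    try:
        installed = version(name)
    except PackageNotFoundError:
        print(f"{name}: not installed (minimum {minimum})")
        continue
    status = "OK" if parse(installed) >= parse(minimum) else "too old"
    print(f"{name}: {installed} ({status}, minimum {minimum})")
```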
```bash
pip install -e ".[torch,metrics]" --no-build-isolation
```

The data has been placed in the `/data` directory and registered in `data_info.json`, including `gemma_mpo_data.json`, `llama_mpo_data.json`, and `qwen_mpo_data.json`.
Please run the following command to start the training process.
```bash
llamafactory-cli train examples/train_mpo/{model}_mpo.yaml
```

where `model` = `gemma2` / `llama3.1` / `qwen2.5`.
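For reference, the `{model}` placeholder expands to three concrete commands; this snippet just prints them (training itself requires the repo and GPU resources):

```shell
# Expand the {model} placeholder into the three concrete training commands.
for model in gemma2 llama3.1 qwen2.5; do
  echo "llamafactory-cli train examples/train_mpo/${model}_mpo.yaml"
done
```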
If you find our work useful for your research, please kindly cite our paper as follows:
```bibtex
@article{zhao2025mpo,
  title={MPO: Multilingual Safety Alignment via Reward Gap Optimization},
  author={Zhao, Weixiang and Hu, Yulin and Deng, Yang and Wu, Tongtong and Zhang, Wenxuan and Guo, Jiahe and Zhang, An and Zhao, Yanyan and Qin, Bing and Chua, Tat-Seng and others},
  journal={arXiv preprint arXiv:2505.16869},
  year={2025}
}
```
The code in this repository builds on LLaMA-Factory, and we would like to express our sincere gratitude to its authors.