
MPO

The official implementation for the ACL 2025 paper MPO: Multilingual Safety Alignment via Reward Gap Optimization.


Requirements & Installation

This repository is based on LLaMA-Factory and follows the same requirements and installation procedure.

| Mandatory    | Minimum | Recommended |
|--------------|---------|-------------|
| python       | 3.9     | 3.10        |
| torch        | 2.0.0   | 2.6.0       |
| torchvision  | 0.15.0  | 0.21.0      |
| transformers | 4.45.0  | 4.50.0      |
| datasets     | 2.16.0  | 3.2.0       |
| accelerate   | 0.34.0  | 1.2.1       |
| peft         | 0.14.0  | 0.15.1      |
| trl          | 0.8.6   | 0.9.6       |

| Optional     | Minimum | Recommended |
|--------------|---------|-------------|
| CUDA         | 11.6    | 12.2        |
| deepspeed    | 0.10.0  | 0.16.4      |
| bitsandbytes | 0.39.0  | 0.43.1     |
| vllm         | 0.4.3   | 0.8.2       |
| flash-attn   | 2.5.6   | 2.7.2       |
```bash
pip install -e ".[torch,metrics]" --no-build-isolation
```
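After installation, it can be worth confirming that the key packages meet the minimum versions in the table above. The sketch below is an illustrative check, not part of the repository; the `MINIMUMS` dict lists only a few of the packages, and the simple numeric comparison ignores pre-release tags (a robust check would use the `packaging` library).

```python
# Sketch: check a few installed packages against the README's minimum versions.
from importlib.metadata import version, PackageNotFoundError

MINIMUMS = {"torch": "2.0.0", "transformers": "4.45.0", "datasets": "2.16.0"}

def parse(v):
    # Compare the leading numeric components only (e.g. "2.6.0+cu124" -> (2, 6, 0)).
    return tuple(int(p) for p in v.split("+")[0].split(".")[:3])

for pkg, minimum in MINIMUMS.items():
    try:
        installed = version(pkg)
        status = "OK" if parse(installed) >= parse(minimum) else "below minimum"
        print(f"{pkg}: {installed} ({status})")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```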

Dataset

The datasets are provided in the /data directory and registered in data_info.json: gemma_mpo_data.json, llama_mpo_data.json, and qwen_mpo_data.json, one per base model.
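Since MPO optimizes a reward gap between preferred and dispreferred responses, each record is presumably a preference pair. The sketch below assumes LLaMA-Factory's ShareGPT-style preference fields (`conversations`, `chosen`, `rejected`); the actual field names in the *_mpo_data.json files may differ, and the sample record here is invented for illustration.

```python
# Hypothetical preference record in LLaMA-Factory's ShareGPT-style format.
# Field names and content are assumptions, not taken from the MPO data files.
sample = {
    "conversations": [{"from": "human", "value": "How do I stay safe online?"}],
    "chosen": {"from": "gpt", "value": "Use strong, unique passwords and enable 2FA."},
    "rejected": {"from": "gpt", "value": "Just reuse one simple password everywhere."},
}

def is_preference_record(record):
    """Check that a record carries the fields a preference-pair trainer expects."""
    return all(key in record for key in ("conversations", "chosen", "rejected"))

print(is_preference_record(sample))
```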

Training

Please run the following command to start the training process.

```bash
llamafactory-cli train examples/train_mpo/{model}_mpo.yaml
```

where {model} is one of gemma2, llama3.1, or qwen2.5.
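Expanding the {model} placeholder yields one concrete command per base model. This small sketch (not part of the repository) just spells that substitution out:

```python
# Expand the README's command template into the three concrete invocations.
MODELS = ["gemma2", "llama3.1", "qwen2.5"]
TEMPLATE = "llamafactory-cli train examples/train_mpo/{model}_mpo.yaml"

commands = [TEMPLATE.format(model=m) for m in MODELS]
for cmd in commands:
    print(cmd)
```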

Citation

If you find our work useful for your research, please cite our paper as follows:

```bibtex
@article{zhao2025mpo,
  title={MPO: Multilingual Safety Alignment via Reward Gap Optimization},
  author={Zhao, Weixiang and Hu, Yulin and Deng, Yang and Wu, Tongtong and Zhang, Wenxuan and Guo, Jiahe and Zhang, An and Zhao, Yanyan and Qin, Bing and Chua, Tat-Seng and others},
  journal={arXiv preprint arXiv:2505.16869},
  year={2025}
}
```

Credits

The code in this repository builds on LLaMA-Factory, and we would like to express our sincere gratitude to its authors.
