Dr. Seg: Revisiting GRPO Training for Visual Large Language Models through Perception-Oriented Design (CVPR 2026)
We revisit GRPO training for visual segmentation and detection and propose Dr. Seg, a simple plug-and-play framework featuring a Look-to-Confirm mechanism and a Distribution-Ranked Reward module. It requires no architectural modifications and integrates seamlessly with existing GRPO-based VLLMs. Extensive experiments show that Dr. Seg improves performance in complex visual scenarios while preserving strong generalization.
Paper: 📖 Dr.Seg
Model: 🤗 Dr.Seg-7B
Dataset: 🤗 COCONut
Overview of Dr. Seg:
- Release checkpoint
- Renew README
- Release training code
- Release evaluation code on segmentation
- Release dataset
- Release evaluation code on detection and counting
```bash
git clone https://github.com/eVI-group-SCU/Dr-Seg
cd Dr-Seg
conda create -n drseg python=3.12
conda activate drseg
pip install torch==2.6.0 torchvision==0.21.0
pip install -e . --no-build-isolation
```
Note:
We recommend using 4×80GB GPUs and at least 400GB of RAM.
As a reference, it takes approximately 15 hours to run ~500 training steps on 4× H800 PCIe.
Training Data (thanks to VisionReasoner): 🤗 MultiObject-7K
(1) Download the dataset using this script:
```bash
python training_scripts/download_dataset.py
```
(2) Download the pretrained model using the following commands:
```bash
mkdir pretrained_models
cd pretrained_models
git lfs install
git clone https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct
```
(3) Modify trainer.save_checkpoint_path in training_scripts/run_drseg_7b_4x80G.sh:
```bash
trainer.save_checkpoint_path=your_path_to_checkpoint/${RUN_NAME}
```
(4) Start the Distribution-Ranked Reward module:
```bash
python -u drr_module/serve.py --host 127.0.0.1 --port 50070
```
Note:
Remember to configure Weights & Biases (wandb) correctly to upload training logs.
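The wandb setup can be supplied through its standard environment variables before launching the run; a minimal sketch (the project name here is a placeholder — substitute your own, or export the same variables in your shell):

```python
import os

# Standard wandb environment variables; values below are placeholders.
os.environ["WANDB_PROJECT"] = "drseg"   # hypothetical project name
os.environ["WANDB_MODE"] = "online"     # use "offline" to log locally without uploading
# Either run `wandb login` once, or set WANDB_API_KEY in your environment.
```

If you launch training from a shell script, exporting these variables in the shell before invoking it has the same effect.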
(5) Start training in another terminal using this script:
```bash
bash training_scripts/run_drseg_7b_4x80G.sh
```
Note:
We recommend running training for 400–600 steps.
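Before launching (or while training runs), you can confirm that the Distribution-Ranked Reward module from step (4) is accepting connections; a minimal sketch, with the host and port matching the serve command above:

```python
import socket

def drr_reachable(host: str = "127.0.0.1", port: int = 50070, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the reward server succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

This only checks TCP reachability, not the module's API, but it quickly catches a server that failed to start or a mismatched port.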
(6) Merge the checkpoint into Hugging Face format:
```bash
python3 training_scripts/model_merger.py --local_dir [path_to_your_actor_checkpoint]
```
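After merging, a quick sanity check that the output directory looks like a standard Hugging Face checkpoint can save a failed evaluation run later. The file names below are the usual Hugging Face conventions, not anything specific to this repo:

```python
from pathlib import Path

def looks_like_hf_checkpoint(ckpt_dir: str) -> bool:
    """Heuristic: a merged HF checkpoint should contain a config and weight files."""
    d = Path(ckpt_dir)
    has_config = (d / "config.json").is_file()
    has_weights = any(d.glob("*.safetensors")) or any(d.glob("pytorch_model*.bin"))
    return has_config and has_weights
```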
Evaluation Data (thanks to VisionReasoner):
🤗 ReasonSeg-Val 🤗 ReasonSeg-Test
🤗 refcoco_val 🤗 refcoco_testA
🤗 refcocoplus_val 🤗 refcocoplus_testA
🤗 refcocog_val 🤗 refcocog_testA
(1) Modify REASONING_MODEL_PATH in evaluation_scripts/eval_segmentation_drseg.sh:
```bash
REASONING_MODEL_PATH:=your/path/to/checkpoint
```
(2) (Optional) Modify TEST_DATA_PATH:=Ricky06662/ReasonSeg_val in evaluation_scripts/eval_segmentation_drseg.sh:
```bash
TEST_DATA_PATH:=dataset/you/want/to/eval
```
(3) Start evaluation:
```bash
bash evaluation_scripts/eval_segmentation_drseg.sh
```
```bibtex
@article{sun2026dr,
  title={Dr. Seg: Revisiting GRPO Training for Visual Large Language Models through Perception-Oriented Design},
  author={Sun, Haoxiang and Wang, Tao and Tang, Chenwei and Yuan, Li and Lv, Jiancheng},
  journal={arXiv preprint arXiv:2603.00152},
  year={2026}
}
```
This project builds upon several open-source efforts, including VisionReasoner, Seg-Zero, EasyR1, veRL, and COCONut-PanCap. We also use pretrained models from Qwen2.5-VL and SAM2. We sincerely thank the authors and maintainers for releasing high-quality code and models, providing clear documentation and reproducible pipelines, and actively maintaining these projects, all of which significantly facilitated our implementation and evaluation.
