OptiScene: LLM-driven Indoor Scene Layout Generation via Scaled Human-aligned Data Synthesis and Multi-Stage Preference Optimization
Accepted at NeurIPS 2025. Model weights are coming soon.
This is the official code repository for our NeurIPS 2025 paper OptiScene. We present a novel approach for indoor scene layout generation using Large Language Models (LLMs) through scaled human-aligned data synthesis and multi-stage direct preference optimization (DPO).
To set up the required environment, follow these steps:
```bash
conda env create -f environment.yml
conda activate optiscene
```

The training process consists of three main stages: SFT, DPO Stage 1, and DPO Stage 2, with a LoRA merge step after each training stage. All tasks are run through the `main.py` entry point.
This is the first stage of training, where the base model is supervised fine-tuned (SFT) on the synthesized layout data.
```bash
# SFT Training
python main.py --task sft \
    --dataset_file dataset/sft_prompts.json \
    --model_name_or_path=Qwen/Qwen2.5-7B-Instruct \
    --bf16 \
    --checkpoint_dir=outputs/Qwen-7B-SFT \
    --per_device_train_batch_size=8 \
    --save_strategy=epoch \
    --epochs=1
```

After SFT training is complete, merge the LoRA adapter with the base model:
```bash
# Merge SFT LoRA
python main.py --task merge \
    --base_model_path Qwen/Qwen2.5-7B-Instruct \
    --lora_path outputs/Qwen-7B-SFT/checkpoint-XXXX \
    --output_path outputs/Qwen-7B-SFT-merged
```

Note: Replace `outputs/Qwen-7B-SFT/checkpoint-XXXX` with the actual path to your SFT LoRA checkpoint.
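Under the hood, merging a LoRA adapter into its base model is typically a PEFT merge-and-unload step. A minimal sketch, assuming the checkpoint is a standard PEFT LoRA adapter (the actual merge task in `main.py` may differ in detail):

```python
# Minimal sketch of a LoRA merge, assuming a standard PEFT adapter checkpoint.
# The merge task in main.py may differ in detail.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_path = "Qwen/Qwen2.5-7B-Instruct"
lora_path = "outputs/Qwen-7B-SFT/checkpoint-XXXX"  # your actual checkpoint path
output_path = "outputs/Qwen-7B-SFT-merged"

base = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, lora_path)

# Fold the low-rank adapter weights into the base weights and drop the adapter.
merged = model.merge_and_unload()
merged.save_pretrained(output_path)
AutoTokenizer.from_pretrained(base_model_path).save_pretrained(output_path)
```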
In the first DPO stage, the merged SFT model is further trained using preference data.
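Concretely, DPO trains the policy to put a larger log-likelihood margin on the human-preferred layout than on the rejected one, relative to a frozen reference model (here, the merged SFT model). A minimal sketch of the standard pairwise DPO loss, for reference only (the repository's trainer handles batching, masking, and reference-model log-probs internally):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard pairwise DPO loss over per-sequence log-probabilities.

    Reference sketch only; the actual trainer computes these log-probs
    and applies masking and batching internally.
    """
    # Log-ratio of the policy against the frozen reference for each response.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps

    # Reward the margin between the preferred and the rejected layout.
    margin = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(margin).mean()
```

The stage itself is launched as follows: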
```bash
# DPO Stage 1 Training
python main.py --task dpo \
    --dataset_file dataset/dpo1_prompts.json \
    --model_name_or_path outputs/Qwen-7B-SFT-merged \
    --learning_rate 5.0e-6 \
    --num_train_epochs 10 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --lora_target_modules q_proj k_proj v_proj o_proj gate_proj up_proj down_proj \
    --max_length 3200 \
    --max_prompt_length 3200 \
    --gradient_checkpointing \
    --logging_steps 20 \
    --save_steps 100 \
    --eval_strategy "steps" \
    --eval_steps 500 \
    --output_dir outputs/Qwen-7B-DPO-Stage1 \
    --no_remove_unused_columns \
    --use_peft \
    --lora_r 32 \
    --lora_alpha 16 \
    --bf16
```
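The `--use_peft` and `--lora_*` flags above map onto a standard PEFT LoRA configuration; a minimal sketch of the assumed mapping (`main.py` may construct its configuration differently):

```python
# Assumed mapping of the --lora_* flags above onto a PEFT LoraConfig;
# illustrative only, main.py may build its configuration differently.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,           # --lora_r: rank of the low-rank update matrices
    lora_alpha=16,  # --lora_alpha: scaling factor applied to the update
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # --lora_target_modules
    task_type="CAUSAL_LM",
)
```

Note that with a per-device batch size of 1 and 4 gradient-accumulation steps, the effective batch size is 4 per device.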
After DPO Stage 1 is complete, merge the LoRA adapter:

```bash
# Merge DPO Stage 1 LoRA
python main.py --task merge \
    --base_model_path outputs/Qwen-7B-SFT-merged \
    --lora_path outputs/Qwen-7B-DPO-Stage1/checkpoint-XXXX \
    --output_path outputs/Qwen-7B-DPO-Stage1-merged
```

Note: Replace `outputs/Qwen-7B-DPO-Stage1/checkpoint-XXXX` with the actual path to your DPO Stage 1 LoRA checkpoint.
The second DPO stage continues training from the merged DPO Stage 1 model.
```bash
# DPO Stage 2 Training
python main.py --task dpo \
    --dataset_file dataset/dpo2_prompts.json \
    --model_name_or_path outputs/Qwen-7B-DPO-Stage1-merged \
    --learning_rate 5.0e-6 \
    --num_train_epochs 10 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --lora_target_modules q_proj k_proj v_proj o_proj gate_proj up_proj down_proj \
    --max_length 3200 \
    --max_prompt_length 3200 \
    --gradient_checkpointing \
    --logging_steps 20 \
    --save_steps 100 \
    --eval_strategy "steps" \
    --eval_steps 500 \
    --output_dir outputs/Qwen-7B-DPO-Stage2 \
    --no_remove_unused_columns \
    --use_peft \
    --lora_r 32 \
    --lora_alpha 16 \
    --bf16
```

After DPO Stage 2 is complete, merge the LoRA adapter:
```bash
# Merge DPO Stage 2 LoRA
python main.py --task merge \
    --base_model_path outputs/Qwen-7B-DPO-Stage1-merged \
    --lora_path outputs/Qwen-7B-DPO-Stage2/checkpoint-XXXX \
    --output_path outputs/Qwen-7B-DPO-Stage2-merged
```

Note: Replace `outputs/Qwen-7B-DPO-Stage2/checkpoint-XXXX` with the actual path to your DPO Stage 2 LoRA checkpoint.
Once the final model is trained and merged, you can run inference using the following command.
```bash
python main.py --task inference --checkpoint_dir outputs/Qwen-7B-DPO-Stage2-merged
```

We will release the final pre-trained model weights on Hugging Face soon:
| Model | Description | Link |
|---|---|---|
| OptiScene-Qwen2.5-7B | Final model after two-stage DPO | Coming Soon |
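Once you have a merged checkpoint (local or downloaded), it can also be loaded directly with transformers rather than through `main.py`. A minimal sketch, assuming a standard causal-LM checkpoint layout; the prompt string below is illustrative and not the exact format expected by `main.py`:

```python
# Minimal sketch of loading a merged checkpoint directly with transformers.
# Assumes a standard causal-LM layout; the prompt is illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "outputs/Qwen-7B-DPO-Stage2-merged"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user",
             "content": "Generate an indoor layout for a 4m x 5m bedroom."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```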
If you find our work useful, please consider citing:
```bibtex
@inproceedings{yang2025optiscene,
  title={OptiScene: LLM-driven Indoor Scene Layout Generation via Scaled Human-aligned Data Synthesis and Multi-Stage Preference Optimization},
  author={Yang, Yixuan and Luo, Zhen and Ding, Tongsheng and Lu, Junru and Gao, Mingqi and Yang, Jinyu and Sanchez, Victor and Zheng, Feng},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025}
}
```

This project is released under the MIT License.
We thank the open-source community for their valuable contributions that made this work possible.