
The official code implementation of OptiScene (NeurIPS 2025).


OptiScene: LLM-driven Indoor Scene Layout Generation via Scaled Human-aligned Data Synthesis and Multi-Stage Preference Optimization

Accepted at NeurIPS 2025. Model weights are coming soon.

Introduction

This is the official code repository for our NeurIPS 2025 paper OptiScene. We present a novel approach to indoor scene layout generation with Large Language Models (LLMs), combining scaled human-aligned data synthesis with multi-stage direct preference optimization (DPO).

Environment Setup

To set up the required environment, follow these steps:

conda env create -f environment.yml
conda activate optiscene
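
Before launching training, it can help to confirm that the GPU stack is visible from Python. The snippet below is a minimal sanity check, assuming the environment provides PyTorch and the Hugging Face transformers library; it is not a script shipped with this repository.

# check_env.py -- quick environment sanity check (illustrative)
import torch
import transformers

print("PyTorch:", torch.__version__)
print("Transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    # the --bf16 flag used in the training commands requires a GPU with bfloat16 support (Ampere or newer)
    print("bf16 supported:", torch.cuda.is_bf16_supported())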

Training Pipeline

The training process consists of three main stages: SFT, DPO Stage 1, and DPO Stage 2, with a LoRA merge step after each training stage. All tasks are run through the main.py entry point.

1. Supervised Fine-Tuning (SFT)

In the first stage, the base model is fine-tuned with a LoRA adapter on the supervised layout data in dataset/sft_prompts.json.

# SFT Training
python main.py --task sft \
    --dataset_file dataset/sft_prompts.json \
    --model_name_or_path=Qwen/Qwen2.5-7B-Instruct \
    --bf16 \
    --checkpoint_dir=outputs/Qwen-7B-SFT \
    --per_device_train_batch_size=8 \
    --save_strategy=epoch \
    --epochs=1
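
The schema of dataset/sft_prompts.json is defined by the repository's data pipeline and is not documented here. A quick way to inspect it before launching training is a short Python snippet such as the one below; it makes no assumptions about the field names.

# inspect_sft_data.py -- peek at the SFT data file (illustrative)
import json

with open("dataset/sft_prompts.json", "r", encoding="utf-8") as f:
    data = json.load(f)

records = data if isinstance(data, list) else [data]
print("Number of records:", len(records))
first = records[0]
print("Type of first record:", type(first).__name__)
if isinstance(first, dict):
    print("Keys:", list(first.keys()))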

After SFT training is complete, merge the LoRA adapter with the base model:

# Merge SFT LoRA
python main.py --task merge \
    --base_model_path Qwen/Qwen2.5-7B-Instruct \
    --lora_path outputs/Qwen-7B-SFT/checkpoint-XXXX \
    --output_path outputs/Qwen-7B-SFT-merged

Note: Replace outputs/Qwen-7B-SFT/checkpoint-XXXX with the actual path to your SFT LoRA checkpoint.
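
The merge task folds the LoRA adapter weights back into the base model so that the next stage can start from a plain Hugging Face checkpoint. If you prefer to do this step directly in Python, the sketch below shows the standard approach with the peft library; it illustrates what a LoRA merge does and is not the repository's exact main.py implementation.

# merge_lora_sketch.py -- illustrative LoRA merge with peft (not main.py's code path)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "Qwen/Qwen2.5-7B-Instruct"
lora_path = "outputs/Qwen-7B-SFT/checkpoint-XXXX"   # replace with your actual checkpoint
output_path = "outputs/Qwen-7B-SFT-merged"

base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, lora_path)
merged = model.merge_and_unload()                   # fold adapter deltas into the base weights

merged.save_pretrained(output_path)
AutoTokenizer.from_pretrained(base_path).save_pretrained(output_path)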

2. DPO Stage 1

In the first DPO stage, the merged SFT model is further trained using preference data.

# DPO Stage 1 Training
python main.py --task dpo \
    --dataset_file dataset/dpo1_prompts.json \
    --model_name_or_path outputs/Qwen-7B-SFT-merged \
    --learning_rate 5.0e-6 \
    --num_train_epochs 10 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --lora_target_modules q_proj k_proj v_proj o_proj gate_proj up_proj down_proj \
    --max_length 3200 \
    --max_prompt_length 3200 \
    --gradient_checkpointing \
    --logging_steps 20 \
    --save_steps 100 \
    --eval_strategy "steps" \
    --eval_steps 500 \
    --output_dir outputs/Qwen-7B-DPO-Stage1 \
    --no_remove_unused_columns \
    --use_peft \
    --lora_r 32 \
    --lora_alpha 16 \
    --bf16
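
The DPO stages consume preference pairs rather than single targets. The flags above (--max_prompt_length, --no_remove_unused_columns, --use_peft, --lora_r, --lora_alpha) follow TRL's DPO training conventions, so each record in dataset/dpo1_prompts.json is expected to pair a prompt with a preferred and a rejected layout. The field names below are an assumption based on the standard TRL DPOTrainer schema, not a confirmed specification of this repository's files.

# dpo_record_sketch.py -- assumed shape of one preference record (illustrative only)
example_record = {
    "prompt": "Design a layout for a 4m x 5m bedroom with a bed, wardrobe, and desk.",  # hypothetical prompt
    "chosen": "<layout ranked higher by the human-aligned scoring>",                     # preferred completion
    "rejected": "<layout ranked lower>",                                                 # dispreferred completion
}
print(list(example_record.keys()))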

After DPO Stage 1 is complete, merge the LoRA adapter:

# Merge DPO Stage 1 LoRA
python main.py --task merge \
    --base_model_path outputs/Qwen-7B-SFT-merged \
    --lora_path outputs/Qwen-7B-DPO-Stage1/checkpoint-XXXX \
    --output_path outputs/Qwen-7B-DPO-Stage1-merged

Note: Replace outputs/Qwen-7B-DPO-Stage1/checkpoint-XXXX with the actual path to your DPO Stage 1 LoRA checkpoint.

3. DPO Stage 2

The second DPO stage continues training from the merged DPO Stage 1 model.

# DPO Stage 2 Training
python main.py --task dpo \
    --dataset_file dataset/dpo2_prompts.json \
    --model_name_or_path outputs/Qwen-7B-DPO-Stage1-merged \
    --learning_rate 5.0e-6 \
    --num_train_epochs 10 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --lora_target_modules q_proj k_proj v_proj o_proj gate_proj up_proj down_proj \
    --max_length 3200 \
    --max_prompt_length 3200 \
    --gradient_checkpointing \
    --logging_steps 20 \
    --save_steps 100 \
    --eval_strategy "steps" \
    --eval_steps 500 \
    --output_dir outputs/Qwen-7B-DPO-Stage2 \
    --no_remove_unused_columns \
    --use_peft \
    --lora_r 32 \
    --lora_alpha 16 \
    --bf16

After DPO Stage 2 is complete, merge the LoRA adapter:

# Merge DPO Stage 2 LoRA
python main.py --task merge \
    --base_model_path outputs/Qwen-7B-DPO-Stage1-merged \
    --lora_path outputs/Qwen-7B-DPO-Stage2/checkpoint-XXXX \
    --output_path outputs/Qwen-7B-DPO-Stage2-merged

Note: Replace outputs/Qwen-7B-DPO-Stage2/checkpoint-XXXX with the actual path to your DPO Stage 2 LoRA checkpoint.

Inference

Once the final model is trained and merged, you can run inference using the following command.

python main.py --task inference --checkpoint_dir outputs/Qwen-7B-DPO-Stage2-merged
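
If you want to load the merged model outside of main.py, a standard transformers generation loop also works. The sketch below assumes the merged checkpoint is an ordinary Hugging Face causal-LM directory; the prompt shown is a hypothetical placeholder, and main.py applies the project's actual prompt format for you.

# inference_sketch.py -- illustrative generation with the merged model (not main.py's code path)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "outputs/Qwen-7B-DPO-Stage2-merged"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map="auto")

prompt = "Generate an indoor scene layout for a living room."   # hypothetical prompt text
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=1024)

print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))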

Model Weights

We will release the final trained model weights on Hugging Face soon:

Model: OptiScene-Qwen-7.5B
Description: Final model after two-stage DPO
Link: Coming soon

Citation

If you find our work useful, please consider citing:

@inproceedings{yang2025optiscene,
  title={OptiScene: LLM-driven Indoor Scene Layout Generation via Scaled Human-aligned Data Synthesis and Multi-Stage Preference Optimization},
  author={Yang, Yixuan and Luo, Zhen and Ding, Tongsheng and Lu, Junru and Gao, Mingqi and Yang, Jinyu and Sanchez, Victor and Zheng, Feng},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025}
}

License

This project is released under the MIT License.

Acknowledgements

We thank the open-source community for their valuable contributions that made this work possible.
