Paper | Project Page | Model | Datasets | Interactive Demo | Citation
FoundationMotion offers a scalable way to curate detailed motion datasets, enabling effective fine-tuning of diverse models (VLM / VLA / World Models) to improve motion and spatial reasoning.
If you want to construct datasets using our dataset curation pipeline, see the installation instructions in data_pipeline/README.md.
If you want to use our fine-tuned model, install the dependencies:
pip install fire tqdm huggingface-hub
pip install -U git+https://github.com/NVlabs/VILA
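Optionally, you can pre-download the fine-tuned checkpoint before running anything. This is a minimal sketch, not part of the official setup: it assumes the checkpoint is hosted on the Hugging Face Hub under the repo id used as --model_path later in this README, and it uses the standard huggingface-cli that ships with huggingface-hub.

```bash
# Optional sketch: cache the fine-tuned checkpoint locally.
# Repo id taken from the --model_path used in the evaluation command below.
huggingface-cli download WoWolf/nvila_15b_video-fm-tuned
```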
Follow the instructions in data_pipeline/README.md to set up the video files you want to process. Customize your paths and settings in data_pipeline/scripts/config.sh.
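The exact variables live in data_pipeline/scripts/config.sh itself; the snippet below is only a hypothetical illustration of the kind of path settings you would adapt, not the real contents of that file.

```bash
# Hypothetical example only -- check data_pipeline/scripts/config.sh for the
# actual variable names. You typically point the pipeline at your input videos
# and choose an output location.
VIDEO_DIR=/path/to/your/videos        # input videos to process
OUTPUT_DIR=/path/to/pipeline/output   # trajectories, captions, and QA pairs land here
```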
Run:

bash data_pipeline/scripts/submit_ranges.sh

This starts processing the video data. Edit `submit_range 0 60` to set the range of videos to process: 0 is the starting index and 60 is the ending index. You can submit multiple jobs with different or even overlapping ranges; we handle the rest for you. Just submit your jobs and adjust the start/end values as needed.
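As an illustration, assuming `submit_range` takes a start index and an end index (as the default `submit_range 0 60` call suggests), splitting a larger collection across jobs inside submit_ranges.sh could look like this sketch:

```bash
# Hedged sketch of range calls inside data_pipeline/scripts/submit_ranges.sh.
# Assumes submit_range <start> <end>; overlapping ranges are allowed, as noted above.
submit_range 0 60      # videos 0 through 60
submit_range 50 120    # overlaps the previous range; that's fine
submit_range 120 200
```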
To evaluate the fine-tuned model, run:

python eval/vila_motionbench.py \
--task="robotics_hand_eval" \
--base_dir="~/workspace/v2-dev" \
--model_path="WoWolf/nvila_15b_video-fm-tuned"
Full Huggingface Demo (this demo is also hosted on Huggingface Spaces)

Run the demo:
python app.py

Drag a video, ask a question, and get an answer.
- data_pipeline/process_single_video.py - script to process a single video to get trajectories, captions, and question–answer pairs.
python process_single_video.py --video_path /path/to/video.mp4 --base_output_dir /path/to/output
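To label more than one clip, a plain shell loop over the single-video script is enough; this is just a sketch and assumes the two documented flags above are all that is needed.

```bash
# Sketch: run the single-video pipeline over every .mp4 in a folder,
# using only the documented --video_path and --base_output_dir flags.
for v in /path/to/videos/*.mp4; do
  python process_single_video.py --video_path "$v" --base_output_dir /path/to/output
done
```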
- examples/demo_nvila.py - script to process a single video using our model.

python demo_nvila.py --video_path /path/to/video.mp4 --prompt "Your question here"

If you use our work or our implementation in this repo, or find them helpful, please consider citing us in the following format.
@misc{gan2025foundationmotion,
title={FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos},
author={Yulu Gan and Ligeng Zhu and Dandan Shan and Baifeng Shi and Hongxu Yin and Boris Ivanovic and Song Han and Trevor Darrell and Jitendra Malik and Marco Pavone and Boyi Li},
year={2025},
eprint={2512.10927},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.10927},
}
