SpanVLA: Efficient Action Bridging and Learning from Negative-Recovery Samples for Vision-Language-Action Model
Zewei Zhou1,2*, Ruining Yang2,3*, Xuewei (Tony) Qi2†, Yiluan Guo2, Sherry X. Chen2, Tao Feng2, Kateryna Pistunova2, Yishan Shen2, Lili Su3, Jiaqi Ma1
1 University of California, Los Angeles, USA | 2 Motional, USA | 3 Northeastern University, USA
* Equal contribution. † Corresponding author.
SpanVLA introduces efficient action bridging with a sparse KV-Cache and history initialization, and learns from negative-recovery samples to improve robustness and performance.
2026/04: SpanVLA paper is now released.
2026/04: ✅ SpanVLA paper.
2026/09: SpanVLA codebase.
2026/12: mReasoning dataset.
We would like to express our gratitude to Qian Zhu, Haram Kim, and Baoshu Qi for their extensive efforts in data preparation and annotation of the mReasoning dataset. Special thanks also go to Muhammad Taufik Tirtosudiro and Jiong Yang for their support in developing the evaluation pipeline. We also thank Nitin Kapania, Sourabh Vora, and Balajee Kannan for their strong support of the project.
If you find this repository useful for your research, please consider giving us a star 🌟 and citing our paper.
@article{zhou2026spanvla,
author = {Zhou, Zewei and Yang, Ruining and Qi, Xuewei and Guo, Yiluan and Chen, Sherry X. and Feng, Tao and Pistunova, Kateryna and Shen, Yishan and Su, Lili and Ma, Jiaqi},
title = {SpanVLA: Efficient Action Bridging and Learning from Negative-Recovery Samples for Vision-Language-Action Model},
journal = {arXiv preprint arXiv:2604.19710},
year = {2026},
}