| Blog | Tech Report | 🤗HF Models |
We introduce Video-XL-2, a new suite of multimodal models that achieves state-of-the-art (SOTA) performance and superior efficiency in long video understanding.
Video-XL-2 achieves SOTA performance on mainstream long video understanding benchmarks and leading performance on temporal grounding tasks when compared to open-source lightweight models. Furthermore, it offers significant advantages over existing models in both memory consumption and inference speed.
| Model name | HF Weight |
|---|---|
| Video-XL-2/Stage1 | 🤗 HF link |
| Video-XL-2/Stage2 | 🤗 HF link |
| Video-XL-2/Stage3 | 🤗 HF link |
| Video-XL-2/Stage4 | 🤗 HF link |
Clone this repository and install the required packages:

```bash
git clone https://github.com/VectorSpaceLab/Video-XL
cd Video-XL-2
pip install -r requirements.txt
```

The training code and scripts can be found in `./train`.
The evaluation code and scripts can be found in `./eval`.
We thank the authors of the Video-XL series, LongVA, lmms-eval, Qwen, and VideoChat-Flash for their great work.
If you find Video-XL-2 useful for your research and applications, please consider starring this repository and citing:
@article{qin2025video,
title={Video-XL-2: Towards Very Long-Video Understanding Through Task-Aware KV Sparsification},
author={Qin, Minghao and Liu, Xiangrui and Liang, Zhengyang and Shu, Yan and Yuan, Huaying and Zhou, Junjie and Xiao, Shitao and Zhao, Bo and Liu, Zheng},
journal={arXiv preprint arXiv:2506.19225},
year={2025}
}
@article{shu2024video,
title={Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding},
author={Shu, Yan and Zhang, Peitian and Liu, Zheng and Qin, Minghao and Zhou, Junjie and Huang, Tiejun and Zhao, Bo},
journal={arXiv preprint arXiv:2409.14485},
year={2024}
}
@article{liu2025video,
title={Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding},
author={Liu, Xiangrui and Shu, Yan and Liu, Zheng and Li, Ao and Tian, Yang and Zhao, Bo},
journal={arXiv preprint arXiv:2503.18478},
year={2025}
}