
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation

TL;DR

We present Any2Caption, a novel framework for controllable video generation under any condition. The key idea is to decouple condition interpretation from video synthesis. By leveraging modern multimodal large language models (MLLMs), Any2Caption interprets diverse inputs—text, images, videos, and specialized cues such as region, motion, and camera poses—into dense, structured captions that provide backbone video generators with richer guidance.

Figure: overview of the Any2Caption framework.
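The decoupled two-stage design above can be sketched as follows. This is a minimal illustrative mock-up, not the released implementation (the official code is not yet available): the function names, the caption schema, and the stub logic are all assumptions.

```python
# Hypothetical sketch of Any2Caption's two-stage pipeline.
# Stage 1 (condition interpretation) and Stage 2 (video synthesis)
# are deliberately decoupled, per the framework description.
# All names and the caption fields below are illustrative assumptions.

def interpret_conditions(conditions: dict) -> dict:
    """Stage 1: an MLLM would interpret heterogeneous conditions
    (text, images, motion cues, camera poses, ...) into a dense,
    structured caption. Here we simply merge them into named fields."""
    caption = {
        "subject": conditions.get("text", ""),
        "motion": conditions.get("motion", "unspecified"),
        "camera": conditions.get("camera", "fixed"),
    }
    caption["dense"] = (
        f"{caption['subject']}; motion: {caption['motion']}; "
        f"camera: {caption['camera']}"
    )
    return caption


def generate_video(structured_caption: dict) -> str:
    """Stage 2: a backbone video generator conditioned only on the
    structured caption (placeholder returning a description string)."""
    return f"video <- {structured_caption['dense']}"


result = generate_video(interpret_conditions(
    {"text": "a corgi running on the beach", "camera": "pan left"}
))
print(result)
```

The point of the sketch is the interface boundary: the generator never sees the raw conditions, only the structured caption produced in stage 1, so new condition types can be added without retraining the backbone.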

Code

Stay Tuned.

Citation

If you find Any2Caption useful and use it in your project, please kindly cite:

@inproceedings{wu2025Any2Caption,
    title={Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation},
    author={Shengqiong Wu and Weicai Ye and Jiahao Wang and Quande Liu and Xintao Wang and Pengfei Wan and Di Zhang and Kun Gai and Shuicheng Yan and Hao Fei and Tat-Seng Chua},
    booktitle={arxiv},
    year={2025}
}