
Awesome Video Diffusions

A curated list of the latest research papers, projects, and resources related to Video Diffusion Models and Video Generation. Content is updated automatically every day.

Last Update: 2026-04-27 02:49:17

📰 Latest Updates

🚀 [2026-02] Project Launched — v1.0

  • Adapted from the awesome-gaussians framework for tracking video diffusion research

  • Unified CLI: Single entry point python main.py with subcommands: init, search, suggest, export-bib, readme

  • Interactive Configuration Wizard: Run python main.py init to set up keywords, domains, time range, and API keys step-by-step

  • Custom Time Range Filtering: Support relative periods (6m, 1y, 2y) and absolute date ranges

  • Smart Link Extraction: Automatically extracts and classifies GitHub, project page, dataset, video, demo, and HuggingFace links from paper abstracts

  • BibTeX Export: Fetch BibTeX from arXiv and export to .bib files with category/date filters

  • LLM Keyword Suggestion: Paste a few paper titles or arXiv IDs, and an LLM automatically generates optimized search keywords

  • arXiv Domain Filtering: Restrict searches to specific arXiv categories (e.g., cs.CV, cs.AI, cs.MM)

  • 16 Research Categories: Comprehensive taxonomy covering T2V, I2V, video editing, controllable generation, world models, and more

  • View detailed updates: News.md 📋


Categories

Table of Contents

Categorized Papers

3D-aware Video Generation

Applications

Architecture & Efficiency

Showing the latest 50 out of 350 papers

Audio & Multi-modal

  • HFS-TriNet: A Three-Branch Collaborative Feature Learning Network for Prostate Cancer Classification from TRUS Videos
    Authors: Xu Lu, Qianhong Peng, Qihao Zhou, Shaopeng Liu, Xiuqin Ye, Chuan Yang, Yuan Yuan
    Links: PDF
    Keywords: temporal consistency, medical, denoising, sound
  • DocPrune: Efficient Document Question Answering via Background, Question, and Comprehension-aware Token Pruning
    Authors: Joonmyung Choi, Sanghyeok Lee, Jongha Kim, Sehyung Kim, Dohwan Ko, Jihyung Kil, Hyunwoo J. Kim
    Links: PDF
    Keywords: dit, efficient, multi-modal
  • MMControl: Unified Multi-Modal Control for Joint Audio-Video Generation
    Authors: Liyang Li, Wen Wang, Canyu Zhao, Tianjian Feng, Zhiyue Zhao, Hao Chen, Chunhua Shen
    Links: PDF
    Keywords: dit, controllable, video generation, multi-modal, layout, identity, diffusion transformer, video diffusion
  • Attention-based Multi-modal Deep Learning Model of Spatio-temporal Crop Yield Prediction with Satellite, Soil and Climate Data
    Authors: Gopal Krishna Shyam, Ila Chandrakar
    Links: PDF
    Keywords: architecture, dit, multi-modal
  • OmniHuman: A Large-scale Dataset and Benchmark for Human-Centric Video Generation
    Authors: Lei Zhu, Xing Cai, Yingjie Chen, Yiheng Li, Binxin Yang, Hao Liu, Jie Chen, Chen Li, Jing LYu
    Links: PDF
    Keywords: video synthesis, evaluation, video generation, multi-modal, benchmark, physical
  • TurboTalk: Progressive Distillation for One-Step Audio-Driven Talking Avatar Generation
    Authors: Xiangyu Liu, Feng Gao, Xiaomei Zhang, Yong Zhang, Xiaoming Wei, Zhen Lei, Xiangyu Zhu
    Links: PDF
    Keywords: avatar, audio-driven, distillation, denoising, diffusion model, video diffusion
  • Seedance 2.0: Advancing Video Generation for World Complexity
    Authors: Team Seedance, De Chen, Liyang Chen, Xin Chen, Ying Chen, Zhuo Chen, Zhuowei Chen, Feng Cheng, Tianheng Cheng, Yufeng Cheng, Mojie Chi, Xuyan Chi, Jian Cong, Qinpeng Cui, Fei Ding, Qide Dong, Yujiao Du, Haojie Duanmu, Junliang Fan, Jiarui Fang, Jing Fang, Zetao Fang, Chengjian Feng, Yu Gao, Diandian Gu, Dong Guo, Hanzhong Guo, Qiushan Guo, Boyang Hao, Hongxiang Hao, Haoxun He, Jiaao He, Qian He, Tuyen Hoang, Heng Hu, Ruoqing Hu, Yuxiang Hu, Jiancheng Huang, Weilin Huang, Zhaoyang Huang, Zhongyi Huang, Jishuo Jin, Ming Jing, Ashley Kim, Shanshan Lao, Yichong Leng, Bingchuan Li, Gen Li, Haifeng Li, Huixia Li, Jiashi Li, Ming Li, Xiaojie Li, Xingxing Li, Yameng Li, Yiying Li, Yu Li, Yueyan Li, Chao Liang, Han Liang, Jianzhong Liang, Ying Liang, Wang Liao, J. H. Lien, Shanchuan Lin, Xi Lin, Feng Ling, Yue Ling, Fangfang Liu, Jiawei Liu, Jihao Liu, Jingtuo Liu, Shu Liu, Sichao Liu, Wei Liu, Xue Liu, Zuxi Liu, Ruijie Lu, Lecheng Lyu, Jingting Ma, Tianxiang Ma, Xiaonan Nie, Jingzhe Ning, Junjie Pan, Xitong Pan, Ronggui Peng, Xueqiong Qu, Yuxi Ren, Yuchen Shen, Guang Shi, Lei Shi, Yinglong Song, Fan Sun, Li Sun, Renfei Sun, Wenjing Tang, Boyang Tao, Zirui Tao, Dongliang Wang, Feng Wang, Hulin Wang, Ke Wang, Qingyi Wang, Rui Wang, Shuai Wang, Shulei Wang, Weichen Wang, Xuanda Wang, Yanhui Wang, Yue Wang, Yuping Wang, Yuxuan Wang, Zijie Wang, Ziyu Wang, Guoqiang Wei, Meng Wei, Di Wu, Guohong Wu, Hanjie Wu, Huachao Wu, Jian Wu, Jie Wu, Ruolan Wu, Shaojin Wu, Xiaohu Wu, Xinglong Wu, Yonghui Wu, Ruiqi Xia, Xin Xia, Xuefeng Xiao, Shuang Xu, Bangbang Yang, Jiaqi Yang, Runkai Yang, Tao Yang, Yihang Yang, Zhixian Yang, Ziyan Yang, Fulong Ye, Bingqian Yi, Xing Yin, Yongbin You, Linxiao Yuan, Weihong Zeng, Xuejiao Zeng, Yan Zeng, Siyu Zhai, Zhonghua Zhai, Bowen Zhang, Chenlin Zhang, Heng Zhang, Jun Zhang, Manlin Zhang, Peiyuan Zhang, Shuo Zhang, Xiaohe Zhang, Xiaoying Zhang, Xinyan Zhang, Xinyi Zhang, Yichi Zhang, Zixiang Zhang, Haiyu Zhao, Huating Zhao, Liming Zhao, 
Yian Zhao, Guangcong Zheng, Jianbin Zheng, Xiaozheng Zheng, Zerong Zheng, Kuan Zhu, Feilong Zuo
    Links: PDF
    Keywords: dit, efficient, architecture, evaluation, video generation, multi-modal, creative
  • HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
    Authors: Team HY-World, Chenjie Cao, Xuhui Zuo, Zhenwei Wang, Yisu Zhang, Junta Wu, Zhenyang Liu, Yuning Gong, Yang Liu, Bo Yuan, Chao Zhang, Coopers Li, Dongyuan Guo, Fan Yang, Haiyu Zhang, Hang Cao, Jianchen Zhu, Jiaxin Lin, Jie Xiao, Jihong Zhang, Junlin Yu, Lei Wang, Lifu Wang, Lilin Wang, Linus, Minghui Chen, Peng He, Penghao Zhao, Qi Chen, Rui Chen, Rui Shao, Sicong Liu, Wangchen Qin, Xiaochuan Niu, Xiang Yuan, Yi Sun, Yifei Tang, Yifu Sun, Yihang Lian, Yonghao Tan, Yuhong Liu, Yuyang Yin, Zhiyuan Min, Tengfei Wang, Chunchao Guo
    Links: PDF
    Keywords: efficient, interactive, world model, architecture, multi-modal, trajectory, benchmark
  • Deepfakes at Face Value: Image and Authority
    Authors: James Ravi Kirkpatrick
    Links: PDF
    Keywords: simulation, identity, sound
  • Beyond Monologue: Interactive Talking-Listening Avatar Generation with Conversational Audio Context-Aware Kernels
    Authors: Yuzhe Weng, Haotian Wang, Xinyi Yu, Xiaoyan Wu, Haoran Xu, Shan He, Jun Du
    Links: PDF | Project
    Keywords: avatar, audio-driven, interactive, dynamics, video generation, physical

Controllable Generation

Showing the latest 50 out of 136 papers

Human & Character Animation

Image-to-Video Generation

Long Video Generation

Showing the latest 50 out of 112 papers

Personalization & Customization

Showing the latest 50 out of 85 papers

Physical Understanding

Showing the latest 50 out of 148 papers

Surveys & Benchmarks

Showing the latest 50 out of 202 papers

Text-to-Video Generation

Showing the latest 50 out of 61 papers

Video Editing

Video Inpainting & Completion

Video Super-Resolution & Enhancement

Showing the latest 50 out of 61 papers

World Models & Simulation

Showing the latest 50 out of 97 papers

  • Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
    Authors: Meng Chu, Xuan Billy Zhang, Kevin Qinghong Lin, Lingdong Kong, Jize Zhang, Teng Tu, Weijian Ma, Ziqi Huang, Senqiao Yang, Wei Huang, Yeying Jin, Zhefan Rao, Jinhui Ye, Xinyu Lin, Xichen Zhang, Qisheng Hu, Shuai Yang, Leyang Shen, Wei Chow, Yifei Dong, Fengyi Wu, Quanyu Long, Bin Xia, Shaozuo Yu, Mingkang Zhu, Wenhu Zhang, Jiehui Huang, Haokun Gui, Haoxuan Che, Long Chen, Qifeng Chen, Wenxuan Zhang, Wenya Wang, Xiaojuan Qi, Yang Deng, Yanwei Li, Mike Zheng Shou, Zhi-Qi Cheng, See-Kiong Ng, Ziwei Liu, Philip Torr, Jiaya Jia
    Links: PDF
    Keywords: dit, world model, evaluation, dynamics, simulation, video generation, action-conditioned, physical
  • WorldMark: A Unified Benchmark Suite for Interactive Video World Models
    Authors: Xiaojie Xu, Zhengyuan Lin, Kang He, Yukang Feng, Xiaofeng Mao, Yuanyang Yin, Kaipeng Zhang, Yongtao Ge
    Links: PDF
    Keywords: image-to-video, dit, interactive, world model, style, evaluation, video generation, trajectory, benchmark
  • X-Cache: Cross-Chunk Block Caching for Few-Step Autoregressive World Models Inference
    Authors: Yixiao Zeng, Jianlei Zheng, Chaoda Zheng, Shijia Chen, Mingdian Liu, Tongping Liu, Tengwei Luo, Yu Zhang, Boyang Wang, Linkun Xu, Siyuan Lu, Bo Tian, Xianming Liu
    Links: PDF
    Keywords: dit, interactive, world model, autonomous driving, evaluation, autoregressive, simulation, denoising, video generation, acceleration, action-conditioned, controllable, video diffusion
  • CityRAG: Stepping Into a City via Spatially-Grounded Video Generation
    Authors: Gene Chou, Charles Herrmann, Kyle Genova, Boyang Deng, Songyou Peng, Bharath Hariharan, Jason Y. Zhang, Noah Snavely, Philipp Henzler
    Links: PDF
    Keywords: dit, t2v, autonomous driving, simulation, video generation, i2v, robotics, physical
  • UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling
    Authors: Boyu Chen, Yi Chen, Lu Qiu, Jerry Bai, Yuying Ge, Yixiao Ge
    Links: PDF
    Keywords: dit, world model, dynamics, simulation, video generation, benchmark, physical
  • How Far Are Video Models from True Multimodal Reasoning?
    Authors: Xiaotian Zhang, Jianhui Wei, Yuan Wang, Jie Tan, Yichen Li, Yan Zhang, Ziyi Chen, Daoan Zhang, Dezhi YU, Wei Xu, Songtao Jiang, Zuozhu Liu
    Links: PDF
    Keywords: interactive, evaluation, simulation, video generation, physical simulation, benchmark, physical
  • RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation
    Authors: Feng Jiang, Yang Chen, Kyle Xu, Yuchen Liu, Haifeng Wang, Zhenhao Shen, Jasper Lu, Shengze Huang, Yuanfei Wang, Chen Xie, Ruihai Wu
    Links: PDF
    Keywords: world model, evaluation, dynamics, video generation, benchmark, physical
  • MultiWorld: Scalable Multi-Agent Multi-View Video World Models
    Authors: Haoyu Wu, Jiwen Yu, Yingtian Zou, Xihui Liu
    Links: PDF | Project
    Keywords: dit, multi-view video, world model, dynamics, video generation, action-conditioned
  • WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models
    Authors: Xinping Lei, Xinyu Che, Junqi Xiong, Chenchen Zhang, Yukai Huang, Chenyu Zhou, Haoyang Huang, Minghao Liu, Letian Zhu, Hongyi Ye, Jinhua Hao, Ken Deng, Zizheng Zhan, Han Li, Dailin Li, Yifan Yao, Ming Sun, Zhaoxiang Zhang, Jiaheng Liu
    Links: PDF
    Keywords: benchmark, interactive, evaluation, dit
  • HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
    Authors: Team HY-World, Chenjie Cao, Xuhui Zuo, Zhenwei Wang, Yisu Zhang, Junta Wu, Zhenyang Liu, Yuning Gong, Yang Liu, Bo Yuan, Chao Zhang, Coopers Li, Dongyuan Guo, Fan Yang, Haiyu Zhang, Hang Cao, Jianchen Zhu, Jiaxin Lin, Jie Xiao, Jihong Zhang, Junlin Yu, Lei Wang, Lifu Wang, Lilin Wang, Linus, Minghui Chen, Peng He, Penghao Zhao, Qi Chen, Rui Chen, Rui Shao, Sicong Liu, Wangchen Qin, Xiaochuan Niu, Xiang Yuan, Yi Sun, Yifei Tang, Yifu Sun, Yihang Lian, Yonghao Tan, Yuhong Liu, Yuyang Yin, Zhiyuan Min, Tengfei Wang, Chunchao Guo
    Links: PDF
    Keywords: efficient, interactive, world model, architecture, multi-modal, trajectory, benchmark

Classic Papers

Open Source Projects

Tutorials & Blogs

📋 Project Features

πŸ› οΈ Core Features

  • Unified CLI (main.py): Single entry point with init, search, suggest, export-bib, readme subcommands
  • Interactive Config Wizard: Guided setup for keywords, domains, time range, and API keys via python main.py init
  • Custom Search Keywords: Configure keywords for title, abstract, or both; with arXiv domain filtering (cs.CV, cs.AI, cs.MM, etc.)
  • Time Range Filtering: Relative periods (30d, 6m, 1y, 2y) or absolute date ranges (YYYY-MM-DD to YYYY-MM-DD)
  • Smart Link Extraction: Auto-classifies URLs from abstracts into GitHub, project page, dataset, video, demo, HuggingFace links
  • BibTeX Export: Fetch BibTeX from arXiv official API; export to .bib files with category and date filters
  • LLM Keyword Suggestion: Input paper titles or arXiv IDs to auto-generate optimized search keywords via OpenAI-compatible API
  • Automated Paper Collection: Daily automatic crawling with GitHub Actions
  • Intelligent Classification: Auto-categorize papers into 16 topics (T2V, I2V, Video Editing, Controllable Generation, World Models, etc.)
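
The "Smart Link Extraction" step above can be sketched roughly as follows. This is an illustrative assumption, not the repository's actual code: the bucket names, regex, and `classify_links` helper are hypothetical.

```python
import re

# Match http(s) URLs inside abstract text, stopping at whitespace and
# common trailing punctuation.
URL_RE = re.compile(r"https?://[^\s)\]>,]+")

def classify_links(abstract: str) -> dict:
    """Group every URL found in an abstract by its destination type."""
    buckets = {"github": [], "huggingface": [], "video": [], "other": []}
    for url in URL_RE.findall(abstract):
        if "github.com" in url:
            buckets["github"].append(url)
        elif "huggingface.co" in url:
            buckets["huggingface"].append(url)
        elif "youtube.com" in url or "youtu.be" in url:
            buckets["video"].append(url)
        else:
            buckets["other"].append(url)
    return buckets

links = classify_links(
    "Code: https://github.com/example/repo Demo: https://huggingface.co/spaces/x"
)
```

The real tool also distinguishes project pages, datasets, and demos; adding those buckets is a matter of extending the `elif` chain with more host or path patterns.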

πŸ› οΈ Technical Features

  • Robust Error Handling: Multi-layer retry and fallback strategies ensure stable operation
  • GitHub Actions Integration: Automated CI/CD workflows for daily updates
  • Multi-type Link Badges: README entries display PDF, GitHub (with stars), Project, Dataset, Video, Demo, HuggingFace, and Citation badges
  • Detailed Logging: Comprehensive logging for debugging and monitoring
  • Cross-Platform: Support for Windows/Linux/macOS
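
The retry-and-fallback idea behind the error handling can be illustrated with a small sketch; `fetch_with_retry` and its parameters are hypothetical names, not the project's API:

```python
import time

def fetch_with_retry(fetch, retries=3, base_delay=1.0, fallback=None):
    """Call `fetch` with exponential backoff; return `fallback` if all
    attempts fail, so one flaky request cannot abort the whole run."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                return fallback
            time.sleep(base_delay * (2 ** attempt))

# Demonstration: a fetch that fails twice before succeeding.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"

result = fetch_with_retry(flaky, retries=5, base_delay=0.0)
```

Returning a fallback value instead of re-raising is what lets a daily crawl degrade gracefully (e.g. skipping citation counts) rather than failing the entire GitHub Actions job.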

📚 Data Output

  • Paper JSON files (data/papers_YYYY-MM-DD.json): Full paper metadata with title, authors, abstract, links, keywords, BibTeX
  • BibTeX files (output/*.bib): Ready-to-use bibliography files for LaTeX
  • Auto-generated README: Categorized and formatted paper listings
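
As a rough illustration of consuming the paper JSON files, the snippet below assumes a schema matching the fields listed above; the exact structure of the real `data/papers_*.json` files may differ:

```python
import json

# Hypothetical sample mirroring the documented fields
# (title, authors, links, keywords).
sample = json.loads("""
[
  {"title": "MMControl",
   "authors": ["Liyang Li"],
   "keywords": ["controllable", "multi-modal"],
   "links": {"pdf": "https://arxiv.org/abs/0000.00000"}}
]
""")

# Filter papers by keyword, e.g. to rebuild a per-category listing.
controllable = [p["title"] for p in sample if "controllable" in p["keywords"]]
```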

🚀 Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Interactive Setup (Recommended)

python main.py init

This wizard walks you through:

  • Setting search keywords (for title, abstract, or both)
  • Selecting arXiv domains (e.g., cs.CV, cs.AI, cs.MM)
  • Configuring time range (relative like 6m/1y, or absolute dates)
  • Setting max results
  • Optionally configuring an OpenAI-compatible API key for keyword suggestion

3. Search Papers

# Search with settings from user_config.json
python main.py search

# Override: fetch 200 papers from the last 6 months, include BibTeX
python main.py search --max-results 200 --recent 6m --bibtex

# Search with absolute date range
python main.py search --date-from 2024-01-01 --date-to 2025-01-01

# Include citation counts from Semantic Scholar
python main.py search --citations

4. Export BibTeX

# Export all papers from the latest data file
python main.py export-bib --output output/references.bib

# Export only "Text-to-Video Generation" papers
python main.py export-bib --category "Text-to-Video Generation" --output output/t2v.bib

# Export papers from a specific date range
python main.py export-bib --date-from 2024-06-01 --date-to 2025-01-01 --output output/recent.bib
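
For orientation, an exported entry might be assembled along these lines. The field layout follows standard BibTeX conventions for arXiv preprints, but `to_bibtex` is a hypothetical helper, not the tool's implementation:

```python
def to_bibtex(key, title, authors, year, eprint):
    """Render one arXiv paper as a BibTeX @article entry."""
    return (
        f"@article{{{key},\n"
        f"  title         = {{{title}}},\n"
        f"  author        = {{{' and '.join(authors)}}},\n"
        f"  year          = {{{year}}},\n"
        f"  eprint        = {{{eprint}}},\n"
        f"  archivePrefix = {{arXiv}}\n"
        f"}}"
    )

entry = to_bibtex(
    "ho2022video",
    "Video Diffusion Models",
    ["Jonathan Ho", "Tim Salimans"],
    2022,
    "2204.03458",
)
```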

5. LLM Keyword Suggestion

# Generate keywords from paper titles
python main.py suggest --titles "Video Diffusion Models" "Stable Video Diffusion"

# Generate from arXiv IDs (auto-fetches titles)
python main.py suggest --arxiv-ids 2204.03458 2311.15127

# Auto-write suggested keywords to config
python main.py suggest --titles "Sora" "CogVideoX" --apply

# Use a custom API endpoint (e.g., DeepSeek)
python main.py suggest --titles "Paper Title" --base-url https://api.deepseek.com/v1 --api-key sk-xxx --model deepseek-chat

6. Generate README

# Basic README
python main.py readme

# Include latest papers section and abstracts
python main.py readme --show-latest --show-abstracts

Configuration File

All settings are stored in data/user_config.json:

{
  "search": {
    "keywords": {
      "both_abstract_and_title": ["video diffusion", "video generation", "text-to-video"],
      "abstract_only": ["diffusion model video generation"],
      "title_only": ["video generation", "video diffusion"]
    },
    "domains": ["cs.CV", "cs.AI", "cs.MM"],
    "time_range": {
      "mode": "relative",
      "relative": "1y"
    },
    "max_results": 500
  },
  "api_keys": {
    "openai_api_key": "",
    "openai_base_url": "https://api.openai.com/v1",
    "openai_model": "gpt-4o-mini"
  }
}
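
The relative `time_range` values ("30d", "6m", "1y") could be resolved to a start date along these lines; the 30-day month and 365-day year approximations are assumptions, since the repository's exact parsing rules are not shown here:

```python
from datetime import date, timedelta

def relative_start(period: str, today: date) -> date:
    """Map a relative period like '6m' to the earliest date to search from."""
    value, unit = int(period[:-1]), period[-1]
    days_per_unit = {"d": 1, "m": 30, "y": 365}[unit]
    return today - timedelta(days=value * days_per_unit)

start = relative_start("6m", date(2025, 1, 1))
```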

Contribution Guidelines

Feel free to submit Pull Requests to improve this list! Please follow these formats:

  • Paper entry format: **[Paper Title](link)** - Brief description
  • Project entry format: [Project Name](link) - Project description

License

CC0
