A curated list of latest research papers, projects and resources related to Video Diffusion Models and Video Generation. Content is automatically updated daily.
Last Update: 2026-04-27 02:49:17
[2026-02] Project Launched - v1.0
- Adapted from the awesome-gaussians framework for tracking video diffusion research
- Unified CLI: single entry point `python main.py` with subcommands `init`, `search`, `suggest`, `export-bib`, `readme`
- Interactive Configuration Wizard: run `python main.py init` to set up keywords, domains, time range, and API keys step by step
- Custom Time Range Filtering: supports relative periods (`6m`, `1y`, `2y`) and absolute date ranges
- Smart Link Extraction: automatically extracts and classifies GitHub, project page, dataset, video, demo, and HuggingFace links from paper abstracts
- BibTeX Export: fetches BibTeX from arXiv and exports to `.bib` files with category/date filters
- LLM Keyword Suggestion: paste a few paper titles or arXiv IDs, and an LLM automatically generates optimized search keywords
- arXiv Domain Filtering: restrict searches to specific arXiv categories (e.g., `cs.CV`, `cs.AI`, `cs.MM`)
- 16 Research Categories: comprehensive taxonomy covering T2V, I2V, video editing, controllable generation, world models, and more

View detailed updates: News.md
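The link-extraction feature above can be sketched roughly as follows. This is a minimal, self-contained illustration of classifying URLs found in an abstract into the categories listed above; `classify_links` and the pattern table are illustrative assumptions, not the repository's actual implementation.

```python
import re

# Illustrative URL patterns for the link categories used in this list
# (GitHub, HuggingFace, video, demo, dataset; everything else falls back
# to "project page"). Not the repository's real classifier.
CATEGORIES = [
    (r"github\.com", "github"),
    (r"huggingface\.co", "huggingface"),
    (r"youtube\.com|youtu\.be|\.mp4\b", "video"),
    (r"demo", "demo"),
    (r"dataset", "dataset"),
]

def classify_links(abstract: str) -> dict:
    """Extract URLs from an abstract and bucket them by category."""
    urls = re.findall(r"https?://[^\s)\]>,]+", abstract)
    buckets = {}
    for url in urls:
        label = "project page"  # default for unrecognized hosts
        for pattern, name in CATEGORIES:
            if re.search(pattern, url, re.IGNORECASE):
                label = name
                break
        buckets.setdefault(label, []).append(url)
    return buckets

links = classify_links(
    "Code: https://github.com/example/repo "
    "Project: https://example.github.io/proj"
)
# links buckets the first URL under "github" and the second under "project page"
```

A first-match-wins pattern list like this keeps the classification deterministic when a URL could match several categories.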
- 3D-aware Video Generation (29 papers) - Video generation with 3D awareness, multi-view consistency, and 4D content creation
- Applications (46 papers) - Domain-specific applications of video diffusion models
- Architecture & Efficiency (350 papers) - Architectural innovations (DiT, UNet), flow matching, and training/inference efficiency
- Audio & Multi-modal (32 papers) - Audio-driven and multi-modal conditioned video generation
- Controllable Generation (136 papers) - Controllable video generation with motion, camera, pose, or layout guidance
- Human & Character Animation (25 papers) - Human-centric video generation including talking heads, dance, and character animation
- Image-to-Video Generation (41 papers) - Methods for animating still images into videos
- Long Video Generation (112 papers) - Generating temporally consistent long-form videos beyond short clips
- Personalization & Customization (85 papers) - Personalized video generation with custom subjects, identities, or styles
- Physical Understanding (148 papers) - Physics-aware video generation and dynamics modeling
- Surveys & Benchmarks (202 papers) - Survey papers, benchmarks, and evaluation metrics for video generation
- Text-to-Video Generation (61 papers) - Foundation models and methods for generating videos from text prompts
- Video Editing (29 papers) - Diffusion-based video editing, style transfer, and manipulation
- Video Inpainting & Completion (6 papers) - Video inpainting, completion, outpainting, and temporal prediction
- Video Super-Resolution & Enhancement (61 papers) - Video quality improvement, upscaling, restoration, and frame interpolation
- World Models & Simulation (97 papers) - Video generation as world simulators and interactive environment generation
- AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model
Authors: Yutian Chen, Shi Guo, Renbiao Jin, Tianshuo Yang, Xin Cai, Yawen Luo, Mingxin Yang, Mulin Yu, Linning Xu, Tianfan Xue
Links:
Keywords: dit, distillation, diffusion model, video diffusion, novel view
- MultiWorld: Scalable Multi-Agent Multi-View Video World Models
Authors: Haoyu Wu, Jiwen Yu, Yingtian Zou, Xihui Liu
Links:
Keywords: dit, multi-view video, world model, dynamics, video generation, action-conditioned
- ArtifactWorld: Scaling 3D Gaussian Splatting Artifact Restoration via Video Generation Models
Authors: Xinliang Wang, Yifeng Shi, Zhenyu Wu
Links:
Keywords: video generation, video diffusion, novel view
- Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories
Authors: Wonbong Jang, Shikun Liu, Soubhik Sanyal, Juan Camilo Perez, Kam Woh Ng, Sanskar Agrawal, Juan-Manuel Perez-Rua, Yiannis Douratsos, Tao Xiang
Links:
Keywords: dit, video generation, diffusion model, trajectory, video diffusion, novel view
- Novel View Synthesis as Video Completion
Authors: Qi Wu, Khiem Vuong, Minsik Jeon, Srinivasa Narasimhan, Deva Ramanan
Links:
Keywords: video completion, diffusion model, benchmark, video diffusion, novel view
- Multi-View Video Diffusion Policy: A 3D Spatio-Temporal-Aware Video Action Model
Authors: Peiyan Li, Yixiang Chen, Yuan Xu, Jiabing Yang, Xiangnan Wu, Jun Guo, Nan Sun, Long Qian, Xinghang Li, Xin Xiao, Jing Liu, Nianfeng Liu, Tao Kong, Yan Huang, Liang Wang, Tieniu Tan
Links:
Keywords: dit, multi-view video, efficient, dynamics, video diffusion
- Control-DINO: Feature Space Conditioning for Controllable Image-to-Video Diffusion
Authors: Edoardo A. Dominici, Thomas Deixelberger, Konstantinos Vardis, Markus Steinberger
Links:
Keywords: image-to-video, dit, style, architecture, simulation, diffusion model, controllable, video diffusion, novel view
- HVG-3D: Bridging Real and Simulation Domains for 3D-Conditional Hand-Object Interaction Video Synthesis
Authors: Mingjin Chen, Junhao Chen, Zhaoxin Fan, Yujian Lee, Zichen Dang, Lili Wang, Yawen Cui, Lap-Pui Chau, Yi Wang
Links:
Keywords: video synthesis, dit, architecture, simulation, video generation, 3d-aware
- I3DM: Implicit 3D-aware Memory Retrieval and Injection for Consistent Video Scene Generation
Authors: Jia Li, Han Yan, Yihang Chen, Siqi Li, Xibin Song, Yifu Wang, Jianfei Cai, Tien-Tsin Wong, Pan Ji
Links:
Keywords: camera control, dit, video generation, 3d-aware, novel view
- GO-Renderer: Generative Object Rendering with 3D-aware Controllable Video Diffusion Models
Authors: Zekai Gu, Shuoxuan Feng, Yansong Wang, Hanzhuo Huang, Zhongshuo Du, Chengfeng Zhao, Chengwei Ren, Peng Wang, Yuan Liu
Links:
Keywords: dit, efficient, diffusion model, 3d-aware, controllable, video diffusion
- HFS-TriNet: A Three-Branch Collaborative Feature Learning Network for Prostate Cancer Classification from TRUS Videos
Authors: Xu Lu, Qianhong Peng, Qihao Zhou, Shaopeng Liu, Xiuqin Ye, Chuan Yang, Yuan Yuan
Links:
Keywords: temporal consistency, medical, denoising, sound
- Seeing Fast and Slow: Learning the Flow of Time in Videos
Authors: Yen-Siang Wu, Rundong Luo, Jingsen Zhu, Tao Tu, Ali Farhadi, Matthew Wallingford, Yu-Chiang Frank Wang, Steve Marschner, Wei-Chiu Ma
Links:
Keywords: dit, concept, film, video generation, controllable, super-resolution
- KD-CVG: A Knowledge-Driven Approach for Creative Video Generation
Authors: Linkai Liu, Wei Feng, Xi Zhao, Shen Zhang, Xingye Chen, Zheng Zhang, Jingjing Lv, Junjie Shen, Ching Law, Yuchen Zhou, Zipeng Guo, Chao Gou
Links:
Keywords: t2v, text-to-video, creative, video generation, advertising
- Agentic AI for Personalized Physiotherapy: A Multi-Agent Framework for Generative Video Training and Real-Time Pose Correction
Authors: Abhishek Dharmaratnakar, Srivaths Ranganathan, Anushree Sinha, Debanshu Das
Links:
Keywords: video synthesis, avatar, architecture, evaluation, video generation, medical
- AttentionBender: Manipulating Cross-Attention in Video Diffusion Transformers as a Creative Probe
Authors: Adam Cole, Mick Grierson
Links:
Keywords: dit, style, video generation, creative, diffusion transformer, video diffusion
- Building a Precise Video Language with Human-AI Oversight
Authors: Zhiqiu Lin, Chancharik Mitra, Siyuan Cen, Isaac Li, Yuhan Huang, Yu Tong Tiffany Ling, Hewei Wang, Irene Pi, Shihang Zhu, Ryan Rao, George Liu, Jiaxi Li, Ruojin Li, Yili Han, Yilun Du, Deva Ramanan
Links:
Keywords: dit, dynamics, video generation, benchmark, film
- X-Cache: Cross-Chunk Block Caching for Few-Step Autoregressive World Models Inference
Authors: Yixiao Zeng, Jianlei Zheng, Chaoda Zheng, Shijia Chen, Mingdian Liu, Tongping Liu, Tengwei Luo, Yu Zhang, Boyang Wang, Linkun Xu, Siyuan Lu, Bo Tian, Xianming Liu
Links:
Keywords: dit, interactive, world model, autonomous driving, evaluation, autoregressive, simulation, denoising, video generation, acceleration, action-conditioned, controllable, video diffusion
- CityRAG: Stepping Into a City via Spatially-Grounded Video Generation
Authors: Gene Chou, Charles Herrmann, Kyle Genova, Boyang Deng, Songyou Peng, Bharath Hariharan, Jason Y. Zhang, Noah Snavely, Philipp Henzler
Links:
Keywords: dit, t2v, autonomous driving, simulation, video generation, i2v, robotics, physical
- CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation
Authors: Xiangyang Luo, Xiaozhe Xin, Tao Feng, Xu Guo, Meiguang Jin, Junfeng Ma
Links:
Keywords: video synthesis, dit, diffusion model, diffusion transformer, advertising, physical
- AutoAWG: Adverse Weather Generation with Adaptive Multi-Controls for Automotive Videos
Authors: Jiagao Hu, Daiguo Zhou, Danzhen Fu, Fuhao Li, Zepeng Wang, Fei Wang, Wenhua Liao, Jiayi Xie, Haiyang Sun
Links:
Keywords: dit, style, autonomous driving, temporal consistency, video generation, controllable
Showing the latest 50 out of 350 papers
- Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
Authors: Meng Chu, Xuan Billy Zhang, Kevin Qinghong Lin, Lingdong Kong, Jize Zhang, Teng Tu, Weijian Ma, Ziqi Huang, Senqiao Yang, Wei Huang, Yeying Jin, Zhefan Rao, Jinhui Ye, Xinyu Lin, Xichen Zhang, Qisheng Hu, Shuai Yang, Leyang Shen, Wei Chow, Yifei Dong, Fengyi Wu, Quanyu Long, Bin Xia, Shaozuo Yu, Mingkang Zhu, Wenhu Zhang, Jiehui Huang, Haokun Gui, Haoxuan Che, Long Chen, Qifeng Chen, Wenxuan Zhang, Wenya Wang, Xiaojuan Qi, Yang Deng, Yanwei Li, Mike Zheng Shou, Zhi-Qi Cheng, See-Kiong Ng, Ziwei Liu, Philip Torr, Jiaya Jia
Links:
Keywords: dit, world model, evaluation, dynamics, simulation, video generation, action-conditioned, physical
- DocPrune: Efficient Document Question Answering via Background, Question, and Comprehension-aware Token Pruning
Authors: Joonmyung Choi, Sanghyeok Lee, Jongha Kim, Sehyung Kim, Dohwan Ko, Jihyung Kil, Hyunwoo J. Kim
Links:
Keywords: dit, efficient, multi-modal
- Seeing Fast and Slow: Learning the Flow of Time in Videos
Authors: Yen-Siang Wu, Rundong Luo, Jingsen Zhu, Tao Tu, Ali Farhadi, Matthew Wallingford, Yu-Chiang Frank Wang, Steve Marschner, Wei-Chiu Ma
Links:
Keywords: dit, concept, film, video generation, controllable, super-resolution
- WorldMark: A Unified Benchmark Suite for Interactive Video World Models
Authors: Xiaojie Xu, Zhengyuan Lin, Kang He, Yukang Feng, Xiaofeng Mao, Yuanyang Yin, Kaipeng Zhang, Yongtao Ge
Links:
Keywords: image-to-video, dit, interactive, world model, style, evaluation, video generation, trajectory, benchmark
- Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation
Authors: Yuanchen Fei, Yude Zou, Zejian Kang, Ming Li, Jiaying Zhou, Xiangru Huang
Links:
Keywords: video synthesis, efficient, temporal consistency, video generation, identity, controllable
- Sparse Forcing: Native Trainable Sparse Attention for Real-time Autoregressive Diffusion Video Generation
Authors: Boxun Xu, Yuming Du, Zichang Liu, Siyu Yang, Ziyang Jiang, Siqi Yan, Rajasi Saha, Albert Pumarola, Wenchen Wang, Peng Li
Links:
Keywords: efficient, text-to-video, autoregressive, video generation, diffusion model, video diffusion
- Agentic AI for Personalized Physiotherapy: A Multi-Agent Framework for Generative Video Training and Real-Time Pose Correction
Authors: Abhishek Dharmaratnakar, Srivaths Ranganathan, Anushree Sinha, Debanshu Das
Links:
Keywords: video synthesis, avatar, architecture, evaluation, video generation, medical
- DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation
Authors: Hyeonwoo Kim, Jeonghwan Kim, Kyungwon Cho, Hanbyul Joo
Links:
Keywords: dit, physical, physics
- AttentionBender: Manipulating Cross-Attention in Video Diffusion Transformers as a Creative Probe
Authors: Adam Cole, Mick Grierson
Links:
Keywords: dit, style, video generation, creative, diffusion transformer, video diffusion
- Building a Precise Video Language with Human-AI Oversight
Authors: Zhiqiu Lin, Chancharik Mitra, Siyuan Cen, Isaac Li, Yuhan Huang, Yu Tong Tiffany Ling, Hewei Wang, Irene Pi, Shihang Zhu, Ryan Rao, George Liu, Jiaxi Li, Ruojin Li, Yili Han, Yilun Du, Deva Ramanan
Links:
Keywords: dit, dynamics, video generation, benchmark, film
- HFS-TriNet: A Three-Branch Collaborative Feature Learning Network for Prostate Cancer Classification from TRUS Videos
Authors: Xu Lu, Qianhong Peng, Qihao Zhou, Shaopeng Liu, Xiuqin Ye, Chuan Yang, Yuan Yuan
Links:
Keywords: temporal consistency, medical, denoising, sound
- DocPrune: Efficient Document Question Answering via Background, Question, and Comprehension-aware Token Pruning
Authors: Joonmyung Choi, Sanghyeok Lee, Jongha Kim, Sehyung Kim, Dohwan Ko, Jihyung Kil, Hyunwoo J. Kim
Links:
Keywords: dit, efficient, multi-modal
- MMControl: Unified Multi-Modal Control for Joint Audio-Video Generation
Authors: Liyang Li, Wen Wang, Canyu Zhao, Tianjian Feng, Zhiyue Zhao, Hao Chen, Chunhua Shen
Links:
Keywords: dit, controllable, video generation, multi-modal, layout, identity, diffusion transformer, video diffusion
- Attention-based Multi-modal Deep Learning Model of Spatio-temporal Crop Yield Prediction with Satellite, Soil and Climate Data
Authors: Gopal Krishna Shyam, Ila Chandrakar
Links:
Keywords: architecture, dit, multi-modal
- OmniHuman: A Large-scale Dataset and Benchmark for Human-Centric Video Generation
Authors: Lei Zhu, Xing Cai, Yingjie Chen, Yiheng Li, Binxin Yang, Hao Liu, Jie Chen, Chen Li, Jing LYu
Links:
Keywords: video synthesis, evaluation, video generation, multi-modal, benchmark, physical
- TurboTalk: Progressive Distillation for One-Step Audio-Driven Talking Avatar Generation
Authors: Xiangyu Liu, Feng Gao, Xiaomei Zhang, Yong Zhang, Xiaoming Wei, Zhen Lei, Xiangyu Zhu
Links:
Keywords: avatar, audio-driven, distillation, denoising, diffusion model, video diffusion
- Seedance 2.0: Advancing Video Generation for World Complexity
Authors: Team Seedance, De Chen, Liyang Chen, Xin Chen, Ying Chen, Zhuo Chen, Zhuowei Chen, Feng Cheng, Tianheng Cheng, Yufeng Cheng, Mojie Chi, Xuyan Chi, Jian Cong, Qinpeng Cui, Fei Ding, Qide Dong, Yujiao Du, Haojie Duanmu, Junliang Fan, Jiarui Fang, Jing Fang, Zetao Fang, Chengjian Feng, Yu Gao, Diandian Gu, Dong Guo, Hanzhong Guo, Qiushan Guo, Boyang Hao, Hongxiang Hao, Haoxun He, Jiaao He, Qian He, Tuyen Hoang, Heng Hu, Ruoqing Hu, Yuxiang Hu, Jiancheng Huang, Weilin Huang, Zhaoyang Huang, Zhongyi Huang, Jishuo Jin, Ming Jing, Ashley Kim, Shanshan Lao, Yichong Leng, Bingchuan Li, Gen Li, Haifeng Li, Huixia Li, Jiashi Li, Ming Li, Xiaojie Li, Xingxing Li, Yameng Li, Yiying Li, Yu Li, Yueyan Li, Chao Liang, Han Liang, Jianzhong Liang, Ying Liang, Wang Liao, J. H. Lien, Shanchuan Lin, Xi Lin, Feng Ling, Yue Ling, Fangfang Liu, Jiawei Liu, Jihao Liu, Jingtuo Liu, Shu Liu, Sichao Liu, Wei Liu, Xue Liu, Zuxi Liu, Ruijie Lu, Lecheng Lyu, Jingting Ma, Tianxiang Ma, Xiaonan Nie, Jingzhe Ning, Junjie Pan, Xitong Pan, Ronggui Peng, Xueqiong Qu, Yuxi Ren, Yuchen Shen, Guang Shi, Lei Shi, Yinglong Song, Fan Sun, Li Sun, Renfei Sun, Wenjing Tang, Boyang Tao, Zirui Tao, Dongliang Wang, Feng Wang, Hulin Wang, Ke Wang, Qingyi Wang, Rui Wang, Shuai Wang, Shulei Wang, Weichen Wang, Xuanda Wang, Yanhui Wang, Yue Wang, Yuping Wang, Yuxuan Wang, Zijie Wang, Ziyu Wang, Guoqiang Wei, Meng Wei, Di Wu, Guohong Wu, Hanjie Wu, Huachao Wu, Jian Wu, Jie Wu, Ruolan Wu, Shaojin Wu, Xiaohu Wu, Xinglong Wu, Yonghui Wu, Ruiqi Xia, Xin Xia, Xuefeng Xiao, Shuang Xu, Bangbang Yang, Jiaqi Yang, Runkai Yang, Tao Yang, Yihang Yang, Zhixian Yang, Ziyan Yang, Fulong Ye, Bingqian Yi, Xing Yin, Yongbin You, Linxiao Yuan, Weihong Zeng, Xuejiao Zeng, Yan Zeng, Siyu Zhai, Zhonghua Zhai, Bowen Zhang, Chenlin Zhang, Heng Zhang, Jun Zhang, Manlin Zhang, Peiyuan Zhang, Shuo Zhang, Xiaohe Zhang, Xiaoying Zhang, Xinyan Zhang, Xinyi Zhang, Yichi Zhang, Zixiang Zhang, Haiyu Zhao, Huating Zhao, Liming Zhao, Yian 
Zhao, Guangcong Zheng, Jianbin Zheng, Xiaozheng Zheng, Zerong Zheng, Kuan Zhu, Feilong Zuo
Links:
Keywords: dit, efficient, architecture, evaluation, video generation, multi-modal, creative
- HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
Authors: Team HY-World, Chenjie Cao, Xuhui Zuo, Zhenwei Wang, Yisu Zhang, Junta Wu, Zhenyang Liu, Yuning Gong, Yang Liu, Bo Yuan, Chao Zhang, Coopers Li, Dongyuan Guo, Fan Yang, Haiyu Zhang, Hang Cao, Jianchen Zhu, Jiaxin Lin, Jie Xiao, Jihong Zhang, Junlin Yu, Lei Wang, Lifu Wang, Lilin Wang, Linus, Minghui Chen, Peng He, Penghao Zhao, Qi Chen, Rui Chen, Rui Shao, Sicong Liu, Wangchen Qin, Xiaochuan Niu, Xiang Yuan, Yi Sun, Yifei Tang, Yifu Sun, Yihang Lian, Yonghao Tan, Yuhong Liu, Yuyang Yin, Zhiyuan Min, Tengfei Wang, Chunchao Guo
Links:
Keywords: efficient, interactive, world model, architecture, multi-modal, trajectory, benchmark
- Deepfakes at Face Value: Image and Authority
Authors: James Ravi Kirkpatrick
Links:
Keywords: simulation, identity, sound
- Beyond Monologue: Interactive Talking-Listening Avatar Generation with Conversational Audio Context-Aware Kernels
Authors: Yuzhe Weng, Haotian Wang, Xinyi Yu, Xiaoyan Wu, Haoran Xu, Shan He, Jun Du
Links:
Keywords: avatar, audio-driven, interactive, dynamics, video generation, physical
Showing the latest 50 out of 136 papers
- Seeing Fast and Slow: Learning the Flow of Time in Videos
Authors: Yen-Siang Wu, Rundong Luo, Jingsen Zhu, Tao Tu, Ali Farhadi, Matthew Wallingford, Yu-Chiang Frank Wang, Steve Marschner, Wei-Chiu Ma
Links:
Keywords: dit, concept, film, video generation, controllable, super-resolution
- WorldMark: A Unified Benchmark Suite for Interactive Video World Models
Authors: Xiaojie Xu, Zhengyuan Lin, Kang He, Yukang Feng, Xiaofeng Mao, Yuanyang Yin, Kaipeng Zhang, Yongtao Ge
Links:
Keywords: image-to-video, dit, interactive, world model, style, evaluation, video generation, trajectory, benchmark
- Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation
Authors: Yuanchen Fei, Yude Zou, Zejian Kang, Ming Li, Jiaying Zhou, Xiangru Huang
Links:
Keywords: video synthesis, efficient, temporal consistency, video generation, identity, controllable
- X-Cache: Cross-Chunk Block Caching for Few-Step Autoregressive World Models Inference
Authors: Yixiao Zeng, Jianlei Zheng, Chaoda Zheng, Shijia Chen, Mingdian Liu, Tongping Liu, Tengwei Luo, Yu Zhang, Boyang Wang, Linkun Xu, Siyuan Lu, Bo Tian, Xianming Liu
Links:
Keywords: dit, interactive, world model, autonomous driving, evaluation, autoregressive, simulation, denoising, video generation, acceleration, action-conditioned, controllable, video diffusion
- ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis
Authors: Zhengwentai Sun, Keru Zheng, Chenghong Li, Hongjie Liao, Xihe Yang, Heyuan Li, Yihao Zhi, Shuliang Ning, Shuguang Cui, Xiaoguang Han
Links:
Keywords: video synthesis, temporal consistency, video generation, diffusion model, controllable, video diffusion
- MMControl: Unified Multi-Modal Control for Joint Audio-Video Generation
Authors: Liyang Li, Wen Wang, Canyu Zhao, Tianjian Feng, Zhiyue Zhao, Hao Chen, Chunhua Shen
Links:
Keywords: dit, controllable, video generation, multi-modal, layout, identity, diffusion transformer, video diffusion
- Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation
Authors: Rui Li, Ke Hao, Yuanzhi Liang, Haibin Huang, Chi Zhang, Yun Gu, Xuelong Li
Links:
Keywords: dit, evaluation, denoising, video generation, trajectory
- AutoAWG: Adverse Weather Generation with Adaptive Multi-Controls for Automotive Videos
Authors: Jiagao Hu, Daiguo Zhou, Danzhen Fu, Fuhao Li, Zepeng Wang, Fei Wang, Wenhua Liao, Jiayi Xie, Haiyang Sun
Links:
Keywords: dit, style, autonomous driving, temporal consistency, video generation, controllable
- ViPS: Video-informed Pose Spaces for Auto-Rigged Meshes
Authors: Honglin Chen, Karran Pandey, Rundi Wu, Matheus Gadelha, Yannick Hold-Geoffroy, Ayush Tewari, Niloy J. Mitra, Changxi Zheng, Paul Guerrero
Links:
Keywords: dit, evaluation, diffusion model, controllable, video diffusion, physical
- DreamShot: Personalized Storyboard Synthesis with Video Diffusion Prior
Authors: Junjia Huang, Binbin Yang, Pengxiang Yan, Jiyang Liu, Bin Xia, Zhao Wang, Yitong Wang, Liang Lin, Guanbin Li
Links:
Keywords: dit, temporal consistency, identity, diffusion model, controllable, video diffusion
- Agentic AI for Personalized Physiotherapy: A Multi-Agent Framework for Generative Video Training and Real-Time Pose Correction
Authors: Abhishek Dharmaratnakar, Srivaths Ranganathan, Anushree Sinha, Debanshu Das
Links:
Keywords: video synthesis, avatar, architecture, evaluation, video generation, medical
- HumanScore: Benchmarking Human Motions in Generated Videos
Authors: Yusu Fang, Tiange Xiang, Tian Tan, Narayan Schuetz, Scott Delp, Li Fei-Fei, Ehsan Adeli
Links:
Keywords: architecture, human motion, dynamics, video generation, benchmark, physical
- Prompt-to-Gesture: Measuring the Capabilities of Image-to-Video Deictic Gesture Generation
Authors: Hassan Ali, Doreen Jirak, Luca Müller, Stefan Wermter
Links:
Keywords: image-to-video, gesture, video generation, dit
- TurboTalk: Progressive Distillation for One-Step Audio-Driven Talking Avatar Generation
Authors: Xiangyu Liu, Feng Gao, Xiaomei Zhang, Yong Zhang, Xiaoming Wei, Zhen Lei, Xiangyu Zhu
Links:
Keywords: avatar, audio-driven, distillation, denoising, diffusion model, video diffusion
- Beyond Monologue: Interactive Talking-Listening Avatar Generation with Conversational Audio Context-Aware Kernels
Authors: Yuzhe Weng, Haotian Wang, Xinyi Yu, Xiaoyan Wu, Haoran Xu, Shan He, Jun Du
Links:
Keywords: avatar, audio-driven, interactive, dynamics, video generation, physical
- HumANDiff: Articulated Noise Diffusion for Motion-Consistent Human Video Generation
Authors: Tao Hu, Varun Jampani
Links:
Keywords: video synthesis, image-to-video, dit, physics, style, architecture, dynamics, motion control, video generation, diffusion model, human motion, controllable, video diffusion, physical
- Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision
Authors: Hyunsoo Cha, Wonjung Woo, Byungjun Kim, Hanbyul Joo
Links:
Keywords: human animation, virtual try-on, architecture, image animation, identity, diffusion transformer, video diffusion
- DynaVid: Learning to Generate Highly Dynamic Videos using Synthetic Motion Data
Authors: Wonjoon Jin, Jiyun Won, Janghyeok Han, Qi Dai, Chong Luo, Seung-Hwan Baek, Sunghyun Cho
Links:
Keywords: video synthesis, dit, motion control, diffusion model, human motion, video diffusion
- FlashSign: Pose-Free Guidance for Efficient Sign Language Video Generation
Authors: Liuzhou Zhang, Zeyu Zhang, Biao Wu, Luyao Tang, Zirui Song, Hongyang He, Renda Han, Guangzhen Yao, Huacan Wang, Ronghao Chen, Xiuying Chen, Guan Huang, Zheng Zhu
Links:
Keywords: gesture, video generation, efficient
- LOME: Learning Human-Object Manipulation with Action-Conditioned Egocentric World Model
Authors: Quankai Gao, Jiawei Yang, Qiangeng Xu, Le Chen, Yue Wang
Links:
Keywords: dit, t2v, text-to-video, physics, gesture, world model, temporal consistency, motion control, action-conditioned, physical
- WorldMark: A Unified Benchmark Suite for Interactive Video World Models
Authors: Xiaojie Xu, Zhengyuan Lin, Kang He, Yukang Feng, Xiaofeng Mao, Yuanyang Yin, Kaipeng Zhang, Yongtao Ge
Links:
Keywords: image-to-video, dit, interactive, world model, style, evaluation, video generation, trajectory, benchmark
- CityRAG: Stepping Into a City via Spatially-Grounded Video Generation
Authors: Gene Chou, Charles Herrmann, Kyle Genova, Boyang Deng, Songyou Peng, Bharath Hariharan, Jason Y. Zhang, Noah Snavely, Philipp Henzler
Links:
Keywords: dit, t2v, autonomous driving, simulation, video generation, i2v, robotics, physical
- TS-Attn: Temporal-wise Separable Attention for Multi-Event Video Generation
Authors: Hongyu Zhang, Yufan Deng, Zilin Pan, Peng-Tao Jiang, Bo Li, Qibin Hou, Zhiyang Dou, Zhen Dong, Daquan Zhou
Links:
Keywords: image-to-video, dit, t2v, text-to-video, temporal consistency, video generation
- AnimationBench: Are Video Models Good at Character-Centric Animation?
Authors: Leyi Wu, Pengjun Fang, Kai Sun, Yazhou Xing, Yinwei Wu, Songsong Wang, Ziqi Huang, Dan Zhou, Yingqing He, Ying-Cong Chen, Qifeng Chen
Links:
Keywords: image-to-video, style, evaluation, video generation, i2v, benchmark
- Flow of Truth: Proactive Temporal Forensics for Image-to-Video Generation
Authors: Yuzhuo Chen, Zehua Ma, Han Fang, Hengyi Wang, Guanjie Wang, Weiming Zhang
Links:
Keywords: image-to-video, dit, video generation, creative, i2v
- Prompt-to-Gesture: Measuring the Capabilities of Image-to-Video Deictic Gesture Generation
Authors: Hassan Ali, Doreen Jirak, Luca Müller, Stefan Wermter
Links:
Keywords: image-to-video, gesture, video generation, dit
- ARGen: Affect-Reinforced Generative Augmentation towards Vision-based Dynamic Emotion Perception
Authors: Huanzhen Wang, Ziheng Zhou, Jiaqi Song, Li He, Yunshi Lan, Yan Wang, Wenqiang Zhang
Links:
Keywords: image-to-video, dit, video diffusion, dynamics
- Immune2V: Image Immunization Against Dual-Stream Image-to-Video Generation
Authors: Zeqian Long, Ozgur Kara, Haotian Xue, Yongxin Chen, James M. Rehg
Links:
Keywords: image-to-video, dit, video generation, trajectory, i2v
- Lighting-grounded Video Generation with Renderer-based Agent Reasoning
Authors: Ziqi Cai, Taoyu Yang, Zheng Chang, Si Li, Han Jiang, Shuchen Weng, Boxin Shi
Links:
Keywords: video synthesis, image-to-video, dit, video-to-video, temporal consistency, video generation, diffusion model, trajectory, layout, controllable, video diffusion, film
- HumANDiff: Articulated Noise Diffusion for Motion-Consistent Human Video Generation
Authors: Tao Hu, Varun Jampani
Links:
Keywords: video synthesis, image-to-video, dit, physics, style, architecture, dynamics, motion control, video generation, diffusion model, human motion, controllable, video diffusion, physical
Showing the latest 50 out of 112 papers
- HFS-TriNet: A Three-Branch Collaborative Feature Learning Network for Prostate Cancer Classification from TRUS Videos
Authors: Xu Lu, Qianhong Peng, Qihao Zhou, Shaopeng Liu, Xiuqin Ye, Chuan Yang, Yuan Yuan
Links:
Keywords: temporal consistency, medical, denoising, sound
- Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation
Authors: Yuanchen Fei, Yude Zou, Zejian Kang, Ming Li, Jiaying Zhou, Xiangru Huang
Links:
Keywords: video synthesis, efficient, temporal consistency, video generation, identity, controllable
- Sparse Forcing: Native Trainable Sparse Attention for Real-time Autoregressive Diffusion Video Generation
Authors: Boxun Xu, Yuming Du, Zichang Liu, Siyu Yang, Ziyang Jiang, Siqi Yan, Rajasi Saha, Albert Pumarola, Wenchen Wang, Peng Li
Links:
Keywords: efficient, text-to-video, autoregressive, video generation, diffusion model, video diffusion
- DynamicRad: Content-Adaptive Sparse Attention for Long Video Diffusion
Authors: Yongji Long, Shijun Liang, Jintao Li, Yun Li
Links:
Keywords: dynamics, video diffusion, long video, physics
- X-Cache: Cross-Chunk Block Caching for Few-Step Autoregressive World Models Inference
Authors: Yixiao Zeng, Jianlei Zheng, Chaoda Zheng, Shijia Chen, Mingdian Liu, Tongping Liu, Tengwei Luo, Yu Zhang, Boyang Wang, Linkun Xu, Siyuan Lu, Bo Tian, Xianming Liu
Links:
Keywords: dit, interactive, world model, autonomous driving, evaluation, autoregressive, simulation, denoising, video generation, acceleration, action-conditioned, controllable, video diffusion
- ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis
Authors: Zhengwentai Sun, Keru Zheng, Chenghong Li, Hongjie Liao, Xihe Yang, Heyuan Li, Yihao Zhi, Shuliang Ning, Shuguang Cui, Xiaoguang Han
Links:
Keywords: video synthesis, temporal consistency, video generation, diffusion model, controllable, video diffusion
- TS-Attn: Temporal-wise Separable Attention for Multi-Event Video Generation
Authors: Hongyu Zhang, Yufan Deng, Zilin Pan, Peng-Tao Jiang, Bo Li, Qibin Hou, Zhiyang Dou, Zhen Dong, Daquan Zhou
Links:
Keywords: image-to-video, dit, t2v, text-to-video, temporal consistency, video generation
- AutoAWG: Adverse Weather Generation with Adaptive Multi-Controls for Automotive Videos
Authors: Jiagao Hu, Daiguo Zhou, Danzhen Fu, Fuhao Li, Zepeng Wang, Fei Wang, Wenhua Liao, Jiayi Xie, Haiyang Sun
Links:
Keywords: dit, style, autonomous driving, temporal consistency, video generation, controllable
- Video-Robin: Autoregressive Diffusion Planning for Intent-Grounded Video-to-Music Generation
Authors: Vaibhavi Lokegaonkar, Aryan Vijay Bhosale, Vishnu Raj, Gouthaman KV, Ramani Duraiswami, Lie Lu, Sreyan Ghosh, Dinesh Manocha
Links:
Keywords: benchmark, dit, diffusion transformer, autoregressive
- Speculative Decoding for Autoregressive Video Generation
Authors: Yuezhou Hu, Jintao Zhang
Links:
Keywords: video synthesis, dit, streaming, distillation, autoregressive, denoising, video generation, acceleration, video diffusion
Showing the latest 50 out of 85 papers
- Seeing Fast and Slow: Learning the Flow of Time in Videos
Authors: Yen-Siang Wu, Rundong Luo, Jingsen Zhu, Tao Tu, Ali Farhadi, Matthew Wallingford, Yu-Chiang Frank Wang, Steve Marschner, Wei-Chiu Ma
Links:
Keywords: dit, concept, film, video generation, controllable, super-resolution
- WorldMark: A Unified Benchmark Suite for Interactive Video World Models
Authors: Xiaojie Xu, Zhengyuan Lin, Kang He, Yukang Feng, Xiaofeng Mao, Yuanyang Yin, Kaipeng Zhang, Yongtao Ge
Links:
Keywords: image-to-video, dit, interactive, world model, style, evaluation, video generation, trajectory, benchmark
- Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation
Authors: Yuanchen Fei, Yude Zou, Zejian Kang, Ming Li, Jiaying Zhou, Xiangru Huang
Links:
Keywords: video synthesis, efficient, temporal consistency, video generation, identity, controllable
- AttentionBender: Manipulating Cross-Attention in Video Diffusion Transformers as a Creative Probe
Authors: Adam Cole, Mick Grierson
Links:
Keywords: dit, style, video generation, creative, diffusion transformer, video diffusion
- MMControl: Unified Multi-Modal Control for Joint Audio-Video Generation
Authors: Liyang Li, Wen Wang, Canyu Zhao, Tianjian Feng, Zhiyue Zhao, Hao Chen, Chunhua Shen
Links:
Keywords: dit, controllable, video generation, multi-modal, layout, identity, diffusion transformer, video diffusion
- AutoAWG: Adverse Weather Generation with Adaptive Multi-Controls for Automotive Videos
Authors: Jiagao Hu, Daiguo Zhou, Danzhen Fu, Fuhao Li, Zepeng Wang, Fei Wang, Wenhua Liao, Jiayi Xie, Haiyang Sun
Links:
Keywords: dit, style, autonomous driving, temporal consistency, video generation, controllable
- DreamShot: Personalized Storyboard Synthesis with Video Diffusion Prior
Authors: Junjia Huang, Binbin Yang, Pengxiang Yan, Jiyang Liu, Bin Xia, Zhao Wang, Yitong Wang, Liang Lin, Guanbin Li
Links:
Keywords: dit, temporal consistency, identity, diffusion model, controllable, video diffusion
- AnimationBench: Are Video Models Good at Character-Centric Animation?
Authors: Leyi Wu, Pengjun Fang, Kai Sun, Yazhou Xing, Yinwei Wu, Songsong Wang, Ziqi Huang, Dan Zhou, Yingqing He, Ying-Cong Chen, Qifeng Chen
Links:
Keywords: image-to-video, style, evaluation, video generation, i2v, benchmark
- Implicit Neural Representations: A Signal Processing Perspective
Authors: Dhananjaya Jayasundara, Vishal M. Patel
Links:
Keywords: concept, medical
- Controllable Video Object Insertion via Multiview Priors
Authors: Xia Qi, Peishan Cong, Yichen Yao, Ziyi Wang, Yaoqin Ye, Yuexin Ma
Links:
Keywords: dit, controllable, video generation, identity
Showing the latest 50 out of 148 papers
- Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
Authors: Meng Chu, Xuan Billy Zhang, Kevin Qinghong Lin, Lingdong Kong, Jize Zhang, Teng Tu, Weijian Ma, Ziqi Huang, Senqiao Yang, Wei Huang, Yeying Jin, Zhefan Rao, Jinhui Ye, Xinyu Lin, Xichen Zhang, Qisheng Hu, Shuai Yang, Leyang Shen, Wei Chow, Yifei Dong, Fengyi Wu, Quanyu Long, Bin Xia, Shaozuo Yu, Mingkang Zhu, Wenhu Zhang, Jiehui Huang, Haokun Gui, Haoxuan Che, Long Chen, Qifeng Chen, Wenxuan Zhang, Wenya Wang, Xiaojuan Qi, Yang Deng, Yanwei Li, Mike Zheng Shou, Zhi-Qi Cheng, See-Kiong Ng, Ziwei Liu, Philip Torr, Jiaya Jia
Links:
Keywords: dit, world model, evaluation, dynamics, simulation, video generation, action-conditioned, physical - DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation
Authors: Hyeonwoo Kim, Jeonghwan Kim, Kyungwon Cho, Hanbyul Joo
Links:
Keywords: dit, physical, physics - DynamicRad: Content-Adaptive Sparse Attention for Long Video Diffusion
Authors: Yongji Long, Shijun Liang, Jintao Li, Yun Li
Links:
Keywords: dynamics, video diffusion, long video, physics - Building a Precise Video Language with Human-AI Oversight
Authors: Zhiqiu Lin, Chancharik Mitra, Siyuan Cen, Isaac Li, Yuhan Huang, Yu Tong Tiffany Ling, Hewei Wang, Irene Pi, Shihang Zhu, Ryan Rao, George Liu, Jiaxi Li, Ruojin Li, Yili Han, Yilun Du, Deva Ramanan
Links:
Keywords: dit, dynamics, video generation, benchmark, film - HumanScore: Benchmarking Human Motions in Generated Videos
Authors: Yusu Fang, Tiange Xiang, Tian Tan, Narayan Schuetz, Scott Delp, Li Fei-Fei, Ehsan Adeli
Links:
Keywords: architecture, human motion, dynamics, video generation, benchmark, physical - CityRAG: Stepping Into a City via Spatially-Grounded Video Generation
Authors: Gene Chou, Charles Herrmann, Kyle Genova, Boyang Deng, Songyou Peng, Bharath Hariharan, Jason Y. Zhang, Noah Snavely, Philipp Henzler
Links:
Keywords: dit, t2v, autonomous driving, simulation, video generation, i2v, robotics, physical - UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling
Authors: Boyu Chen, Yi Chen, Lu Qiu, Jerry Bai, Yuying Ge, Yixiao Ge
Links:
Keywords: dit, world model, dynamics, simulation, video generation, benchmark, physical - CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation
Authors: Xiangyang Luo, Xiaozhe Xin, Tao Feng, Xu Guo, Meiguang Jin, Junfeng Ma
Links:
Keywords: video synthesis, dit, diffusion model, diffusion transformer, advertising, physical - How Far Are Video Models from True Multimodal Reasoning?
Authors: Xiaotian Zhang, Jianhui Wei, Yuan Wang, Jie Tan, Yichen Li, Yan Zhang, Ziyi Chen, Daoan Zhang, Dezhi YU, Wei Xu, Songtao Jiang, Zuozhu Liu
Links:
Keywords: interactive, evaluation, simulation, video generation, physical simulation, benchmark, physical - RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation
Authors: Feng Jiang, Yang Chen, Kyle Xu, Yuchen Liu, Haifeng Wang, Zhenhao Shen, Jasper Lu, Shengze Huang, Yuanfei Wang, Chen Xie, Ruihai Wu
Links:
Keywords: world model, evaluation, dynamics, video generation, benchmark, physical
Showing the latest 50 out of 202 papers
- Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
Authors: Meng Chu, Xuan Billy Zhang, Kevin Qinghong Lin, Lingdong Kong, Jize Zhang, Teng Tu, Weijian Ma, Ziqi Huang, Senqiao Yang, Wei Huang, Yeying Jin, Zhefan Rao, Jinhui Ye, Xinyu Lin, Xichen Zhang, Qisheng Hu, Shuai Yang, Leyang Shen, Wei Chow, Yifei Dong, Fengyi Wu, Quanyu Long, Bin Xia, Shaozuo Yu, Mingkang Zhu, Wenhu Zhang, Jiehui Huang, Haokun Gui, Haoxuan Che, Long Chen, Qifeng Chen, Wenxuan Zhang, Wenya Wang, Xiaojuan Qi, Yang Deng, Yanwei Li, Mike Zheng Shou, Zhi-Qi Cheng, See-Kiong Ng, Ziwei Liu, Philip Torr, Jiaya Jia
Links:
Keywords: dit, world model, evaluation, dynamics, simulation, video generation, action-conditioned, physical - Context Unrolling in Omni Models
Authors: Ceyuan Yang, Zhijie Lin, Yang Zhao, Fei Xiao, Hao He, Qi Zhao, Chaorui Deng, Kunchang Li, Zihan Ding, Yuwei Guo, Fuyun Wang, Fangqi Zhu, Xiaonan Nie, Shenhan Zhu, Shanchuan Lin, Hongsheng Li, Weilin Huang, Guang Shi, Haoqi Fan
Links:
Keywords: benchmark - WorldMark: A Unified Benchmark Suite for Interactive Video World Models
Authors: Xiaojie Xu, Zhengyuan Lin, Kang He, Yukang Feng, Xiaofeng Mao, Yuanyang Yin, Kaipeng Zhang, Yongtao Ge
Links:
Keywords: image-to-video, dit, interactive, world model, style, evaluation, video generation, trajectory, benchmark - Agentic AI for Personalized Physiotherapy: A Multi-Agent Framework for Generative Video Training and Real-Time Pose Correction
Authors: Abhishek Dharmaratnakar, Srivaths Ranganathan, Anushree Sinha, Debanshu Das
Links:
Keywords: video synthesis, avatar, architecture, evaluation, video generation, medical - Building a Precise Video Language with Human-AI Oversight
Authors: Zhiqiu Lin, Chancharik Mitra, Siyuan Cen, Isaac Li, Yuhan Huang, Yu Tong Tiffany Ling, Hewei Wang, Irene Pi, Shihang Zhu, Ryan Rao, George Liu, Jiaxi Li, Ruojin Li, Yili Han, Yilun Du, Deva Ramanan
Links:
Keywords: dit, dynamics, video generation, benchmark, film - SignDATA: Data Pipeline for Sign Language Translation
Authors: Kuanwei Chen, Tingyi Lin
Links:
Keywords: evaluation, video generation - X-Cache: Cross-Chunk Block Caching for Few-Step Autoregressive World Models Inference
Authors: Yixiao Zeng, Jianlei Zheng, Chaoda Zheng, Shijia Chen, Mingdian Liu, Tongping Liu, Tengwei Luo, Yu Zhang, Boyang Wang, Linkun Xu, Siyuan Lu, Bo Tian, Xianming Liu
Links:
Keywords: dit, interactive, world model, autonomous driving, evaluation, autoregressive, simulation, denoising, video generation, acceleration, action-conditioned, controllable, video diffusion - HumanScore: Benchmarking Human Motions in Generated Videos
Authors: Yusu Fang, Tiange Xiang, Tian Tan, Narayan Schuetz, Scott Delp, Li Fei-Fei, Ehsan Adeli
Links:
Keywords: architecture, human motion, dynamics, video generation, benchmark, physical - UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling
Authors: Boyu Chen, Yi Chen, Lu Qiu, Jerry Bai, Yuying Ge, Yixiao Ge
Links:
Keywords: dit, world model, dynamics, simulation, video generation, benchmark, physical - Face Anything: 4D Face Reconstruction from Any Image Sequence
Authors: Umut Kocasari, Simon Giebenhain, Richard Shaw, Matthias Nießner
Links:
Keywords: benchmark, architecture
Showing the latest 50 out of 61 papers
- KD-CVG: A Knowledge-Driven Approach for Creative Video Generation
Authors: Linkai Liu, Wei Feng, Xi Zhao, Shen Zhang, Xingye Chen, Zheng Zhang, Jingjing Lv, Junjie Shen, Ching Law, Yuchen Zhou, Zipeng Guo, Chao Gou
Links:
Keywords: t2v, text-to-video, creative, video generation, advertising - Sparse Forcing: Native Trainable Sparse Attention for Real-time Autoregressive Diffusion Video Generation
Authors: Boxun Xu, Yuming Du, Zichang Liu, Siyu Yang, Ziyang Jiang, Siqi Yan, Rajasi Saha, Albert Pumarola, Wenchen Wang, Peng Li
Links:
Keywords: efficient, text-to-video, autoregressive, video generation, diffusion model, video diffusion - CityRAG: Stepping Into a City via Spatially-Grounded Video Generation
Authors: Gene Chou, Charles Herrmann, Kyle Genova, Boyang Deng, Songyou Peng, Bharath Hariharan, Jason Y. Zhang, Noah Snavely, Philipp Henzler
Links:
Keywords: dit, t2v, autonomous driving, simulation, video generation, i2v, robotics, physical - TS-Attn: Temporal-wise Separable Attention for Multi-Event Video Generation
Authors: Hongyu Zhang, Yufan Deng, Zilin Pan, Peng-Tao Jiang, Bo Li, Qibin Hou, Zhiyang Dou, Zhen Dong, Daquan Zhou
Links:
Keywords: image-to-video, dit, t2v, text-to-video, temporal consistency, video generation - FlowC2S: Flowing from Current to Succeeding Frames for Fast and Memory-Efficient Video Continuation
Authors: Hovhannes Margaryan, Quentin Bammey, Christian Sandor
Links:
Keywords: evaluation, text-to-video, efficient - Generative Refinement Networks for Visual Synthesis
Authors: Jian Han, Jinlai Liu, Jiahuan Wang, Bingyue Peng, Zehuan Yuan
Links:
Keywords: dit, efficient, text-to-video, autoregressive, video generation, diffusion model, benchmark - VideoFlexTok: Flexible-Length Coarse-to-Fine Video Tokenization
Authors: Andrei Atanov, Jesse Allardice, Roman Bachmann, Oğuzhan Fatih Kar, R Devon Hjelm, David Griffiths, Peter Fu, Afshin Dehghan, Amir Zamir
Links:
Keywords: video generation, text-to-video, efficient, long video - Motif-Video 2B: Technical Report
Authors: Junghwan Lim, Wai Ting Cheung, Minsu Ha, Beomgyu Kim, Taewhan Kim, Haesol Lee, Dongpin Oh, Jeesoo Lee, Taehyun Kim, Minjae Kim, Sungmin Lee, Hyeyeon Cho, Dahye Choi, Jaeheui Her, Jaeyeon Huh, Hanbin Jung, Changjin Kang, Dongseok Kim, Jangwoong Kim, Youngrok Kim, Hyukjin Kweon, Hongjoo Lee, Jeongdoo Lee, Junhyeok Lee, Eunhwan Park, Yeongjae Park, Bokki Ryu, Dongjoo Weon
Links:
Keywords: temporal consistency, video generation, text-to-video, efficient - When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models
Authors: Zhengyang Sun, Yu Chen, Xin Zhou, Xiaofan Li, Xiwu Chen, Dingkang Liang, Xiang Bai
Links:
Keywords: video synthesis, text-to-video, temporal consistency, diffusion model, layout, video diffusion - SceneScribe-1M: A Large-Scale Video Dataset with Comprehensive Geometric and Semantic Annotations
Authors: Yunnan Wang, Kecheng Zheng, Jianyuan Wang, Minghao Chen, David Novotny, Christian Rupprecht, Yinghao Xu, Xing Zhu, Wenjun Zeng, Xin Jin, Yujun Shen
Links:
Keywords: video synthesis, camera control, text-to-video, video generation, multi-modal, benchmark, controllable
- LIVE: Leveraging Image Manipulation Priors for Instruction-based Video Editing
Authors: Weicheng Wang, Zhicheng Zhang, Zhongqi Zhang, Juncheng Zhou, Yongjie Zhu, Wenyu Qin, Meng Wang, Pengfei Wan, Jufeng Yang
Links:
Keywords: dit, video editing, evaluation, video generation, benchmark - UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs
Authors: Lifan Jiang, Tianrun Wu, Yuhang Pei, Chenyang Wang, Boxi Wu, Deng Cai
Links:
Keywords: benchmark, dit, evaluation, video editing - Empowering Video Translation using Multimodal Large Language Models
Authors: Bingzheng QU, Kehai Chen, Xuefeng Bai, Min Zhang
Links:
Keywords: dit, identity, controllable, video translation, survey - Rein3D: Reinforced 3D Indoor Scene Generation with Panoramic Video Diffusion Models
Authors: Dehui Wang, Congsheng Xu, Rong Wei, Yue Shi, Shoufa Chen, Dingxiang Luo, Tianshuo Yang, Xiaokang Yang, Wei Sui, Yusen Qin, Rui Tang, Yao Mu
Links:
Keywords: video-to-video, video diffusion, diffusion model, super-resolution - InsEdit: Towards Instruction-based Visual Editing via Data-Efficient Video Diffusion Models Adaptation
Authors: Zhefan Rao, Bin Zou, Haoxuan Che, Xuanhua He, Chong Hou Choi, Yanheng Li, Rui Liu, Qifeng Chen
Links:
Keywords: dit, efficient, video editing, architecture, video generation, diffusion model, benchmark, video diffusion - Lighting-grounded Video Generation with Renderer-based Agent Reasoning
Authors: Ziqi Cai, Taoyu Yang, Zheng Chang, Si Li, Han Jiang, Shuchen Weng, Boxin Shi
Links:
Keywords: video synthesis, image-to-video, dit, video-to-video, temporal consistency, video generation, diffusion model, trajectory, layout, controllable, video diffusion, film - ImVideoEdit: Image-learning Video Editing via 2D Spatial Difference Attention Blocks
Authors: Jiayang Xu, Fan Zhuo, Majun Zhang, Changhao Pan, Zehan Wang, Siyu Chen, Xiaoda Yang, Tao Jin, Zhou Zhao
Links:
Keywords: dit, efficient, video editing, dynamics, temporal consistency - Evolution of Video Generative Foundations
Authors: Teng Hu, Jiangning Zhang, Hongrui Huang, Ran Yi, Zihan Su, Jieyu Weng, Zhucun Xue, Lizhuang Ma, Ming-Hsuan Yang, Dacheng Tao
Links:
Keywords: dit, education, video editing, world model, autonomous driving, dynamics, simulation, video generation, diffusion model, survey - UENR-600K: A Large-Scale Physically Grounded Dataset for Nighttime Video Deraining
Authors: Pei Yang, Hai Ci, Beibei Lin, Yiren Song, Mike Zheng Shou
Links:
Keywords: benchmark, video generation, video-to-video, physical - VOID: Video Object and Interaction Deletion
Authors: Saman Motamed, William Harvey, Benjamin Klein, Luc Van Gool, Zhuoning Yuan, Ta-Ying Cheng
Links:
Keywords: dit, video editing, dynamics, diffusion model, video diffusion, physical
- LMGenDrive: Bridging Multimodal Understanding and Generative World Modeling for End-to-End Driving
Authors: Hao Shao, Letian Wang, Yang Zhou, Yuxuan Hu, Zhuofan Zong, Steven L. Waslander, Wei Zhan, Hongsheng Li
Links:
Keywords: world model, autonomous driving, video prediction, autoregressive, video generation, benchmark - Novel View Synthesis as Video Completion
Authors: Qi Wu, Khiem Vuong, Minsik Jeon, Srinivasa Narasimhan, Deva Ramanan
Links:
Keywords: video completion, diffusion model, benchmark, video diffusion, novel view - SEM-ROVER: Semantic Voxel-Guided Diffusion for Large-Scale Driving Scene Generation
Authors: Hiba Dahmani, Nathan Piasco, Moussab Bennehar, Luis Roldão, Dzmitry Tsishkou, Laurent Caraffa, Jean-Philippe Tarel, Roland Brémond
Links:
Keywords: outpainting, dit, diffusion model - ChopGrad: Pixel-Wise Losses for Latent Video Diffusion via Truncated Backpropagation
Authors: Dmitriy Rivkin, Parker Ewen, Lili Gao, Julian Ost, Stefanie Walz, Rasika Kangutkar, Mario Bijelic, Felix Heide
Links:
Keywords: dit, efficient, latent video, video generation, diffusion model, video inpainting, video diffusion, video enhancement, super-resolution - ConfCtrl: Enabling Precise Camera Control in Video Diffusion via Confidence-Aware Interpolation
Authors: Liudi Yang, George Eskandar, Fengyi Shen, Mohammad Altillawi, Yang Bai, Chi Zhang, Ziyuan Liu, Abhinav Valada
Links:
Keywords: camera control, dit, video interpolation, diffusion model, video diffusion, novel view - Pinterest Canvas: Large-Scale Image Generation at Pinterest
Authors: Yu Wang, Eric Tzeng, Raymond Shiau, Jie Yang, Dmitry Kislyuk, Charles Rosenberg
Links:
Keywords: image-to-video, dit, outpainting, video generation, diffusion model
Showing the latest 50 out of 61 papers
- HFS-TriNet: A Three-Branch Collaborative Feature Learning Network for Prostate Cancer Classification from TRUS Videos
Authors: Xu Lu, Qianhong Peng, Qihao Zhou, Shaopeng Liu, Xiuqin Ye, Chuan Yang, Yuan Yuan
Links:
Keywords: temporal consistency, medical, denoising, sound - Seeing Fast and Slow: Learning the Flow of Time in Videos
Authors: Yen-Siang Wu, Rundong Luo, Jingsen Zhu, Tao Tu, Ali Farhadi, Matthew Wallingford, Yu-Chiang Frank Wang, Steve Marschner, Wei-Chiu Ma
Links:
Keywords: dit, concept, film, video generation, controllable, super-resolution - X-Cache: Cross-Chunk Block Caching for Few-Step Autoregressive World Models Inference
Authors: Yixiao Zeng, Jianlei Zheng, Chaoda Zheng, Shijia Chen, Mingdian Liu, Tongping Liu, Tengwei Luo, Yu Zhang, Boyang Wang, Linkun Xu, Siyuan Lu, Bo Tian, Xianming Liu
Links:
Keywords: dit, interactive, world model, autonomous driving, evaluation, autoregressive, simulation, denoising, video generation, acceleration, action-conditioned, controllable, video diffusion - Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation
Authors: Rui Li, Ke Hao, Yuanzhi Liang, Haibin Huang, Chi Zhang, Yun Gu, Xuelong Li
Links:
Keywords: dit, evaluation, denoising, video generation, trajectory - Trustworthy Endoscopic Super-Resolution
Authors: Julio Silva-Rodríguez, Ender Konukoglu
Links:
Keywords: efficient, super-resolution - Speculative Decoding for Autoregressive Video Generation
Authors: Yuezhou Hu, Jintao Zhang
Links:
Keywords: video synthesis, dit, streaming, distillation, autoregressive, denoising, video generation, acceleration, video diffusion - Efficient Video Diffusion Models: Advancements and Challenges
Authors: Shitong Shao, Lichen Bai, Pengfei Wan, James Kwok, Zeke Xie
Links:
Keywords: video synthesis, efficient, distillation, evaluation, denoising, diffusion model, trajectory, acceleration, video diffusion, survey - TurboTalk: Progressive Distillation for One-Step Audio-Driven Talking Avatar Generation
Authors: Xiangyu Liu, Feng Gao, Xiaomei Zhang, Yong Zhang, Xiaoming Wei, Zhen Lei, Xiangyu Zhu
Links:
Keywords: avatar, audio-driven, distillation, denoising, diffusion model, video diffusion - DiT as Real-Time Rerenderer: Streaming Video Stylization with Autoregressive Diffusion Transformer
Authors: Hengye Lyu, Zisu Li, Yue Hong, Yueting Weng, Jiaxin Shi, Hanwang Zhang, Chen Liang
Links:
Keywords: dit, interactive, streaming, distillation, style, autoregressive, denoising, video generation, long video, diffusion transformer - Rein3D: Reinforced 3D Indoor Scene Generation with Panoramic Video Diffusion Models
Authors: Dehui Wang, Congsheng Xu, Rong Wei, Yue Shi, Shoufa Chen, Dingxiang Luo, Tianshuo Yang, Xiaokang Yang, Wei Sui, Yusen Qin, Rui Tang, Yao Mu
Links:
Keywords: video-to-video, video diffusion, diffusion model, super-resolution
Showing the latest 50 out of 97 papers
- Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
Authors: Meng Chu, Xuan Billy Zhang, Kevin Qinghong Lin, Lingdong Kong, Jize Zhang, Teng Tu, Weijian Ma, Ziqi Huang, Senqiao Yang, Wei Huang, Yeying Jin, Zhefan Rao, Jinhui Ye, Xinyu Lin, Xichen Zhang, Qisheng Hu, Shuai Yang, Leyang Shen, Wei Chow, Yifei Dong, Fengyi Wu, Quanyu Long, Bin Xia, Shaozuo Yu, Mingkang Zhu, Wenhu Zhang, Jiehui Huang, Haokun Gui, Haoxuan Che, Long Chen, Qifeng Chen, Wenxuan Zhang, Wenya Wang, Xiaojuan Qi, Yang Deng, Yanwei Li, Mike Zheng Shou, Zhi-Qi Cheng, See-Kiong Ng, Ziwei Liu, Philip Torr, Jiaya Jia
Links:
Keywords: dit, world model, evaluation, dynamics, simulation, video generation, action-conditioned, physical - WorldMark: A Unified Benchmark Suite for Interactive Video World Models
Authors: Xiaojie Xu, Zhengyuan Lin, Kang He, Yukang Feng, Xiaofeng Mao, Yuanyang Yin, Kaipeng Zhang, Yongtao Ge
Links:
Keywords: image-to-video, dit, interactive, world model, style, evaluation, video generation, trajectory, benchmark - X-Cache: Cross-Chunk Block Caching for Few-Step Autoregressive World Models Inference
Authors: Yixiao Zeng, Jianlei Zheng, Chaoda Zheng, Shijia Chen, Mingdian Liu, Tongping Liu, Tengwei Luo, Yu Zhang, Boyang Wang, Linkun Xu, Siyuan Lu, Bo Tian, Xianming Liu
Links:
Keywords: dit, interactive, world model, autonomous driving, evaluation, autoregressive, simulation, denoising, video generation, acceleration, action-conditioned, controllable, video diffusion - CityRAG: Stepping Into a City via Spatially-Grounded Video Generation
Authors: Gene Chou, Charles Herrmann, Kyle Genova, Boyang Deng, Songyou Peng, Bharath Hariharan, Jason Y. Zhang, Noah Snavely, Philipp Henzler
Links:
Keywords: dit, t2v, autonomous driving, simulation, video generation, i2v, robotics, physical - UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling
Authors: Boyu Chen, Yi Chen, Lu Qiu, Jerry Bai, Yuying Ge, Yixiao Ge
Links:
Keywords: dit, world model, dynamics, simulation, video generation, benchmark, physical - How Far Are Video Models from True Multimodal Reasoning?
Authors: Xiaotian Zhang, Jianhui Wei, Yuan Wang, Jie Tan, Yichen Li, Yan Zhang, Ziyi Chen, Daoan Zhang, Dezhi YU, Wei Xu, Songtao Jiang, Zuozhu Liu
Links:
Keywords: interactive, evaluation, simulation, video generation, physical simulation, benchmark, physical - RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation
Authors: Feng Jiang, Yang Chen, Kyle Xu, Yuchen Liu, Haifeng Wang, Zhenhao Shen, Jasper Lu, Shengze Huang, Yuanfei Wang, Chen Xie, Ruihai Wu
Links:
Keywords: world model, evaluation, dynamics, video generation, benchmark, physical - MultiWorld: Scalable Multi-Agent Multi-View Video World Models
Authors: Haoyu Wu, Jiwen Yu, Yingtian Zou, Xihui Liu
Links:
Keywords: dit, multi-view video, world model, dynamics, video generation, action-conditioned - WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models
Authors: Xinping Lei, Xinyu Che, Junqi Xiong, Chenchen Zhang, Yukai Huang, Chenyu Zhou, Haoyang Huang, Minghao Liu, Letian Zhu, Hongyi Ye, Jinhua Hao, Ken Deng, Zizheng Zhan, Han Li, Dailin Li, Yifan Yao, Ming Sun, Zhaoxiang Zhang, Jiaheng Liu
Links:
Keywords: benchmark, interactive, evaluation, dit - HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
Authors: Team HY-World, Chenjie Cao, Xuhui Zuo, Zhenwei Wang, Yisu Zhang, Junta Wu, Zhenyang Liu, Yuning Gong, Yang Liu, Bo Yuan, Chao Zhang, Coopers Li, Dongyuan Guo, Fan Yang, Haiyu Zhang, Hang Cao, Jianchen Zhu, Jiaxin Lin, Jie Xiao, Jihong Zhang, Junlin Yu, Lei Wang, Lifu Wang, Lilin Wang, Linus, Minghui Chen, Peng He, Penghao Zhao, Qi Chen, Rui Chen, Rui Shao, Sicong Liu, Wangchen Qin, Xiaochuan Niu, Xiang Yuan, Yi Sun, Yifei Tang, Yifu Sun, Yihang Lian, Yonghao Tan, Yuhong Liu, Yuyang Yin, Zhiyuan Min, Tengfei Wang, Chunchao Guo
Links:
Keywords: efficient, interactive, world model, architecture, multi-modal, trajectory, benchmark
-
Video Diffusion Models (NeurIPS 2022)
Authors: Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, David J. Fleet
Keywords: Video Diffusion, Generative Model, Unconditional Video Generation -
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models (CVPR 2023)
Authors: Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis
Keywords: Latent Video Diffusion, Text-to-Video, High-Resolution -
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets (2023)
Authors: Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, et al.
Code: GitHub
Keywords: Image-to-Video, Latent Video Diffusion, Large-Scale Training -
Sora: Video Generation Models as World Simulators (OpenAI, 2024)
Authors: OpenAI
Keywords: Text-to-Video, World Simulator, Diffusion Transformer, Long Video -
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer (2024)
Authors: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, et al.
Code: GitHub
Keywords: Text-to-Video, Diffusion Transformer, Expert Transformer
- CogVideo - Text-to-video generation with CogVideoX series models (Tsinghua & Zhipu AI)
- Open-Sora - Open-source Sora-like video generation framework
- Open-Sora-Plan - Reproducing Sora with an open-source plan
- HunyuanVideo - Tencent's large-scale video generation model
- Wan2.1 - Alibaba's open-source video generation model
- AnimateDiff - Animate personalized text-to-image models without specific tuning
- Stable Video Diffusion - Stability AI's video generation models
- ModelScope Text-to-Video - ModelScope text-to-video synthesis
- Video Generation Models as World Simulators - OpenAI's Sora technical report
- A Survey on Video Diffusion Models - Comprehensive survey on video diffusion
- Diffusion Models: A Comprehensive Survey - Foundation knowledge on diffusion models
- Unified CLI (`main.py`): Single entry point with `init`, `search`, `suggest`, `export-bib`, `readme` subcommands
- Interactive Config Wizard: Guided setup for keywords, domains, time range, and API keys via `python main.py init`
- Custom Search Keywords: Configure keywords for title, abstract, or both, with arXiv domain filtering (`cs.CV`, `cs.AI`, `cs.MM`, etc.)
- Time Range Filtering: Relative periods (`30d`, `6m`, `1y`, `2y`) or absolute date ranges (`YYYY-MM-DD` to `YYYY-MM-DD`)
- Smart Link Extraction: Auto-classifies URLs from abstracts into GitHub, project page, dataset, video, demo, and HuggingFace links
- BibTeX Export: Fetch BibTeX from the official arXiv API; export to `.bib` files with category and date filters
- LLM Keyword Suggestion: Input paper titles or arXiv IDs to auto-generate optimized search keywords via an OpenAI-compatible API
- Automated Paper Collection: Daily automatic crawling with GitHub Actions
- Intelligent Classification: Auto-categorize papers into 16 topics (T2V, I2V, Video Editing, Controllable Generation, World Models, etc.)
- Robust Error Handling: Multi-layer retry and fallback strategies ensure stable operation
- GitHub Actions Integration: Automated CI/CD workflows for daily updates
- Multi-type Link Badges: README entries display PDF, GitHub (with stars), Project, Dataset, Video, Demo, HuggingFace, and Citation badges
- Detailed Logging: Comprehensive logging for debugging and monitoring
- Cross-Platform: Support for Windows, Linux, and macOS
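The smart link extraction above can be sketched roughly as follows. This is an illustrative assumption about how URL classification might work; `classify_links` and its regex patterns are hypothetical, not the repository's actual implementation:

```python
import re

# Hypothetical URL patterns for classifying links found in abstracts;
# the real tool may recognize more categories (project page, dataset, demo).
LINK_PATTERNS = {
    "github": r"https?://github\.com/[\w.-]+/[\w.-]+",
    "huggingface": r"https?://huggingface\.co/[\w./-]+",
    "video": r"https?://(?:www\.)?youtube\.com/\S+|https?://youtu\.be/\S+",
}

def classify_links(abstract: str) -> dict:
    """Return {category: [urls]} for every recognized URL in an abstract."""
    found = {}
    for category, pattern in LINK_PATTERNS.items():
        # Strip trailing sentence punctuation that regexes tend to swallow.
        matches = [m.rstrip(".,);") for m in re.findall(pattern, abstract)]
        if matches:
            found[category] = matches
    return found

abstract = "Code at https://github.com/example/repo and weights on https://huggingface.co/example/model."
print(classify_links(abstract))
```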
- Paper JSON files (`data/papers_YYYY-MM-DD.json`): Full paper metadata with title, authors, abstract, links, keywords, and BibTeX
- BibTeX files (`output/*.bib`): Ready-to-use bibliography files for LaTeX
- Auto-generated README: Categorized and formatted paper listings
```bash
pip install -r requirements.txt
```

```bash
python main.py init
```

This wizard walks you through:
- Setting search keywords (for title, abstract, or both)
- Selecting arXiv domains (e.g., `cs.CV`, `cs.AI`, `cs.MM`)
- Configuring the time range (relative like `6m`/`1y`, or absolute dates)
- Setting max results
- Optionally configuring an OpenAI-compatible API key for keyword suggestion
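The relative time-range option could be parsed along these lines. This is a minimal sketch: `cutoff_date` and the fixed 30-day month / 365-day year lengths are illustrative assumptions, not the tool's actual implementation:

```python
from datetime import datetime, timedelta

# Assumption: months are 30 days and years 365 days; the real tool
# may use calendar-aware arithmetic instead.
UNIT_DAYS = {"d": 1, "m": 30, "y": 365}

def cutoff_date(period, now=None):
    """Translate a relative period ('30d', '6m', '1y') into the earliest accepted date."""
    now = now or datetime.utcnow()
    value, unit = int(period[:-1]), period[-1]
    if unit not in UNIT_DAYS:
        raise ValueError(f"unknown period unit: {unit!r}")
    return now - timedelta(days=value * UNIT_DAYS[unit])

print(cutoff_date("6m", now=datetime(2026, 4, 27)))
```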
```bash
# Search with settings from user_config.json
python main.py search

# Override: fetch 200 papers from the last 6 months, include BibTeX
python main.py search --max-results 200 --recent 6m --bibtex

# Search with an absolute date range
python main.py search --date-from 2024-01-01 --date-to 2025-01-01

# Include citation counts from Semantic Scholar
python main.py search --citations
```

```bash
# Export all papers from the latest data file
python main.py export-bib --output output/references.bib

# Export only "Text-to-Video Generation" papers
python main.py export-bib --category "Text-to-Video Generation" --output output/t2v.bib

# Export papers from a specific date range
python main.py export-bib --date-from 2024-06-01 --date-to 2025-01-01 --output output/recent.bib
```

```bash
# Generate keywords from paper titles
python main.py suggest --titles "Video Diffusion Models" "Stable Video Diffusion"

# Generate from arXiv IDs (auto-fetches titles)
python main.py suggest --arxiv-ids 2204.03458 2311.15127

# Auto-write suggested keywords to config
python main.py suggest --titles "Sora" "CogVideoX" --apply

# Use a custom API endpoint (e.g., DeepSeek)
python main.py suggest --titles "Paper Title" --base-url https://api.deepseek.com/v1 --api-key sk-xxx --model deepseek-chat
```

```bash
# Basic README
python main.py readme

# Include latest papers section and abstracts
python main.py readme --show-latest --show-abstracts
```

All settings are stored in `data/user_config.json`:
```json
{
  "search": {
    "keywords": {
      "both_abstract_and_title": ["video diffusion", "video generation", "text-to-video"],
      "abstract_only": ["diffusion model video generation"],
      "title_only": ["video generation", "video diffusion"]
    },
    "domains": ["cs.CV", "cs.AI", "cs.MM"],
    "time_range": {
      "mode": "relative",
      "relative": "1y"
    },
    "max_results": 500
  },
  "api_keys": {
    "openai_api_key": "",
    "openai_base_url": "https://api.openai.com/v1",
    "openai_model": "gpt-4o-mini"
  }
}
```

Feel free to submit Pull Requests to improve this list! Please follow these formats:
- Paper entry format: `**[Paper Title](link)** - Brief description`
- Project entry format: `[Project Name](link) - Project description`
