[contrib] Add Wan2.1-T2V-1.3B NeuronX port#102

Open
lutfanm-aws wants to merge 1 commit into main from contrib/Wan2.1-T2V-1.3B

@lutfanm-aws

Summary

  • Adds NeuronX implementation of Wan-AI/Wan2.1-T2V-1.3B-Diffusers
  • Text-to-video diffusion model: T5 encoder + transformer backbone + causal 3D VAE
  • All components run entirely on Trainium
  • Supports 13-frame and 49-frame generation at 480×832

Model Details

  • Architecture: Text-to-video diffusion (T5 encoder + DiT backbone + 3D VAE)
  • Parameters: 1.3B
  • Precision: BF16
  • Instance: trn2.48xlarge

Performance

| Config    | Backbone | VAE   | Total | Cores |
|-----------|----------|-------|-------|-------|
| 13 frames | 36.5s    | 6.6s  | 43s   | 2     |
| 49 frames | 70.7s    | 31.8s | 102s  | 10    |

Validation

| Component              | Cosine vs CPU |
|------------------------|---------------|
| Backbone (single step) | 0.9998        |
| T5 encoder             | 0.998         |
| VAE (per block)        | ≥0.9998       |
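
For reviewers who want to reproduce this kind of check, here is a minimal sketch of a cosine-similarity comparison between a CPU reference output and a device output. It assumes both outputs have been flattened to plain lists of floats. The names `cosine_similarity`, `passes`, `cpu_out`, and `neuron_out` are hypothetical, and the sample values are made up for illustration. The PR does not say how its numbers were computed, so this may differ from the actual test code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero vector")
    return dot / (norm_a * norm_b)


def passes(reference, candidate, threshold):
    """True when the candidate output is at least `threshold` similar to the reference."""
    return cosine_similarity(reference, candidate) >= threshold


# Hypothetical flattened outputs; real tensors would be far larger.
cpu_out = [0.12, -0.53, 0.98, 0.07]
neuron_out = [0.121, -0.529, 0.979, 0.071]

print(passes(cpu_out, neuron_out, threshold=0.999))
```

Cosine similarity ignores overall scale, so a check like this is usually paired with a magnitude check (for example, maximum absolute error) when BF16 rounding is a concern.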

Files

  • contrib/models/Wan2.1-T2V-1.3B/src/modeling_wan.py — Model, VAE wrappers, XLA compat
  • contrib/models/Wan2.1-T2V-1.3B/src/scripts/ — Compilation and inference scripts
  • contrib/models/Wan2.1-T2V-1.3B/test/ — Integration tests
  • contrib/models/Wan2.1-T2V-1.3B/README.md — Documentation

Text-to-video diffusion model (1.3B params) with T5 encoder, transformer backbone,
and causal 3D VAE all running on Trainium. Supports 13-frame and 49-frame generation
at 480x832 with context parallelism for longer sequences.
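
To illustrate why the 49-frame configuration is the one that benefits from context parallelism, the sketch below estimates the transformer sequence length for each configuration. It is not taken from this PR. It assumes a causal VAE with 4× temporal and 8× spatial compression and a (1, 2, 2) patch size in the backbone, which matches how Wan2.1 is commonly described but is not stated here. The function names are hypothetical.

```python
def latent_shape(num_frames, height, width, t_stride=4, s_stride=8):
    """Latent (frames, height, width) under the ASSUMED VAE strides.

    A causal VAE keeps the first frame and compresses the rest in groups,
    so num_frames is expected to be of the form t_stride * k + 1.
    """
    if (num_frames - 1) % t_stride != 0:
        raise ValueError("num_frames must be t_stride * k + 1")
    latent_frames = 1 + (num_frames - 1) // t_stride
    return latent_frames, height // s_stride, width // s_stride


def token_count(num_frames, height, width, patch=(1, 2, 2)):
    """Number of backbone tokens under the ASSUMED patch size."""
    t, h, w = latent_shape(num_frames, height, width)
    pt, ph, pw = patch
    return (t // pt) * (h // ph) * (w // pw)


for frames in (13, 49):
    print(frames, latent_shape(frames, 480, 832), token_count(frames, 480, 832))
```

Under these assumptions the 49-frame case has 3.25 times as many tokens as the 13-frame case, and self-attention cost grows faster than linearly with token count.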
@lutfanm-aws lutfanm-aws marked this pull request as ready for review March 26, 2026 15:26