I tested the given SiT-XL/2 checkpoint using the commands specified in the README.

With cfg=1.5:

```
torchrun --nnodes=1 --nproc_per_node=8 sample_ddp.py ODE --cfg-scale 1.5 --model SiT-XL/2 --num-fid-samples 50000
```

Without cfg:

```
torchrun --nnodes=1 --nproc_per_node=8 sample_ddp.py ODE --model SiT-XL/2 --num-fid-samples 50000
```

In my testing, the FID for cfg=1.0 (w/o cfg) is 9.5 (I tried several times, and every run was > 9.3). But in the paper, SiT-XL/2 at 7M steps (w/o cfg) achieves FID=8.3. I wonder what causes this performance misalignment when sampling from the pretrained SiT model without cfg.
| Model | FID (this repo) | FID (paper) |
| --- | --- | --- |
| SiT-XL/2 (w/o cfg) | 9.5 | 8.3 |
| SiT-XL/2 (w/ cfg=1.5) | 2.08 | 2.06 |
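For context on how sensitive these numbers can be: FID is the Fréchet distance between Gaussian fits (mean and covariance of Inception features) of the reference set and the generated samples, so it depends on both the reference statistics file and the exact 50K sample set, not only on the checkpoint. A minimal sketch of the distance itself (function name is mine, not from this repo; assumes NumPy/SciPy):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between two Gaussians N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrt(sigma1 @ sigma2))."""
    diff = mu1 - mu2
    # Matrix square root of the covariance product; may return a complex
    # array with tiny imaginary parts due to numerical error.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2)
                 - 2.0 * np.trace(covmean))

# Identical statistics -> distance 0; a unit mean shift in 4-D -> distance 4.
mu, sigma = np.zeros(4), np.eye(4)
print(frechet_distance(mu, sigma, mu, sigma))            # ~0.0
print(frechet_distance(mu, sigma, np.ones(4), sigma))    # ~4.0
```

With identical covariances, the covariance terms cancel and only the squared mean distance remains, which is why the second call returns 4.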