A C++ / CUDA / TensorRT implementation of StreamDiffusion
Implemented in ossia score
On an RTX 5090 at 1 step:
SDXL Turbo 1024x1024: stable 26 fps
SD Turbo 512x512: stable 96 fps
SDXS: above 600 fps
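
To relate these frame rates to a per-frame time budget, the conversion is simply 1000 / fps milliseconds. The snippet below is a small illustrative sketch, not part of the project: `frame_time_ms` and `print_frame_budgets` are hypothetical helper names, and the values are only the figures quoted above (for SDXS, "above 600 fps" means the frame time is below the printed value).

```cpp
#include <cstdio>

// Hypothetical helper: per-frame time budget in milliseconds for a frame rate.
constexpr double frame_time_ms(double fps) {
    return 1000.0 / fps;
}

// Hypothetical helper: prints the budget for the three frame rates quoted above.
inline void print_frame_budgets() {
    struct Entry {
        const char* model;
        double fps;
    };
    const Entry entries[] = {
        {"SDXL Turbo 1024x1024", 26.0},
        {"SD Turbo 512x512", 96.0},
        {"SDXS", 600.0},  // quoted as "above 600 fps", so this is an upper bound on frame time
    };
    for (const Entry& e : entries) {
        std::printf("%-22s %6.1f fps -> %6.2f ms per frame\n",
                    e.model, e.fps, frame_time_ms(e.fps));
    }
}
```

Calling `print_frame_budgets()` from a `main` function prints one line per model; at 26 fps the budget is roughly 38 ms per frame, at 96 fps roughly 10 ms.
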
Models must first be converted to TensorRT engines with the Python script [train-lora.py]:
$ uv run train-lora.py --model stabilityai/sd-turbo --min-batch 1 --max-batch 1 --opt-batch 1 --min-resolution 512 --max-resolution 1024 --output ./engines-sd-turbo

