How to achieve "4.5s end-to-end generation" as in the readme?

Hi, 
Thanks for the model. 
I am trying to run inference with the new 1.1 version,
but it is quite slow even with "--optimized" enabled as described in the README.

I am running on an L40S GPU with "--optimized" enabled and the Lightning LoRA loaded via
"pipe.load_lora_weights(
    "FireRedTeam/FireRed-Image-Edit-1.0-Lightning", weight_name="FireRed-Image-Edit-1.0-Lightning-8steps-v1.0.safetensors"
)".
With 8 steps, generation takes 18s per image, far slower than the 4.5s per generation described in the README.

I ran a batch of images. Even disregarding the first image, which is slower due to warm-up, the other images were not any faster.
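
To make the per-image number easy to reproduce, here is a minimal sketch of a timing harness that skips warm-up calls and optionally synchronizes the GPU around each call. The helper name `time_per_item` and the stand-in `fake_generate` are hypothetical and not part of the repository. It only illustrates the measurement, not the exact script used for the 18s figure.

```python
import time
from statistics import mean


def time_per_item(fn, items, warmup=1, sync=None):
    """Return wall-clock seconds for each fn(item) call, skipping the first `warmup` calls.

    `sync` is an optional zero-argument callable (for example torch.cuda.synchronize)
    invoked before starting and before stopping the clock, so that asynchronous
    GPU work is included in the measurement.
    """
    timings = []
    for i, item in enumerate(items):
        if sync is not None:
            sync()  # flush pending work before starting the clock
        start = time.perf_counter()
        fn(item)
        if sync is not None:
            sync()  # wait for this call's work to finish
        elapsed = time.perf_counter() - start
        if i >= warmup:  # discard warm-up iterations
            timings.append(elapsed)
    return timings


if __name__ == "__main__":
    # Hypothetical stand-in for the real pipeline call.
    def fake_generate(_):
        time.sleep(0.01)

    t = time_per_item(fake_generate, range(4), warmup=1)
    print(f"{len(t)} timed runs, mean {mean(t):.3f}s")
```

With the real pipeline, `fn` would wrap the 8-step generation call and `sync` would be `torch.cuda.synchronize`, so that model loading and the first warm-up image are excluded from the reported time.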
Is there anything I am missing?
Maybe @Maycbj can help me here? Thanks.
