
12s limit warning #10

@RemainIndoors1

Description


I don't know if this is a bug per se, but in case anybody else runs into this issue, it's good to be aware of. If you use reference audio that is longer than 12 s, logic in f5_tts/infer/utils_infer.py trims the audio down to 12 seconds. The problem comes in when you also supply a transcription for that audio: the transcription is not trimmed to match what the clipped audio actually says, so the text and the audio no longer line up, and the generated audio behaves really oddly as a result.
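
A quick pre-flight check can catch this before running inference. This is a sketch, not part of F5-TTS: it assumes a PCM WAV reference file and a hard 12 s threshold as described above (the exact clipping rule in utils_infer.py may differ), and the function names are hypothetical. It uses only Python's standard wave module.

```python
import wave

# Threshold taken from this issue; check utils_infer.py for the value your version uses.
MAX_REF_SECONDS = 12.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (frames / frame rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_reference_audio(path: str, max_seconds: float = MAX_REF_SECONDS) -> bool:
    """Return True if the clip fits under the limit; print a warning otherwise."""
    duration = wav_duration_seconds(path)
    if duration > max_seconds:
        print(
            f"warning: {path} is {duration:.2f}s; it may be clipped to "
            f"{max_seconds:.0f}s and a supplied transcription will no longer match"
        )
        return False
    return True
```

Running this on your reference file before inference tells you whether you need to shorten the clip (or drop the transcription) first.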

One solution: if the code has to trim the audio to 12 s for some reason, it could force a call to the transcribe method on the trimmed clip instead of using the supplied transcription. Either way, to be on the safe side, make sure your reference audio is shorter than 12 seconds.
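
The proposed fix could look roughly like this. It is a hypothetical sketch, not the actual utils_infer.py code: the audio is a plain list of samples, the cut is a hard slice at 12 s, and transcribe is a stand-in for whatever ASR call the pipeline really uses (its name and signature here are assumptions). As an additional assumption, the sketch also transcribes when no text is supplied.

```python
from typing import Callable, List, Optional, Tuple

# Threshold taken from this issue; the real limit and clipping rule may differ.
MAX_REF_SECONDS = 12.0


def prepare_reference(
    samples: List[float],
    sample_rate: int,
    ref_text: Optional[str],
    transcribe: Callable[[List[float], int], str],  # hypothetical ASR hook
    max_seconds: float = MAX_REF_SECONDS,
) -> Tuple[List[float], str]:
    """Clip reference audio to max_seconds and keep the transcript consistent.

    If the audio is clipped, the user-supplied text no longer describes it,
    so that text is discarded and the clipped audio is re-transcribed.
    """
    max_samples = int(max_seconds * sample_rate)
    if len(samples) > max_samples:
        clipped = samples[:max_samples]
        return clipped, transcribe(clipped, sample_rate)
    if not ref_text:
        # No transcription given: generate one from the (unclipped) audio.
        return samples, transcribe(samples, sample_rate)
    # Audio untouched, so the user's transcription still matches it.
    return samples, ref_text


def fake_transcribe(samples: List[float], sample_rate: int) -> str:
    """Stub transcriber for demonstration: just reports the clip length."""
    return f"<{len(samples) / sample_rate:.1f}s of speech>"


if __name__ == "__main__":
    sr = 16_000
    long_clip = [0.0] * (15 * sr)  # 15 s of silence as a stand-in for real audio
    audio, text = prepare_reference(long_clip, sr, "full 15 s transcript", fake_transcribe)
    print(len(audio) / sr, text)
```

Passing the transcriber in as a callable keeps the sketch independent of any particular ASR model and makes the "clip, then re-transcribe" rule easy to test with a stub.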
