Inquiry regarding reproduction of "Emergent TTS Eval" results #1253

@hujingbin1

Self Checks

  • This template is only for bug reports. For questions, please visit Discussions.
  • I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English; otherwise they will be closed. Thank you! :)
  • Please do not modify this template and fill in all required fields.

Cloud or Self Hosted

Self Hosted (Source)

Environment Details

Python 3.10

Steps to Reproduce

I would like to extend my sincere gratitude to the team for fully open-sourcing the Fish Audio S2 model, including the weights, fine-tuning code, and inference engine. This is a significant contribution to the open-source community and has greatly lowered the barrier to entry for high-quality speech synthesis, allowing us to learn from and utilize such advanced technology.

I am currently attempting to reproduce the Emergent TTS Eval results reported in the Fish Audio S2 technical report, but I am seeing a significant performance gap. My reproduction yields an Overall Win Rate of 0.2534, which is far lower than the result reported in the technical report.

In my current test setup, I used the raw text from the evaluation set and did not use prompt audio as a condition. To help identify the discrepancy, could you please clarify the following details of your experimental setup?

  • Text Processing: What specific text processing pipeline was applied to the evaluation data?
  • Model Selection: Were the scores achieved using the public open-source weights or an internal model?
  • Inference Mode: Was the evaluation conducted using voice cloning (with prompt conditions) or unconditional generation?

Thanks!

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

Labels

question (Further information is requested)
