Unified text-to-speech for voice pipelines, wrapping multiple TTS engines behind common Rust traits. Same pattern as wavekat-vad and wavekat-turn.
> **Warning:** Early development. API may change between minor versions.
| Backend | Feature flag | License |
|---|---|---|
| Qwen3-TTS | `qwen3-tts` | Apache 2.0 |
| CosyVoice | `cosyvoice` | Apache 2.0 |
```sh
cargo add wavekat-tts --features qwen3-tts
```

```rust
use wavekat_tts::{TtsBackend, SynthesizeRequest};
use wavekat_tts::backends::qwen3_tts::Qwen3Tts;

// Auto-downloads model files (~3.8 GB) on first run:
let tts = Qwen3Tts::new()?;
// Or load from an explicit directory:
// let tts = Qwen3Tts::from_dir("models/qwen3-tts-0.6b")?;

let request = SynthesizeRequest::new("Hello, world");
let audio = tts.synthesize(&request)?;
println!("{}s at {} Hz", audio.duration_secs(), audio.sample_rate());
```

Model files are cached at `$WAVEKAT_MODEL_DIR` or `~/.cache/wavekat/qwen3-tts-0.6b/`.
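The lookup order for the cache directory (environment variable first, then the user cache) can be sketched with the standard library alone. This is an illustrative sketch of that resolution logic, not the crate's actual implementation; the function name `model_cache_dir` is an assumption.

```rust
use std::env;
use std::path::PathBuf;

/// Illustrative sketch (not the crate's real code): prefer
/// $WAVEKAT_MODEL_DIR, falling back to ~/.cache/wavekat/<model>/.
fn model_cache_dir(model: &str) -> PathBuf {
    match env::var("WAVEKAT_MODEL_DIR") {
        Ok(dir) => PathBuf::from(dir).join(model),
        Err(_) => {
            // Fall back to the per-user cache under $HOME.
            let home = env::var("HOME").unwrap_or_else(|_| ".".into());
            PathBuf::from(home)
                .join(".cache")
                .join("wavekat")
                .join(model)
        }
    }
}

fn main() {
    println!("{}", model_cache_dir("qwen3-tts-0.6b").display());
}
```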
All backends produce `AudioFrame<'static>` from `wavekat-core` — the same type consumed by `wavekat-vad` and `wavekat-turn`.
```
wavekat-vad   → "is someone speaking?"
wavekat-turn  → "are they done speaking?"
wavekat-tts   → "synthesize the response"
      │                 │                 │
      └─────────────────┴─────────────────┘
                        │
             AudioFrame (wavekat-core)
```
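The shared-frame contract above can be illustrated with a simplified stand-in. The field names and the exact shape of `AudioFrame` here are assumptions for the sketch; the real type lives in `wavekat-core`.

```rust
/// Simplified stand-in for wavekat-core's AudioFrame: mono PCM
/// samples plus a sample rate. Field names are illustrative only.
struct AudioFrame {
    samples: Vec<f32>,
    sample_rate: u32,
}

impl AudioFrame {
    /// Duration in seconds: sample count divided by sample rate.
    fn duration_secs(&self) -> f32 {
        self.samples.len() as f32 / self.sample_rate as f32
    }
}

fn main() {
    // One second of silence at 24 kHz.
    let frame = AudioFrame {
        samples: vec![0.0; 24_000],
        sample_rate: 24_000,
    };
    println!("{}s at {} Hz", frame.duration_secs(), frame.sample_rate);
}
```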
Two trait families:
- `TtsBackend` — batch synthesis: text → `AudioFrame<'static>`
- `StreamingTtsBackend` — streaming: text → iterator of `AudioFrame<'static>` chunks
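The split between the two families can be sketched with mock types. Everything below is illustrative: the trait signatures, the `synthesize_streaming` method name, and the `MockTts` backend are assumptions, not the crate's real API.

```rust
/// Illustrative stand-in for wavekat-core's frame type.
struct AudioFrame {
    samples: Vec<f32>,
    sample_rate: u32,
}

/// Batch family: whole utterance in, one frame out.
trait TtsBackend {
    fn synthesize(&self, text: &str) -> AudioFrame;
}

/// Streaming family: frames are yielded as chunks become ready.
/// Method name is an assumption for this sketch.
trait StreamingTtsBackend {
    fn synthesize_streaming(&self, text: &str) -> Box<dyn Iterator<Item = AudioFrame>>;
}

/// A silent mock backend, handy for wiring up a pipeline in tests.
struct MockTts;

impl TtsBackend for MockTts {
    fn synthesize(&self, text: &str) -> AudioFrame {
        // Pretend each character costs 50 ms of audio at 24 kHz.
        let n = text.chars().count() * 1_200;
        AudioFrame { samples: vec![0.0; n], sample_rate: 24_000 }
    }
}

fn main() {
    let audio = MockTts.synthesize("Hello");
    println!("{} samples at {} Hz", audio.samples.len(), audio.sample_rate);
}
```

A mock like this lets the rest of a voice pipeline be exercised without downloading model weights.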
Generate a WAV file from text (model files are auto-downloaded on first run):
```sh
cargo run --example synthesize --features qwen3-tts,hound -- "Hello, world!"
cargo run --example synthesize --features qwen3-tts,hound -- --language zh "你好世界"
cargo run --example synthesize --features qwen3-tts,hound -- --model-dir /path/to/model --output hello.wav "Hello"
```

| Flag | Default | Description |
|---|---|---|
| `qwen3-tts` | off | Qwen3-TTS local ONNX inference |
| `cosyvoice` | off | CosyVoice local ONNX inference |
Licensed under Apache 2.0.
Copyright 2026 WaveKat.