Unified turn detection for voice pipelines, wrapping multiple open-source models behind common Rust traits. Same pattern as `wavekat-vad`.
> [!WARNING]
> Early development. API may change between minor versions.
| Backend | Feature flag | Input | Model size | Inference | License |
|---|---|---|---|---|---|
| Pipecat Smart Turn v3 | `pipecat` | Audio (16 kHz PCM) | ~8 MB (int8 ONNX) | ~12 ms CPU | BSD 2-Clause |
| LiveKit Turn Detector | `livekit` | Text (ASR transcript) | ~400 MB (ONNX) | ~25 ms CPU | LiveKit Model License |
```sh
cargo add wavekat-turn --features pipecat
```

Use `TurnController` to wrap any detector with automatic state tracking:
```rust
use wavekat_turn::{TurnController, TurnState};
use wavekat_turn::audio::PipecatSmartTurn;

let detector = PipecatSmartTurn::new()?;
let mut ctrl = TurnController::new(detector);

// Feed audio continuously
ctrl.push_audio(&audio_frame);

// VAD speech start — soft reset (keeps buffer if turn was unfinished)
ctrl.reset_if_finished();

// VAD speech end — predict
let prediction = ctrl.predict()?;
match prediction.state {
    TurnState::Finished => { /* user is done, send to LLM */ }
    TurnState::Unfinished => { /* keep listening */ }
    TurnState::Wait => { /* user asked AI to hold */ }
}

// After assistant finishes responding — hard reset
ctrl.reset();
```

Or use the text-based detector directly:
```rust
use wavekat_turn::{TextTurnDetector, TurnState};
use wavekat_turn::text::LiveKitEou;

let mut detector = LiveKitEou::new()?;
let prediction = detector.predict_text("I was wondering if", &context)?;
assert_eq!(prediction.state, TurnState::Unfinished);
```

See `examples/controller.rs` for a full walkthrough with real audio.
Two trait families cover the two input modalities:

- `AudioTurnDetector` -- operates on raw audio frames (no ASR needed)
- `TextTurnDetector` -- operates on ASR transcript text with optional conversation context

`TurnController` wraps any `AudioTurnDetector` and adds orchestration helpers like soft reset (preserves the buffer when the user pauses mid-sentence).
```text
wavekat-vad   --> "is someone speaking?"
wavekat-turn  --> "are they done speaking?"
        |               |
        v               v
wavekat-voice --> orchestrates VAD + turn + ASR + LLM + TTS
```
| Flag | Default | Description |
|---|---|---|
| `pipecat` | off | Pipecat Smart Turn v3 audio backend (requires `ort`, `ndarray`) |
| `livekit` | off | LiveKit text-based backend (requires `ort`, `ndarray`) |
- 8 kHz telephony audio must be upsampled to 16 kHz before passing to audio-based detectors. Smart Turn v3 silently produces incorrect results at 8 kHz.
- Text-based detectors depend on ASR transcript quality. Pair with a streaming ASR provider for best results.
Cross-validated against the original Python (Pipecat) pipeline on three fixture clips. Tolerance: ±0.02 probability.
Run locally with `make accuracy`. See `scripts/README.md` for how to regenerate the Python reference.
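The ±0.02 tolerance amounts to a per-clip absolute-difference check, roughly as sketched below. This helper is hypothetical; the real comparison lives in the `make accuracy` target.

```rust
/// Hypothetical check: does the Rust backend's probability match the
/// Python reference within the documented ±0.02 tolerance?
fn within_tolerance(rust_prob: f32, python_prob: f32) -> bool {
    (rust_prob - python_prob).abs() <= 0.02
}

fn main() {
    assert!(within_tolerance(0.914, 0.902));   // 0.012 apart: passes
    assert!(!within_tolerance(0.50, 0.55));    // 0.05 apart: fails
}
```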
Licensed under Apache 2.0.
Copyright 2026 WaveKat.
- Pipecat Smart Turn by Daily (BSD 2-Clause)
- LiveKit Turn Detector by LiveKit (LiveKit Model License)