Fast Rust speaker diarization with pyannote-level accuracy.
On VoxConverse dev, speakrs CoreML gets 7.1% DER at 529x realtime versus pyannote's 7.2% at 24x. Full results are in benchmarks/.
If you want a small end-to-end app using it, see avencera/smrze.
speakrs implements the full pyannote community-1 style pipeline in
Rust: segmentation, powerset decode, overlap-add aggregation, binarization,
embedding, PLDA, and VBx clustering. There is no Python runtime in the
library path. Inference runs on ONNX Runtime or native CoreML and the rest
of the pipeline stays in Rust.
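To give a flavor of the middle stages: overlap-add aggregation averages per-frame speaker activations across overlapping model windows, and binarization thresholds the result into speech regions. A toy, self-contained sketch of just those two stages (illustrative only, not speakrs internals):

```rust
// Toy overlap-add + binarization: average per-frame scores from
// overlapping windows, then threshold. Illustrative only -- this is
// not speakrs internals, just the shape of the idea.
fn overlap_add(windows: &[(usize, Vec<f32>)], total_frames: usize) -> Vec<f32> {
    let mut sum = vec![0.0f32; total_frames];
    let mut count = vec![0.0f32; total_frames];
    for (start, scores) in windows {
        for (i, s) in scores.iter().enumerate() {
            sum[start + i] += s;
            count[start + i] += 1.0;
        }
    }
    sum.iter()
        .zip(&count)
        .map(|(s, c)| if *c > 0.0 { s / c } else { 0.0 })
        .collect()
}

fn binarize(scores: &[f32], threshold: f32) -> Vec<bool> {
    scores.iter().map(|&s| s > threshold).collect()
}

fn main() {
    // Two 4-frame windows hopping by 2 frames over 6 frames total.
    let windows = vec![(0, vec![0.9, 0.8, 0.2, 0.1]), (2, vec![0.4, 0.3, 0.7, 0.9])];
    let speech = binarize(&overlap_add(&windows, 6), 0.5);
    println!("{speech:?}"); // [true, true, false, false, true, true]
}
```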
The goal is to get pyannote-class diarization without shipping a Python
stack. On VoxConverse dev, speakrs CoreML gets 7.1% DER at 529x
realtime versus pyannote's 7.2% at 24x. Full tables are in
benchmarks/.
Pick a backend in Cargo.toml:

```toml
# Apple Silicon (CoreML)
speakrs = { version = "0.4", features = ["coreml"] }

# NVIDIA GPU
speakrs = { version = "0.4", features = ["cuda"] }

# CPU only
speakrs = "0.4"

# System OpenBLAS
speakrs = { version = "0.4", default-features = false, features = ["online", "openblas-system"] }
```

Then diarize a file and print RTTM:

```rust
use speakrs::{ExecutionMode, OwnedDiarizationPipeline};
fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    // With the default `online` feature, models download on first use.
    let mut pipeline = OwnedDiarizationPipeline::from_pretrained(ExecutionMode::CoreMl)?;

    // speakrs expects mono f32 samples at 16 kHz.
    let audio: Vec<f32> = load_your_mono_16khz_audio_here();

    let result = pipeline.run(&audio)?;
    print!("{}", result.rttm("my-audio"));
    Ok(())
}
```
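`rttm()` emits standard RTTM lines, one per speaker turn. Illustrative output only; exact times and speaker labels depend on the audio and the clustering:

```text
SPEAKER my-audio 1 0.031 2.406 <NA> <NA> speaker_00 <NA> <NA>
SPEAKER my-audio 1 2.437 1.125 <NA> <NA> speaker_01 <NA> <NA>
```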
To work with speaker turns in code instead of RTTM text, convert the
frame-level diarization into segments:

```rust
use speakrs::pipeline::{FRAME_DURATION_SECONDS, FRAME_STEP_SECONDS};

let result = pipeline.run(&audio)?;

// to_segments merges frame-level labels into (start, end, speaker)
// turns using the model's frame step and duration.
for segment in result
    .discrete_diarization
    .to_segments(FRAME_STEP_SECONDS, FRAME_DURATION_SECONDS)
{
    println!("{:.3} - {:.3} {}", segment.start, segment.end, segment.speaker);
}
```

[QueueSender] and [QueueReceiver] run a background worker. Push audio
from any thread and read results as they finish:
```rust
use speakrs::{ExecutionMode, OwnedDiarizationPipeline, QueuedDiarizationRequest};

let pipeline = OwnedDiarizationPipeline::from_pretrained(ExecutionMode::CoreMl)?;
let (tx, rx) = pipeline.into_queued()?;

// Producer: push requests from any thread.
std::thread::spawn(move || {
    for (file_id, audio) in receive_files() {
        tx.push(QueuedDiarizationRequest::new(file_id, audio)).unwrap();
    }
});

// Consumer: results arrive as the worker finishes each file.
for result in rx {
    let result = result?;
    print!("{}", result.result?.rttm(&result.file_id));
}
```

For offline or airgapped setups, load models from a local directory:
```rust
use std::path::Path;
use speakrs::{ExecutionMode, OwnedDiarizationPipeline};

let mut pipeline = OwnedDiarizationPipeline::from_dir(
    Path::new("/path/to/models"),
    ExecutionMode::Cpu,
)?;
let result = pipeline.run(&audio)?;
```

Available execution modes:

| Mode | Backend | Step | Use it for |
|---|---|---|---|
| `cpu` | ONNX Runtime CPU | 1s | CPU runs and widest compatibility |
| `coreml` | Native CoreML | 1s | Apple Silicon |
| `coreml-fast` | Native CoreML | 2s | Apple Silicon, higher throughput |
| `cuda` | ONNX Runtime CUDA | 1s | NVIDIA GPU |
| `cuda-fast` | ONNX Runtime CUDA | 2s | NVIDIA GPU, higher throughput |
The `*-fast` modes use a 2 second step instead of 1 second, which usually
trades some boundary precision for extra throughput. Start with `coreml` or
`cuda` unless you already know you want the faster step size.
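Each mode is an [ExecutionMode] variant passed at construction, so switching backends is a one-line change. A sketch using the two variants shown in the examples above:

```rust
use speakrs::{ExecutionMode, OwnedDiarizationPipeline};

// Pick the backend at runtime: CoreML on macOS, plain CPU elsewhere.
// (The *-fast modes have their own ExecutionMode variants; check the
// enum docs for their exact names -- they are not spelled out here.)
let mode = if cfg!(target_os = "macos") {
    ExecutionMode::CoreMl
} else {
    ExecutionMode::Cpu
};
let mut pipeline = OwnedDiarizationPipeline::from_pretrained(mode)?;
```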
VoxConverse dev, collar = 0 ms:

| Platform | Implementation | DER | Time | RTFx |
|---|---|---|---|---|
| Apple M4 Pro | speakrs `coreml` | 7.1% | 138s | 529x |
| Apple M4 Pro | speakrs `coreml-fast` | 7.4% | 169s | 434x |
| Apple M4 Pro | pyannote community-1 (MPS) | 7.2% | 2999s | 24x |
| RTX 4090 | speakrs `cuda` | 7.0% | 1236s | 59x |
| RTX 4090 | speakrs `cuda-fast` | 7.4% | 604s | 121x |
| RTX 4090 | pyannote community-1 (CUDA) | 7.2% | 2312s | 32x |
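RTFx is audio duration divided by wall-clock time: at 529x, the 138 s `coreml` run works out to roughly 138 × 529 ≈ 73,000 s (about 20 hours) of audio.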
On VoxConverse test, both `coreml` and `cuda` match pyannote at 11.1% DER and
are much faster. See benchmarks/ for the full tables across all datasets.
CoreML and ONNX Runtime can differ slightly even in FP32 because the runtime graphs are not identical and floating-point reduction order changes rounding.
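As a standalone illustration of that rounding effect (not speakrs code), summing the same f32 values in two different orders already disagrees in the low bits:

```rust
fn main() {
    // 100k terms of the harmonic series in f32.
    let xs: Vec<f32> = (1..=100_000).map(|i| 1.0_f32 / i as f32).collect();

    // Same numbers, two reduction orders.
    let forward: f32 = xs.iter().sum();
    let backward: f32 = xs.iter().rev().sum();

    // Prints two slightly different values: reordering a reduction
    // changes where rounding happens, which is all FP32 guarantees.
    println!("forward:  {forward:.7}");
    println!("backward: {backward:.7}");
}
```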
pyannote-rs is the main Rust-only comparison point, but it targets a different tradeoff.
| | speakrs | pyannote-rs |
|---|---|---|
| Pipeline | Full pyannote community-1 style pipeline | Simpler window-level pipeline |
| Aggregation | Overlap-add plus binarization | No overlap-add or binarization |
| Clustering | PLDA + VBx | Cosine threshold |
| Goal | Stay close to pyannote behavior on CPU/CUDA | Lightweight Rust diarization |
On the VoxConverse dev subset where pyannote-rs emits output, speakrs
CoreML scores 11.5% DER versus 80.2% for pyannote-rs. In that same run,
pyannote-rs returned no segments on most files.
With the default `online` feature, models download on first use from
avencera/speakrs-models. Set `SPEAKRS_MODELS_DIR` to force a local bundle
instead.
Common features:

- `online` (default): model download via [ModelManager]
- `coreml`: native CoreML backend for Apple Silicon
- `cuda`: NVIDIA CUDA backend via ONNX Runtime
- `load-dynamic`: load the CUDA runtime at startup instead of static linking
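For example, a CUDA build that defers loading the CUDA runtime until startup (assuming the two flags compose, as their descriptions suggest):

```toml
speakrs = { version = "0.4", features = ["cuda", "load-dynamic"] }
```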
BLAS backends matter if you disable default features:

- `x86_64` defaults to statically linked Intel MKL
- non-`x86_64` defaults to statically linked OpenBLAS and needs a C toolchain
- advanced opt-ins are `intel-mkl`, `openblas-static`, and `openblas-system`
```toml
speakrs = { version = "0.4", default-features = false, features = ["online", "intel-mkl"] }
speakrs = { version = "0.4", default-features = false, features = ["online", "openblas-system"] }
```

The ONNX Runtime dependency (`ort` 2.0.0-rc.12) is still pre-release.
Start here:
- [OwnedDiarizationPipeline]: pipeline entry point
- [QueueSender] and [QueueReceiver]: background worker interface
- [DiarizationResult]: frame-level activations, segments, clusters, embeddings, RTTM
- [PipelineConfig] and [RuntimeConfig]: tuning knobs
- [ModelManager]: model download when `online` is enabled
- [Segment]: a single speaker turn
See CONTRIBUTING.md for local setup, model downloads, fixture generation, and the standard check commands used in this repo.
- pyannote-audio - Python reference implementation
- pyannote community-1 - VBx + PLDA pipeline
- SpeakerKit - Swift reference (same VBx architecture)