feat(voice): native voice agent support with Pipecat ASR/LLM/TTS by itsarbit · Pull Request #184 · arklexai/arksim

itsarbit · 2026-06-26T21:44:16Z

Summary

Adds a native voice agent type so arksim can evaluate voice agents through their real speech stack. v1 supports Pipecat: arksim voices the simulated user with local Kokoro TTS, feeds that audio into the agent's own ASR -> LLM -> TTS pipeline, captures the spoken reply, and transcribes it with local faster-whisper for the existing evaluator. Tool calls are captured and tagged with a pipecat source.

What's included

New agent_type: voice with a framework discriminator (pipecat implemented, livekit reserved) and pluggable tts/stt providers, defaulting to local Kokoro + faster-whisper (no API keys, CI-reproducible).
arksim/speech/: TTS/STT provider abstractions and a registry, mirroring the arksim/llms/ pattern.
VoiceAgent dispatcher and factory wiring. User agent code stays framework-pure via a zero-arg agent_factory callable (for example ./agent.py:build).
arksim/integrations/pipecat.py: drives the Pipecat pipeline with VAD-bracketed 16 kHz audio injection and captures the agent's TTS audio on the TTS frame boundary.
ToolCallSource.PIPECAT (and LIVEKIT reserved), feeding the tool-call evaluation work (PLA-106).
Example at examples/integrations/pipecat-voice/, plus a key-free smoke_local.py.
Optional extras: arksim[voice] (Kokoro + faster-whisper). The agent's own pipecat services install separately.
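
To illustrate the agent_factory pointer format mentioned above (./agent.py:build), here is a minimal sketch of how such a pointer could be resolved relative to the config file's directory. The function name load_callable matches the commit title in this PR, but the signature, the base_dir parameter, and the error handling are assumptions made for illustration, not the PR's actual code.

```python
import importlib.util
from pathlib import Path
from typing import Any, Callable


def load_callable(pointer: str, base_dir: Path) -> Callable[[], Any]:
    """Resolve a '<file.py>:<attr>' pointer to a callable.

    Hypothetical sketch: the resolver shipped in the PR may differ.
    """
    # Split on the last colon so Windows drive letters stay in the path part.
    file_part, sep, attr = pointer.rpartition(":")
    if not sep or not file_part or not attr:
        raise ValueError(f"expected '<file.py>:<callable>', got {pointer!r}")

    # Relative paths are taken relative to the config file's directory,
    # not the current working directory.
    path = Path(file_part)
    if not path.is_absolute():
        path = base_dir / path
    path = path.resolve()

    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    target = getattr(module, attr)
    if not callable(target):
        raise TypeError(f"{pointer!r} does not point at a callable")
    return target
```

Under these assumptions, load_callable("./agent.py:build", config_path.parent)() would return whatever the user's build() constructs, so the user's agent module never needs to import arksim.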

How it works

simulated user TEXT -> [arksim Kokoro TTS] -> AUDIO -> agent ASR -> LLM -> agent TTS -> AUDIO -> [arksim faster-whisper STT] -> TEXT -> evaluator

The agent's real STT, LLM, and TTS run; arksim supplies the simulated-user voice and transcribes the reply. The simulator and evaluator are unchanged.
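
The flow above can be sketched as a single turn function. Everything in this block is illustrative: the Audio, TTSProvider, and STTProvider names, their method signatures, and the fake providers are assumptions, and the stubs carry UTF-8 text in place of real PCM so the example runs without Kokoro, faster-whisper, or Pipecat installed.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class Audio:
    samples: bytes      # raw PCM in a real run; the stubs below store UTF-8 text
    sample_rate: int


class TTSProvider(Protocol):
    def synthesize(self, text: str) -> Audio: ...


class STTProvider(Protocol):
    def transcribe(self, audio: Audio) -> str: ...


class FakeTTS:
    """Stand-in for the simulated-user TTS: encodes text instead of speech."""

    def synthesize(self, text: str) -> Audio:
        return Audio(text.encode("utf-8"), 16_000)


class FakeSTT:
    """Stand-in for the reply transcriber: decodes the bytes back to text."""

    def transcribe(self, audio: Audio) -> str:
        return audio.samples.decode("utf-8")


def echo_agent(audio_in: Audio) -> Audio:
    """Stand-in for the agent's own ASR -> LLM -> TTS pipeline."""
    heard = audio_in.samples.decode("utf-8")
    return Audio(f"You said: {heard}".encode("utf-8"), audio_in.sample_rate)


def run_turn(
    user_text: str,
    tts: TTSProvider,
    agent: Callable[[Audio], Audio],
    stt: STTProvider,
) -> str:
    user_audio = tts.synthesize(user_text)   # simulated user TEXT -> AUDIO
    reply_audio = agent(user_audio)          # agent hears audio, replies with audio
    return stt.transcribe(reply_audio)       # reply AUDIO -> TEXT for the evaluator


print(run_turn("hello", FakeTTS(), echo_agent, FakeSTT()))  # You said: hello
```

The point of the shape is that the simulator and evaluator only ever see text at both ends, which is why they can stay unchanged.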

How to test

Key-free, fully local (real audio loop, deterministic brain):

pip install -e ".[voice]"
python examples/integrations/pipecat-voice/smoke_local.py

Full LLM-backed run (needs OPENAI_API_KEY and, on Apple Silicon, pip install 'pipecat-ai[openai,mlx-whisper,silero]'):

arksim simulate-evaluate examples/integrations/pipecat-voice/config.yaml

Verified end to end against a real gpt-4o-mini Pipecat agent: the loop runs, transcripts are evaluated, and an HTML report is produced.

Notes and scope

The agent's real STT/LLM/TTS are exercised; arksim intentionally exchanges audio, not text, with the agent in this mode.
pipecat and kokoro require Python 3.11+, so voice tests carry a 3.11+ skip marker (the 3.10 CI leg skips them). Driver and provider modules are import-guarded so collection never fails without the extras.
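
A minimal sketch of the import-guard and skip-marker pattern described above. The helper names optional_import and voice_skip_reason are hypothetical and are not taken from the PR; only importlib.import_module and sys.version_info are real standard-library APIs here.

```python
import importlib
import sys
from types import ModuleType
from typing import Optional

VOICE_MIN_PYTHON = (3, 11)  # per the note above: pipecat and kokoro need 3.11+


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the module if it is installed, else None (never raises ImportError)."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def voice_skip_reason(module_name: str) -> Optional[str]:
    """Return a human-readable skip reason, or None if voice tests can run."""
    if sys.version_info < VOICE_MIN_PYTHON:
        return "voice extras require Python 3.11+"
    if optional_import(module_name) is None:
        return f"{module_name} is not installed"
    return None


# In a test module this could feed a pytest marker, for example:
#   reason = voice_skip_reason("pipecat")
#   pytestmark = pytest.mark.skipif(reason is not None, reason=reason or "")
```

Because the guard returns None instead of raising, a module that uses it can be imported during test collection even when the voice extras are absent.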
Deferred follow-ups: LiveKit driver, accent/noise/volume perturbation (a no-op _perturb() seam is reserved in the driver), and voice-specific metrics (latency, WER, interruption handling). This PR evaluates conversational content through the real speech stack; it is not yet full voice QA.

Test plan

pytest tests/unit/: 914 passed, 9 skipped locally. Includes voice config, dispatcher, factory, registry, resample, the echo driver loop, and a model-free VAD-gated driver test that guards the segmented-STT contract.
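
To make the segmented-STT contract concrete, here is a sketch of what a fake segmented STT for a model-free test could look like. The frame classes and FakeSegmentedSTT below are hypothetical stand-ins defined locally, not Pipecat's frame types and not the test code from this PR. The idea is that a segmented STT only emits a segment for audio that arrives between a speech-start and a speech-stop marker, so a driver that injects audio without those markers produces no transcript.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class SpeechStart:
    """Hypothetical marker: VAD says the user started speaking."""


@dataclass(frozen=True)
class SpeechStop:
    """Hypothetical marker: VAD says the user stopped speaking."""


@dataclass(frozen=True)
class AudioChunk:
    pcm: bytes


Frame = Union[SpeechStart, SpeechStop, AudioChunk]


class FakeSegmentedSTT:
    """Buffers audio only between SpeechStart and SpeechStop."""

    def __init__(self) -> None:
        self._buffer = bytearray()
        self._in_speech = False
        self.segments: List[bytes] = []

    def push(self, frame: Frame) -> None:
        if isinstance(frame, SpeechStart):
            self._in_speech = True
            self._buffer.clear()
        elif isinstance(frame, SpeechStop):
            if self._in_speech:
                self.segments.append(bytes(self._buffer))
            self._in_speech = False
        elif isinstance(frame, AudioChunk) and self._in_speech:
            self._buffer.extend(frame.pcm)
        # Audio outside a start/stop bracket is silently dropped.


def bracket(pcm: bytes, chunk_size: int) -> List[Frame]:
    """Wrap raw audio in start/stop markers, as a VAD-aware driver would."""
    chunks = [AudioChunk(pcm[i:i + chunk_size]) for i in range(0, len(pcm), chunk_size)]
    return [SpeechStart(), *chunks, SpeechStop()]


pcm = b"\x01\x02" * 8

gated = FakeSegmentedSTT()
for frame in bracket(pcm, chunk_size=4):
    gated.push(frame)
assert gated.segments == [pcm]      # bracketed audio yields one full segment

ungated = FakeSegmentedSTT()
ungated.push(AudioChunk(pcm))
assert ungated.segments == []       # unbracketed audio is never transcribed
```

A test built this way fails as soon as the driver stops emitting the bracket, without needing any speech model.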

codecov · 2026-06-26T21:45:38Z

Codecov Report

❌ Patch coverage is 50.15385% with 162 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
arksim/integrations/pipecat.py	0.00%	120 Missing ⚠️
arksim/speech/providers/faster_whisper.py	40.00%	18 Missing ⚠️
arksim/speech/providers/kokoro.py	47.82%	12 Missing ⚠️
arksim/simulation_engine/agent/clients/voice.py	83.33%	6 Missing ⚠️
arksim/speech/types.py	71.42%	4 Missing ⚠️
arksim/config/core/agent.py	94.73%	1 Missing ⚠️
arksim/simulation_engine/entities.py	90.00%	1 Missing ⚠️
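
As a quick consistency check on the table above, the per-file missing counts can be summed and compared with the reported patch coverage. The total of 325 changed lines is inferred from the two reported numbers and is not stated in the report itself.

```python
# Missing-line counts copied from the Codecov table above.
missing = {
    "arksim/integrations/pipecat.py": 120,
    "arksim/speech/providers/faster_whisper.py": 18,
    "arksim/speech/providers/kokoro.py": 12,
    "arksim/simulation_engine/agent/clients/voice.py": 6,
    "arksim/speech/types.py": 4,
    "arksim/config/core/agent.py": 1,
    "arksim/simulation_engine/entities.py": 1,
}

total_missing = sum(missing.values())
reported_coverage = 0.5015385  # 50.15385% from the report header

# Inferred, not reported: total patch lines implied by the two numbers above.
total_lines = round(total_missing / (1 - reported_coverage))
covered = total_lines - total_missing

print(total_missing)                                   # 162
print(total_lines, covered)                            # 325 163
print(round(100 * covered / total_lines, 5))           # 50.15385
print(round(100 * missing["arksim/integrations/pipecat.py"] / total_missing, 1))  # 74.1
```

Read this way, about three quarters of the uncovered lines sit in the Pipecat driver, which matches the note that driver tests are skipped where the extras are unavailable.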


itsarbit added 11 commits June 24, 2026 05:31

4f6051b feat(voice): add voice agent type and config models
7a72027 feat(voice): add load_callable factory-pointer resolver
d0b9fe3 feat(voice): add speech provider abstractions and registry
e3ac4b9 feat(voice): add VoiceAgent dispatcher and factory wiring
8416214 feat(voice): add local Kokoro TTS and faster-whisper STT providers
6215d24 feat(voice): add Pipecat voice driver, example, and extras
8f35ec8 feat(voice): add key-free local smoke test for the pipecat-voice example
9ed8130 docs(voice): fix simulate-evaluate command (config path is positional)
125c231 fix(voice): resolve agent_factory path relative to config file
eff33b0 fix(voice): drive real segmented STT via VAD frames and 16kHz input
09327a1 test(voice): guard VAD-frame contract with a fake segmented STT

itsarbit requested a review from a team as a code owner June 26, 2026 21:44

itsarbit wants to merge 11 commits into main from feat/voice-agent-support
