Releases: NVIDIA/voice-agent-examples
v0.4.0
NVIDIA Pipecat 0.4.0 (3 March 2026)
Added
- Multilingual support for the voice agent
- AI agent deployment skill for WebRTC, WebSocket, and NAT agent examples
- Jetson Thor edge deployment support
- OpenTelemetry tracing support for ASR, TTS, and LLM services
- Riva Text Filter option to clean and normalize LLM output before TTS processing
- Display of unfiltered transcripts in the WebRTC UI
- Support for Nemotron 3 Nano LLM model
Changed
- BREAKING: Renamed `RivaASRService` to `NemotronASRService` and `RivaTTSService` to `NemotronTTSService` to better reflect the underlying Nemotron Speech technology. The old names remain available as deprecated aliases.
- Upgraded to pipecat 0.0.98
- Migrated to `RTVIObserver` for the WebRTC UI
- Changed default TTS sample rate to 22.05 kHz for WebRTC examples
- Updated the Jetson Thor guide to use the public Riva 2.24.0 release
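The deprecated aliases mentioned in the breaking rename can be pictured with a common Python pattern. This is a minimal sketch, not the library's actual implementation: both classes below are stand-ins defined here for illustration, and the constructor arguments are hypothetical.

```python
import warnings


class NemotronASRService:
    """Stand-in for the renamed service (the real class lives in nvidia-pipecat)."""

    def __init__(self, **kwargs):
        # Hypothetical: simply remember whatever options were passed in.
        self.options = kwargs


class RivaASRService(NemotronASRService):
    """Assumed shape of a deprecated alias: same behavior, plus a warning."""

    def __init__(self, **kwargs):
        warnings.warn(
            "RivaASRService is deprecated; use NemotronASRService instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this class
        )
        super().__init__(**kwargs)
```

With an alias of this kind, existing code that constructs `RivaASRService` keeps working and still passes `isinstance(..., NemotronASRService)` checks, while the warning prompts a migration to the new name.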
Fixed
- Chat history truncation logic for Nemotron models
- Riva `generate_interruptions` logic
- TTS chunk cutoff at WebSocket transport layer by appending silence at the end of each TTS response
- TTS text normalization in `NemotronTTSService`
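The silence-padding fix for TTS chunk cutoff can be illustrated roughly as follows. This is a sketch under stated assumptions, not the code used in the repository: it assumes 16-bit signed mono PCM (where zero bytes are silence), and the function name `append_silence` and the padding duration are hypothetical.

```python
def append_silence(
    pcm: bytes,
    sample_rate: int,
    silence_ms: int,
    sample_width: int = 2,  # bytes per sample; 2 assumes 16-bit PCM
    channels: int = 1,
) -> bytes:
    """Return the audio with `silence_ms` of zero-valued samples appended."""
    if silence_ms < 0:
        raise ValueError("silence_ms must be non-negative")
    num_frames = sample_rate * silence_ms // 1000
    # In signed PCM, all-zero bytes encode silence.
    return pcm + b"\x00" * (num_frames * sample_width * channels)


# Example: pad a (dummy) TTS response at 22.05 kHz with 100 ms of silence.
padded = append_silence(b"\x01\x02" * 10, sample_rate=22050, silence_ms=100)
```

Padding the tail gives the transport a few extra frames to flush, so the last audible samples are less likely to be dropped when the stream ends.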
Removed
- Riva NMT processor and BlingFire Text Aggregator
v0.3.0
NVIDIA Pipecat 0.3.0 (7 November 2025)
New Features
- Added WebRTC-based voice agent example and custom UI
- NeMo Agent Toolkit integration and Voice Agent example with Agentic AI
- Scripts for latency and throughput performance benchmarking for Voice Agents
- Support for Dynamic LLM prompt ingestion and TTS Voice selection using WebRTC UI
- Full-Duplex-Bench evaluation inference client script
- BlingFireTextAggregator for TTS Service
- Added steps for LLM deployment with KV Cache support
Improvements
- Updated pipecat to version 0.0.85
- Renamed GitHub repository to voice-agent-examples
- Switched to Magpie TTS Multilingual model
- Hardcoded NIM version tags in examples
Fixed
- Fixed user transcriptions and Docker Compose volume issues
- Split long TTS sentences to handle Riva TTS character limit error
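The sentence-splitting fix can be sketched as below. This is an illustrative approach only: the helper name `split_for_tts` and the `max_chars` parameter are hypothetical, and the actual Riva TTS character limit is not stated here, so the caller must supply it.

```python
def split_for_tts(text: str, max_chars: int) -> list[str]:
    """Split text into chunks of at most `max_chars`, breaking at whitespace."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    chunks: list[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit has to be hard-split.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this word would exceed the limit: close the chunk.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent to the TTS service as a separate request, keeping every request under the limit while preserving word boundaries where possible.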
Removed
- Removed Animation and Audio2Face support
- Removed ACE naming references
v0.2.0
NVIDIA Pipecat 0.2.0 (17 June 2025)
New Features
- Support for deepseek, mistral-ai, and llama-nemotron models in NVIDIA LLM Service
- Support for BotSpeakingFrame in animation graph service
Improvements
- Upgraded Riva Client version to 2.20.0
- Upgraded to pipecat 0.0.68
- Improved animation graph stream handling
- Improved task cancellation support in NVIDIA LLM and NVIDIA RAG Service
Fixed
- Fixed transcription synchronization for multiple final ASR transcripts
- Fixed edge case where the mouth of the avatar would not close
- Fixed animation stream handling for broken streams
- Fixed ElevenLabs edge case issues with multilingual use cases
- Fixed chunk truncation issues in RAG Service
- Fixed dangling tasks and pipeline cleanup issues
v0.1.0
NVIDIA Pipecat 0.1.0 (23 April 2025)
The NVIDIA Pipecat library augments the Pipecat framework with additional frame processors and services, as well as new multimodal frames that enhance avatar interactions. This is the first release of the NVIDIA Pipecat library.
New Features
- Added Pipecat services for Riva ASR (Automatic Speech Recognition), Riva TTS (Text to Speech), and Riva NMT (Neural Machine Translation) models.
- Added Pipecat frames, processors, and services to support multimodal avatar interactions and use cases. This includes `Audio2Face3DService`, `AnimationGraphService`, `FacialGestureProviderProcessor`, and `PostureProviderProcessor`.
- Added `ACETransport`, which is specifically designed to support integration with existing ACE microservices. This includes a FastAPI-based HTTP and WebSocket server implementation compatible with ACE.
- Added `NvidiaLLMService` for NIM LLM models and `NvidiaRAGService` for the NVIDIA RAG Blueprint.
- Added `UserTranscriptSynchronization` processor for user speech transcripts and `BotTranscriptSynchronization` processor for synchronizing bot transcripts with bot audio playback.
- Added custom context aggregators and processors to enable Speculative Speech Processing to reduce latency.
- Added `UserPresence`, `Proactivity`, and `AcknowledgementProcessor` frame processors to improve human-bot interactions.
- Released source code for the voice assistant example using `nvidia-pipecat`, along with the `pipecat-ai` library service, to showcase NVIDIA services with `ACETransport`.
Improvements
- Added `ElevenLabsTTSServiceWithEndOfSpeech`, an extended version of the ElevenLabs TTS service with end-of-speech events for usage in avatar interactions.