🤖 Real-Time Voice Agent (LiveKit)

A real-time voice agent that joins a LiveKit room and interacts with users via audio. The agent listens to the user's speech, converts it to text, and responds with synthesized audio.

✨ Features

🎙️ Speech-to-Text using Deepgram (nova-3 model)
🔊 Text-to-Speech using Cartesia
🧠 Voice Activity Detection (VAD) using Silero VAD (local model, no API key required)
🚫 No overlap — agent never speaks while user is speaking; stops immediately if interrupted
⏱️ Silence handling — plays a reminder if no user speech for 20+ seconds

⚙️ How It Works

Agent joins a LiveKit room and greets the user
Silero VAD continuously monitors audio to detect when the user is speaking
When the user finishes speaking, Deepgram STT transcribes the audio to text
The agent responds with "You said: <text>" converted to audio via Cartesia TTS
If the user speaks while the agent is talking, LiveKit's built-in interruption handling stops the agent immediately
If no speech is detected for 20 seconds, the agent plays a reminder prompt

🚫 No-Overlap / Interruption Handling

The LiveKit Agents SDK handles interruption automatically via the VAD pipeline:

Silero VAD detects speech start/end in real time
If the user starts speaking while the agent is outputting audio, the SDK cancels the agent's current speech immediately
The agent stays silent until the user finishes their turn, then responds to the new transcript
This behavior is described in the LiveKit Agents SDK documentation on AgentSession interruption handling
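
The cancellation behavior described above can be pictured with a small standard-library sketch. It is not how the SDK is implemented and does not use any LiveKit API: `SpeechPlayer`, `say`, and `interrupt` are hypothetical names, and `asyncio.sleep` stands in for streaming one chunk of audio.

```python
import asyncio
from typing import List, Optional


class SpeechPlayer:
    """Toy stand-in for agent audio output (hypothetical, not part of the LiveKit SDK)."""

    def __init__(self) -> None:
        self.played: List[str] = []  # words that were actually "spoken"
        self._task: Optional[asyncio.Task] = None

    def say(self, text: str, word_delay: float = 0.05) -> asyncio.Task:
        """Start speaking `text` in the background, one word at a time."""
        self.interrupt()  # never let two utterances overlap
        self._task = asyncio.create_task(self._speak(text, word_delay))
        return self._task

    def interrupt(self) -> None:
        """Call this when VAD reports that the user started speaking."""
        if self._task is not None and not self._task.done():
            self._task.cancel()

    async def _speak(self, text: str, word_delay: float) -> None:
        for word in text.split():
            self.played.append(word)
            # Stands in for streaming one audio chunk; cancellation lands here.
            await asyncio.sleep(word_delay)
```

Calling `interrupt()` from a speech-start handler cancels the background task at its next `await`, so no further words are played. In the real agent the SDK performs the equivalent step itself.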

⏱️ Silence Handling

A 20-second countdown timer (asyncio.sleep(20)) starts after every user interaction or agent greeting
If the timer completes without being reset, the agent says "Are you still there? Feel free to say something."
The timer resets on every new user turn, preventing repeated reminders during active conversation
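
A minimal sketch of such a resettable countdown, built only on `asyncio`. The class name `SilenceTimer` and its methods are assumptions made for illustration; the code in `agent.py` may be structured differently.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class SilenceTimer:
    """Resettable countdown that awaits a callback after `timeout` seconds of silence."""

    def __init__(self, timeout: float, on_timeout: Callable[[], Awaitable[None]]) -> None:
        self._timeout = timeout
        self._on_timeout = on_timeout
        self._task: Optional[asyncio.Task] = None

    def reset(self) -> None:
        """Cancel any pending countdown and start a fresh one."""
        self.cancel()
        self._task = asyncio.create_task(self._countdown())

    def cancel(self) -> None:
        """Stop the countdown without firing the callback."""
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None

    async def _countdown(self) -> None:
        # Repeats for as long as nobody resets or cancels the timer, which
        # matches the "reminder loops every 20 seconds" limitation noted below.
        while True:
            await asyncio.sleep(self._timeout)
            await self._on_timeout()
```

With a timeout of 20, calling `reset()` after the greeting and again on every user turn gives the behavior described above; the callback would speak the reminder prompt.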

🛠️ Setup Instructions

Prerequisites

Python 3.9 or higher
A LiveKit Cloud account (free tier): https://livekit.io
A Deepgram account (free $200 credit): https://deepgram.com
A Cartesia account (free tier): https://cartesia.ai

Installation

Clone the repository:

git clone https://github.com/YOUR_USERNAME/voice-agent.git
cd voice-agent

Create and activate a virtual environment:

python -m venv venv
# On Mac/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate

Install dependencies:

pip install livekit-agents livekit-plugins-deepgram livekit-plugins-cartesia livekit-plugins-silero python-dotenv

Create a .env file in the project root:

LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret
DEEPGRAM_API_KEY=your_deepgram_api_key
CARTESIA_API_KEY=your_cartesia_api_key

🚀 How to Run

python agent.py dev

The agent will start and wait for someone to join a LiveKit room.

🧪 Testing

Use the LiveKit Agents Playground to connect and talk to the agent:

Go to https://agents-playground.livekit.io
Enter your LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET
Click Connect and allow microphone access
Speak — the agent will reply with "You said: <your words>"

🔑 Required Environment Variables

| Variable | Description |
| --- | --- |
| `LIVEKIT_URL` | Your LiveKit Cloud WebSocket URL |
| `LIVEKIT_API_KEY` | LiveKit project API key |
| `LIVEKIT_API_SECRET` | LiveKit project API secret |
| `DEEPGRAM_API_KEY` | Deepgram API key for speech-to-text |
| `CARTESIA_API_KEY` | Cartesia API key for text-to-speech |
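
A missing key usually surfaces only when the first API call fails, so a startup check can save some debugging. The helper below is a hedged sketch: `missing_env_vars` and `REQUIRED_ENV_VARS` are hypothetical names and are not taken from `agent.py`.

```python
import os
from typing import List, Mapping

# The five variables listed in the table above.
REQUIRED_ENV_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "DEEPGRAM_API_KEY",
    "CARTESIA_API_KEY",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank in `env`."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name, "").strip()]
```

One way to use it is to call `load_dotenv()` from `python-dotenv` first, then exit with a clear message if `missing_env_vars()` returns a non-empty list.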

📦 SDK Used

livekit-agents v1.4.3 — core agent framework
livekit-plugins-deepgram — Deepgram STT integration
livekit-plugins-cartesia — Cartesia TTS integration
livekit-plugins-silero — Silero VAD (local, no API key needed)

🌐 External Services

| Service | Purpose | Free Tier |
| --- | --- | --- |
| LiveKit Cloud | Real-time audio room infrastructure | Yes |
| Deepgram | Speech-to-text (nova-3 model) | $200 free credit |
| Cartesia | Text-to-speech | Yes |

⚠️ Known Limitations

No UI — testing requires the LiveKit Agents Playground
The silence reminder loops every 20 seconds if the user remains silent indefinitely
Agent is English-only (Deepgram configured for en-US)
Requires a stable internet connection for Deepgram and Cartesia API calls
