A real-time voice agent that joins a LiveKit room and interacts with users via audio. The agent listens to the user's speech, transcribes it to text, and replies with synthesized audio.
- 🎙️ Speech-to-Text using Deepgram (nova-3 model)
- 🔊 Text-to-Speech using Cartesia
- 🧠 Voice Activity Detection (VAD) using Silero VAD (local model, no API key required)
- 🚫 No overlap — agent never speaks while user is speaking; stops immediately if interrupted
- ⏱️ Silence handling — plays a reminder if no user speech for 20+ seconds
- Agent joins a LiveKit room and greets the user
- Silero VAD continuously monitors audio to detect when the user is speaking
- When the user finishes speaking, Deepgram STT transcribes the audio to text
- The agent responds with `"You said: <text>"`, converted to audio via Cartesia TTS
- If the user speaks while the agent is talking, LiveKit's built-in interruption handling stops the agent immediately
- If no speech is detected for 20 seconds, the agent plays a reminder prompt
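
The per-turn flow above (transcribe, build the echo reply, synthesize) can be sketched in plain Python. This is only an illustration: `handle_turn`, `transcribe`, and `synthesize` are hypothetical names standing in for the Deepgram and Cartesia plugins, not part of the LiveKit Agents SDK or of this project's `agent.py`.

```python
from typing import Callable, Optional


def handle_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],   # stand-in for the STT step
    synthesize: Callable[[str], bytes],   # stand-in for the TTS step
) -> Optional[bytes]:
    """Turn one finished user utterance into the agent's audio reply."""
    text = transcribe(audio).strip()
    if not text:
        # Nothing intelligible was said, so the agent stays quiet.
        return None
    return synthesize(f"You said: {text}")


if __name__ == "__main__":
    # Fake STT/TTS so the sketch runs without any API keys.
    fake_stt = lambda audio: audio.decode("utf-8")
    fake_tts = lambda text: text.encode("utf-8")
    print(handle_turn(b"hello there", fake_stt, fake_tts))
```

Passing the STT and TTS steps in as callables keeps the echo logic testable without network access.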
The LiveKit Agents SDK handles interruption automatically via the VAD pipeline:
- Silero VAD detects speech start/end in real time
- If the user starts speaking while the agent is outputting audio, the SDK cancels the agent's current speech immediately
- The agent only responds again after the user finishes their turn
- This is documented in the LiveKit Agents SDK under `AgentSession` interruption behavior
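
The cancellation idea can be illustrated with plain `asyncio`, independent of the SDK. The sketch below is not how LiveKit implements interruption internally; `SpeechPlayer` and its methods are hypothetical names, and the short sleeps stand in for audio playback.

```python
import asyncio


class SpeechPlayer:
    """Hypothetical stand-in for the agent's audio output."""

    def __init__(self):
        self._task = None
        self.spoken = []  # chunks that were actually "played"

    async def _play(self, chunks):
        for chunk in chunks:
            await asyncio.sleep(0.02)  # pretend each chunk takes time to play
            self.spoken.append(chunk)

    def say(self, chunks):
        """Start speaking, replacing any speech still in progress."""
        self.interrupt()
        self._task = asyncio.create_task(self._play(chunks))
        return self._task

    def interrupt(self):
        """Cancel in-progress speech; return True if something was cut off."""
        if self._task is not None and not self._task.done():
            self._task.cancel()
            return True
        return False


async def demo():
    player = SpeechPlayer()
    task = player.say(["You", "said:", "hello", "there", "friend"])
    await asyncio.sleep(0.05)          # the user starts talking mid-utterance
    interrupted = player.interrupt()   # what a VAD "speech started" event would trigger
    await asyncio.gather(task, return_exceptions=True)
    return interrupted, player.spoken


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the speech runs as a task, cancelling it stops playback at the next chunk boundary rather than waiting for the full reply to finish.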
- A 20-second countdown timer (`asyncio.sleep(20)`) starts after every user interaction or agent greeting
- If the timer completes without being reset, the agent says `"Are you still there? Feel free to say something."`
- The timer resets on every new user turn, preventing repeated reminders during active conversation
- Python 3.9 or higher
- A LiveKit Cloud account (free tier): https://livekit.io
- A Deepgram account (free $200 credit): https://deepgram.com
- A Cartesia account (free tier): https://cartesia.ai
- Clone the repository:

      git clone https://github.com/YOUR_USERNAME/voice-agent.git
      cd voice-agent

- Create and activate a virtual environment:

      python -m venv venv
      # On Mac/Linux:
      source venv/bin/activate
      # On Windows:
      venv\Scripts\activate

- Install dependencies:

      pip install livekit-agents livekit-plugins-deepgram livekit-plugins-cartesia livekit-plugins-silero python-dotenv

- Create a `.env` file in the project root:

      LIVEKIT_URL=wss://your-project.livekit.cloud
      LIVEKIT_API_KEY=your_livekit_api_key
      LIVEKIT_API_SECRET=your_livekit_api_secret
      DEEPGRAM_API_KEY=your_deepgram_api_key
      CARTESIA_API_KEY=your_cartesia_api_key

Run `python agent.py dev`. The agent will start and wait for someone to join a LiveKit room.
Use the LiveKit Agents Playground to connect and talk to the agent:
- Go to https://agents-playground.livekit.io
- Enter your `LIVEKIT_URL`, `LIVEKIT_API_KEY`, and `LIVEKIT_API_SECRET`
- Click Connect and allow microphone access
- Speak: the agent will reply with `"You said: <your words>"`
| Variable | Description |
|---|---|
| `LIVEKIT_URL` | Your LiveKit Cloud WebSocket URL |
| `LIVEKIT_API_KEY` | LiveKit project API key |
| `LIVEKIT_API_SECRET` | LiveKit project API secret |
| `DEEPGRAM_API_KEY` | Deepgram API key for speech-to-text |
| `CARTESIA_API_KEY` | Cartesia API key for text-to-speech |
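
A missing variable usually only surfaces as a connection or authentication error at runtime, so a startup check can help. The sketch below is a suggestion, not something the project is known to include; `missing_env_vars` is a hypothetical helper, and it takes a plain mapping so it runs without `python-dotenv` (in the agent you would presumably load `.env` first and then check `os.environ`).

```python
import os

# The five variables listed in the table above.
REQUIRED_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "DEEPGRAM_API_KEY",
    "CARTESIA_API_KEY",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required environment variables are set.")
```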
- `livekit-agents` v1.4.3: core agent framework
- `livekit-plugins-deepgram`: Deepgram STT integration
- `livekit-plugins-cartesia`: Cartesia TTS integration
- `livekit-plugins-silero`: Silero VAD (local, no API key needed)
| Service | Purpose | Free Tier |
|---|---|---|
| LiveKit Cloud | Real-time audio room infrastructure | Yes |
| Deepgram | Speech-to-text (nova-3 model) | $200 free credit |
| Cartesia | Text-to-speech | Yes |
- No UI — testing requires the LiveKit Agents Playground
- The silence reminder loops every 20 seconds if the user remains silent indefinitely
- Agent is English-only (Deepgram configured for `en-US`)
- Requires a stable internet connection for Deepgram and Cartesia API calls