Skip to content

Latest commit

 

History

History
277 lines (187 loc) · 6.63 KB

File metadata and controls

277 lines (187 loc) · 6.63 KB

Getting Started with VoiceMode

VoiceMode brings voice conversations to AI coding assistants. It works as both an MCP server for Claude Code and as a standalone CLI tool.

What is VoiceMode?

VoiceMode provides:

  • MCP Server: Adds voice tools to Claude Code - no installation needed
  • CLI Tool: Use VoiceMode's tools directly from your terminal
  • Local Services: Optional privacy-focused speech processing

Quick Start: Using with Claude Code

The fastest way to get started is using VoiceMode with Claude Code.

Installation

Install UV package manager (if not already installed), then run the VoiceMode installer:

# Install UV package manager (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install VoiceMode and configure services
uvx voice-mode-install

# Add to Claude Code MCP
claude mcp add --scope user voicemode -- uvx --refresh voice-mode

The installer will:

  • Install missing system dependencies (FFmpeg, PortAudio, etc.)
  • Set up your environment for VoiceMode
  • Offer to install local voice services (Whisper STT and Kokoro TTS)

Alternative UV installation methods:

  • macOS: brew install uv
  • With pip: pip install uv

Learn more: UV Installation Guide

2. Configure Your API Key

Set your OpenAI API key as an environment variable:

export OPENAI_API_KEY="sk-your-api-key-here"

Or add it to your shell configuration file (~/.bashrc, ~/.zshrc, etc.)

3. Verify Installation

# Check that VoiceMode is connected
claude mcp list

You should see voicemode in the list of connected servers.

4. Configure Permissions (Optional)

By default, Claude Code prompts for permission each time VoiceMode tools are used. To enable automatic approval, add to ~/.claude/settings.json:

{
  "permissions": {
    "allow": [
      "mcp__voicemode__converse",
      "mcp__voicemode__service"
    ]
  }
}

This allows voice conversations and service management without prompts. For more permission options, see the Permissions Guide.

5. Start Using Voice

In Claude Code, simply type:

converse

Speak when you hear the chime, and Claude will respond with voice!

Alternative: Using as a CLI Tool

If you want to use VoiceMode from the command line:

Installation

# Install with pip
uv tool install voice-mode

# Or install from source in editable mode
git clone https://github.com/mbailey/voicemode
cd voicemode
uv tool install -e .

Basic Usage

# Set your API key
export OPENAI_API_KEY="sk-your-api-key-here"

# Start a voice conversation
voicemode converse

Setting Up Local Services (Optional)

For complete privacy, you can run voice services locally instead of using OpenAI.

Quick Setup

# Install local services
voicemode service install whisper   # Speech-to-text
voicemode service install kokoro    # Text-to-speech

# Start services
voicemode service start whisper
voicemode service start kokoro

# Check status of all services
voicemode service status

VoiceMode will automatically detect and use these local services when available.

Enable Auto-Start (Recommended)

To have services start automatically at login:

# Enable services to start at boot/login
voicemode service enable whisper
voicemode service enable kokoro

On macOS, this creates launchd agents. On Linux, it creates systemd user services.

Download Sizes and Requirements

Service Download Size Disk Space First Start Time
Whisper (tiny) ~75MB ~150MB 30 seconds
Whisper (base) ~150MB ~300MB 1-2 minutes
Whisper (small) ~460MB ~1GB 2-3 minutes
Kokoro TTS ~350MB ~700MB 2-3 minutes

Recommended: Whisper base + Kokoro = ~500MB download, ~1GB disk space.

Waiting for Services

After installation, services download models on first start. Wait for them to be ready:

# Wait for Whisper (port 2022)
while ! nc -z localhost 2022 2>/dev/null; do sleep 2; done
echo "Whisper ready"

# Wait for Kokoro (port 8880)
while ! nc -z localhost 8880 2>/dev/null; do sleep 2; done
echo "Kokoro ready"

Learn more: Whisper Setup Guide | Kokoro Setup Guide

Configuration

VoiceMode works out of the box with sensible defaults. To customize:

Select Your Voice

# OpenAI voices
export VOICEMODE_VOICES="nova,shimmer"

# Or Kokoro voices (if using local TTS)
export VOICEMODE_VOICES="af_sky,am_adam"

Available OpenAI voices: alloy, echo, fable, onyx, nova, shimmer

Project-Specific Settings

Create .voicemode.env in your project:

export VOICEMODE_VOICES="af_nova,nova"
export VOICEMODE_TTS_SPEED=1.2

Learn more: Configuration Guide

Troubleshooting

Voice Not Working in Claude?

  1. Check MCP connection:

    claude mcp list
  2. Verify OPENAI_API_KEY is set in your MCP configuration

  3. Add to your MCP config:

    "env": {
      "OPENAI_API_KEY": "sk-...",
    }

No Audio Input?

# List audio devices
voicemode diag devices

# Test TTS and STT
voicemode converse

Service Issues?

# Check service status
voicemode service status           # All services
voicemode service status whisper   # Specific service

# View logs
voicemode service logs whisper -n 50
voicemode service logs kokoro -n 50

# Check if service is responding
voicemode service health whisper
voicemode service health kokoro

Running VoiceMode as a Service (Advanced)

For remote access or persistent operation, run VoiceMode as a background service:

# Start the VoiceMode HTTP server
voicemode service start voicemode

# Enable auto-start at boot/login
voicemode service enable voicemode

# Check all services
voicemode service status

The HTTP server enables remote access from other machines on your network or via secure tunnels.

For security best practices when running remotely, see the Configuration Guide.

Next Steps

Getting Help

Welcome to voice-enabled AI coding! 🎙️