A voice-controlled AI assistant for macOS, powered by Claude Code. Say "Jarvis" and it listens, transcribes, thinks, and speaks back.
- Wake word detection - Say "Jarvis" to activate (faster-whisper tiny, rolling buffer)
- Real-time transcription - See your words appear as you speak (faster-whisper preview + mlx-whisper turbo final)
- Claude Code backend - Full access to tools: Bash, file system, web search, code editing
- Natural voice - ElevenLabs multilingual v2 text-to-speech
- Streaming responses - Text appears token by token, TTS plays sentence by sentence
- Conversation mode - Follow-up questions without repeating "Jarvis"
- Auto-end detection - Jarvis stops listening when you're done or talking to someone else
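The "TTS plays sentence by sentence" behavior can be illustrated with a small sketch. This is not the project's actual code: the function name `sentences_from_tokens` and the boundary rule (sentence-ending punctuation followed by whitespace) are assumptions chosen for illustration. It buffers streamed tokens and yields each sentence as soon as it is complete, so speech can start before the full response has arrived.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: '.', '!', '?' or '…' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?…])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last one ends at a sentence boundary.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        # The last part is still incomplete; keep buffering it.
        buffer = parts[-1]
    # Flush whatever remains when the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    for sentence in sentences_from_tokens(["Hel", "lo. ", "How are", " you? ", "Bye"]):
        print(sentence)  # each sentence would be handed to the TTS step here
```

Each yielded sentence could be pushed onto a queue consumed by a background TTS thread, which matches the pipeline's "sentence by sentence, background thread" description.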
Mic --> Wake word (faster-whisper tiny, rolling buffer)
--> Record + live preview (faster-whisper tiny)
--> Final transcription (mlx-whisper turbo on Apple Silicon GPU)
--> Claude Code CLI (persistent process, stream-json)
--> ElevenLabs TTS (sentence by sentence, background thread)
--> Speaker
--> Conversation mode (mic stays open) or back to wake word
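The rolling-buffer idea behind the wake word step can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the class name `RollingWakeWordDetector`, the `window_chunks` parameter, and the injected `transcribe` callable are all hypothetical. In the real project the transcription would come from faster-whisper; here it is passed in as a plain function so the sketch stays self-contained.

```python
from collections import deque
from typing import Callable, Deque


class RollingWakeWordDetector:
    """Keep the last few audio chunks and check their transcript for the wake word."""

    def __init__(
        self,
        transcribe: Callable[[bytes], str],  # hypothetical: audio bytes -> text
        wake_word: str = "jarvis",
        window_chunks: int = 4,  # hypothetical window size
    ) -> None:
        self._transcribe = transcribe
        self._wake_word = wake_word.lower()
        # deque with maxlen drops the oldest chunk automatically.
        self._chunks: Deque[bytes] = deque(maxlen=window_chunks)

    def feed(self, chunk: bytes) -> bool:
        """Add one chunk; return True if the wake word is in the current window."""
        self._chunks.append(chunk)
        text = self._transcribe(b"".join(self._chunks))
        if self._wake_word in text.lower():
            self._chunks.clear()  # avoid re-triggering on the same audio
            return True
        return False


if __name__ == "__main__":
    # Stand-in transcriber for demonstration only: treats the bytes as text.
    detector = RollingWakeWordDetector(lambda audio: audio.decode(), window_chunks=2)
    for piece in (b"hey ", b"Jar", b"vis"):
        print(detector.feed(piece))
```

Because the window overlaps consecutive chunks, a wake word that straddles a chunk boundary is still seen in full, which is the main reason to use a rolling buffer rather than transcribing each chunk in isolation.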
- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.9+
- Claude Code installed and authenticated
- ElevenLabs API key (free tier: 10,000 chars/month)
- ffmpeg (brew install ffmpeg)
git clone https://github.com/TDS-Upec/Jarvis.git
cd Jarvis
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your ElevenLabs API key
source .venv/bin/activate
python main.py
Say "Jarvis" to activate, then speak naturally in French.
main.py - Main loop, mic buffer, conversation flow
assistant.py - Claude Code CLI integration (persistent process)
transcriber.py - Speech-to-text via mlx-whisper (Apple Silicon GPU)
wake_word.py - Wake word detection via faster-whisper (rolling buffer)
speaker.py - Text-to-speech via ElevenLabs
audio.py - Audio capture and silence detection
config.py - Configuration and constants
ui.py - Terminal UI (styled output)
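As an illustration of the silence detection that audio.py is described as handling, here is a minimal sketch. It assumes a simple RMS energy threshold over fixed-size chunks of 16-bit samples; the names `rms` and `stop_index`, the chunk length, and the threshold value are hypothetical and not taken from the repository. Only the 0.8 s default comes from this README.

```python
import math
from typing import Iterable, Optional, Sequence

SILENCE_DURATION = 0.8  # seconds; default stated in this README
CHUNK_SECONDS = 0.1     # hypothetical chunk length
RMS_THRESHOLD = 500.0   # hypothetical energy threshold for int16 samples


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square energy of one chunk of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def stop_index(
    chunks: Iterable[Sequence[int]],
    silence_duration: float = SILENCE_DURATION,
    chunk_seconds: float = CHUNK_SECONDS,
    threshold: float = RMS_THRESHOLD,
) -> Optional[int]:
    """Return the 1-based chunk count at which recording should stop.

    Recording stops once enough consecutive quiet chunks have been seen
    to cover `silence_duration`. Returns None if that never happens.
    """
    needed = max(1, round(silence_duration / chunk_seconds))
    silent_run = 0
    for index, chunk in enumerate(chunks, start=1):
        # Any loud chunk resets the run of silence.
        silent_run = silent_run + 1 if rms(chunk) < threshold else 0
        if silent_run >= needed:
            return index
    return None
```

Counting whole chunks rather than summing float durations avoids rounding drift when comparing against the configured duration.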
Edit config.py to adjust:
- SILENCE_DURATION - How long to wait before cutting (default: 0.8s)
- CONVERSATION_TIMEOUT - Silence before ending conversation (default: 5s)
- ELEVENLABS_VOICE_ID - Change the voice
- WHISPER_MODEL - Change the transcription model
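For orientation, the relevant part of config.py might look roughly like the sketch below. Only the four setting names and the two numeric defaults come from this README; the string values are placeholders, not the project's real defaults, and the actual file may be organized differently.

```python
# Sketch of the tunable settings; string values are placeholders (assumptions).

# Seconds of silence before the current recording is cut (README default).
SILENCE_DURATION = 0.8

# Seconds of silence before conversation mode ends (README default).
CONVERSATION_TIMEOUT = 5.0

# Placeholder: replace with a voice ID from your ElevenLabs account.
ELEVENLABS_VOICE_ID = "your-voice-id"

# Placeholder: the README only says "mlx-whisper turbo"; check config.py
# for the exact model identifier the project uses.
WHISPER_MODEL = "your-whisper-model"
```

Lowering SILENCE_DURATION makes Jarvis respond sooner after you stop talking, at the cost of cutting you off during natural pauses.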
- Claude Code - AI backend with full tool access
- mlx-whisper - Speech-to-text optimized for Apple Silicon
- faster-whisper - Lightweight STT for wake word
- ElevenLabs - Natural text-to-speech
- sounddevice - Audio capture