Your AI doesn’t just think — it speaks, listens, and remembers.
This is a Telegram-first voice layer that sends voice updates out, ingests voice notes in, and stores transcripts into your memory backend.
- Telegram transport (outbound + inbound)
- Kokoro TTS via MUSE TTS (54 voices) for outbound voice notes
- faster-whisper STT for inbound voice transcription (local, fast, privacy-friendly)
- Pluggable transcript sink: local file, webhook, or MUSE Brain MCP adapter
Kokoro is the voice engine behind MUSE TTS. You get 54 voices and can pick a default persona voice without changing core logic.
faster-whisper is a fast speech-to-text engine (a reimplementation of OpenAI's Whisper on CTranslate2) that you can run locally.
In this repo it exposes an OpenAI-compatible transcription endpoint, so the bridge can call it like any standard STT API.
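As a rough illustration of what "OpenAI-compatible" means in practice, the sketch below builds the multipart form that OpenAI-style `/v1/audio/transcriptions` endpoints expect (`file` and `model` fields) and posts it to a local server. The base URL, port, model name, and the helper names `buildTranscriptionForm` and `transcribe` are assumptions for illustration, not taken from this repo; check `.env.example` for the real values.

```typescript
// Hypothetical values: adjust to whatever the STT sidecar actually listens on.
const STT_BASE_URL = "http://localhost:8000"; // assumed local address
const STT_MODEL = "base"; // assumed model name

/** Build the multipart body used by OpenAI-style transcription endpoints. */
function buildTranscriptionForm(audio: Blob, filename: string, model: string): FormData {
  const form = new FormData();
  form.append("file", audio, filename);
  form.append("model", model);
  return form;
}

/** POST the audio and return the transcript text from the JSON response. */
async function transcribe(audio: Blob, filename: string): Promise<string> {
  const res = await fetch(`${STT_BASE_URL}/v1/audio/transcriptions`, {
    method: "POST",
    body: buildTranscriptionForm(audio, filename, STT_MODEL),
  });
  if (!res.ok) {
    throw new Error(`STT request failed with status ${res.status}`);
  }
  const data = (await res.json()) as { text?: string };
  return data.text ?? "";
}
```

Keeping the form builder separate from the network call makes it easy to point the same code at a hosted STT provider later by changing only the base URL.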
Choose one with MEMORY_SINK_MODE:
- file (default): writes NDJSON to ./state/transcripts.ndjson
- webhook: POSTs transcripts to your own endpoint
- mcp: sends transcripts to MUSE Brain via mind_observe (optional adapter)
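A minimal sketch of how a sink mode might be resolved and how a transcript could be serialized for the file sink. The `TranscriptRecord` field names, the helper names, and the choice to throw on an unknown mode are all assumptions for illustration; the bridge's real record shape and error handling may differ.

```typescript
type SinkMode = "file" | "webhook" | "mcp";

/** Resolve MEMORY_SINK_MODE, falling back to the documented default "file". */
function parseSinkMode(raw: string | undefined): SinkMode {
  const value = (raw ?? "").trim().toLowerCase();
  if (value === "") return "file";
  if (value === "file" || value === "webhook" || value === "mcp") return value;
  // Assumption: fail loudly on typos rather than silently using the default.
  throw new Error(`Unsupported MEMORY_SINK_MODE: ${raw}`);
}

// Hypothetical record shape; the real field names may differ.
interface TranscriptRecord {
  chatId: number;
  text: string;
  receivedAt: string; // ISO 8601 timestamp
}

/** NDJSON is one JSON object per line, each terminated by a newline. */
function toNdjsonLine(record: TranscriptRecord): string {
  return JSON.stringify(record) + "\n";
}
```

Because each record is a complete JSON object on its own line, the file can be appended to without rewriting earlier entries and read back line by line.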
- You can run this with copy/paste setup + .env.
- No need to train models.
- Start local with sane defaults, then swap providers later.
- Works even if you keep your own stack and only want the Telegram+voice piece.
cp .env.example .env
npm install
npm run build

By default, transcripts are saved locally (MEMORY_SINK_MODE=file), so no MUSE dependency is required.
Set TELEGRAM_CHAT_ID to lock ingestion to your chat (recommended default).
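The chat lock can be pictured as a filter applied to each incoming Telegram update before any audio is downloaded. This is a sketch, not the bridge's actual code: `isAllowedChat` is a hypothetical helper, and treating an unset `TELEGRAM_CHAT_ID` as "accept any chat" is an assumption. The `message.chat.id` and `message.voice.file_id` fields follow the Telegram Bot API.

```typescript
// Minimal slice of a Telegram Bot API update: only the fields used here.
interface TelegramUpdate {
  message?: { chat: { id: number }; voice?: { file_id: string } };
}

/**
 * Return true when the update should be ingested.
 * Assumption: an unset TELEGRAM_CHAT_ID means "accept any chat".
 */
function isAllowedChat(update: TelegramUpdate, allowedChatId: string | undefined): boolean {
  const chatId = update.message?.chat.id;
  if (chatId === undefined) return false; // no message, nothing to ingest
  if (!allowedChatId) return true; // no lock configured
  // Compare as strings because environment variables are always strings.
  return String(chatId) === allowedChatId.trim();
}
```

With the lock set, voice notes sent to the bot from any other chat are ignored rather than transcribed and stored.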
Start STT sidecar:
python3 -m venv .venv
source .venv/bin/activate
pip install -r stt/requirements.txt
python stt/faster_whisper_server.py

Start bridge:
npm run bridge

Send demo message:
npm run demo:notify

For the full setup flow see docs/QUICKSTART.md.
