ainergiz/local-translate

Demo

translation.demo.mp4

Local Translate

Real-time medical speech translation running entirely on your Mac. No cloud, no API keys, no data leaves your device.

Speak English into your microphone, hear the Spanish translation spoken back. You get a written transcript of both sides for free.

How it works

Three open-source models run sequentially on Apple Silicon through MLX:

| Stage | Model | Size | What it does |
| --- | --- | --- | --- |
| Speech-to-text | Voxtral Realtime 4B | 2.4 GB | Transcribes English speech |
| Translation | TranslateGemma 4B | 2.3 GB | Translates English to Spanish |
| Text-to-speech | Kokoro 82M | 330 MB | Speaks the Spanish translation |
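
As a rough illustration of the sequential flow, the sketch below chains three stages for one speaker turn. It is not the code in pipeline.py: `run_turn`, `TurnResult`, and the `fake_*` stand-ins are hypothetical names, and the real models are replaced by trivial functions so the example runs anywhere.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TurnResult:
    english: str   # transcript of what was said
    spanish: str   # translated text
    audio: bytes   # synthesized speech for playback


def run_turn(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> TurnResult:
    """Run the three stages one after another for a single turn."""
    english = transcribe(audio_in)      # speech-to-text
    spanish = translate(english)        # English -> Spanish
    audio_out = synthesize(spanish)     # text-to-speech
    return TurnResult(english, spanish, audio_out)


# Hypothetical stand-ins for the real models, for illustration only.
def fake_transcribe(audio: bytes) -> str:
    return "where does it hurt"


def fake_translate(text: str) -> str:
    return {"where does it hurt": "dónde le duele"}.get(text, text)


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


result = run_turn(b"\x00\x01", fake_transcribe, fake_translate, fake_synthesize)
print(result.english, "->", result.spanish)
```

Because each stage only sees the previous stage's output, the two transcripts fall out of the pipeline as intermediate values.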

Turn detection uses Silero VAD (2 MB ONNX model on CPU) to detect when you stop talking and trigger the pipeline.
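
One common way to turn VAD output into an end-of-turn signal is to wait for a run of non-speech frames after speech has been heard. The sketch below shows that idea on a plain list of per-frame speech probabilities. It does not call Silero VAD, and `find_turn_end`, the 0.5 threshold, and the frame counts are assumptions for illustration, not values taken from this project.

```python
from typing import Optional, Sequence


def find_turn_end(
    speech_probs: Sequence[float],
    threshold: float = 0.5,
    min_silence_frames: int = 15,
) -> Optional[int]:
    """Return the index of the frame where the turn is considered over.

    A turn ends once speech has been heard and is then followed by
    `min_silence_frames` consecutive frames below `threshold`.
    Returns None if that never happens.
    """
    heard_speech = False
    silent_run = 0
    for i, prob in enumerate(speech_probs):
        if prob >= threshold:
            heard_speech = True
            silent_run = 0          # speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= min_silence_frames:
                return i
    return None


probs = [0.1, 0.1, 0.9, 0.8, 0.95, 0.2, 0.1, 0.05, 0.1]
print(find_turn_end(probs, min_silence_frames=3))  # prints 7
```

Leading silence is ignored, so the pipeline is not triggered before anyone has spoken.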

Total memory footprint is ~8 GB. Runs on any Apple Silicon Mac with 16 GB+ unified memory.

Setup

Requires macOS on Apple Silicon (M1/M2/M3/M4) and Python 3.11+.

# Install espeak (needed for Spanish text-to-speech phonemization)
brew install espeak-ng

# Install spaCy English model
uv pip install en_core_web_sm@https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl

# Install dependencies
uv sync

Models download automatically from Hugging Face on first run (~5 GB total).
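
The ~5 GB figure matches the model sizes listed under "How it works". A quick check of the arithmetic follows (the dictionary name is illustrative; the sizes are the ones stated above):

```python
# Sizes in GB as listed in the model table (330 MB written as 0.33 GB).
MODEL_SIZES_GB = {
    "Voxtral Realtime 4B": 2.4,
    "TranslateGemma 4B": 2.3,
    "Kokoro 82M": 0.33,
}

total_gb = sum(MODEL_SIZES_GB.values())
print(f"{total_gb:.2f} GB")  # prints 5.03 GB
```

This README does not break down the difference between the ~5 GB of weights and the ~8 GB runtime footprint.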

Usage

Live mic (default) — speak English, hear Spanish:

uv run python pipeline.py

From file — translate an audio file:

uv run python pipeline.py --file recording.wav

Press Ctrl+C to stop.
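
For readers curious how such a command line could be wired up, here is a minimal argparse sketch with an optional `--file` flag that falls back to live-mic mode. Only the `--file` flag comes from this README. `parse_args` and `describe_mode` are hypothetical helpers and may not match how pipeline.py is actually written.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; with no --file, live-mic mode is assumed."""
    parser = argparse.ArgumentParser(description="Local speech translation")
    parser.add_argument(
        "--file",
        metavar="PATH",
        default=None,
        help="translate an audio file instead of listening to the microphone",
    )
    return parser.parse_args(argv)


def describe_mode(args) -> str:
    """Return a short label for the selected input mode."""
    return f"file:{args.file}" if args.file else "live-mic"


print(describe_mode(parse_args([])))
print(describe_mode(parse_args(["--file", "recording.wav"])))
```

In a long-running loop, Ctrl+C surfaces in Python as `KeyboardInterrupt`, which can be caught to shut down cleanly.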

Changing the language pair

The pipeline currently translates English to Spanish. To change the target language:

  1. Update the translation prompt in translate() (change target_lang_code and the instruction text)
  2. Update the TTS voice and language code in the model constants (TTS_VOICE, lang_code)

Kokoro supports English, Spanish, French, Hindi, Italian, Portuguese, Japanese, and Mandarin. TranslateGemma supports many more language pairs.
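
One way to keep the two changes above in sync is to group them in a single settings object. The sketch below is hypothetical: `LanguagePair`, `build_instruction`, and the prompt wording are not taken from pipeline.py. The single-letter Kokoro language codes and the voice names reflect my reading of Kokoro's documentation, and the two-letter translation codes are assumed, so verify all of them before relying on this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguagePair:
    target_name: str        # human-readable name used in the prompt
    target_lang_code: str   # code passed to the translation step (assumed ISO 639-1)
    tts_lang_code: str      # Kokoro language code (assumed: "e" Spanish, "f" French)
    tts_voice: str          # Kokoro voice name (assumed; check the voice list)


SPANISH = LanguagePair("Spanish", "es", "e", "ef_dora")
FRENCH = LanguagePair("French", "fr", "f", "ff_siwis")


def build_instruction(pair: LanguagePair) -> str:
    """Build the instruction text for the translation prompt (illustrative wording)."""
    return f"Translate the following English text to {pair.target_name}."


active = FRENCH
print(build_instruction(active))
print(active.tts_lang_code, active.tts_voice)
```

Switching languages then means changing one value rather than editing the prompt and the TTS constants separately.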

Limitations

  • Runs on Apple Silicon only (MLX requirement)
  • One speaker at a time (no overlapping speech)
  • English to Spanish only by default (other pairs require the code changes described under "Changing the language pair")
  • Not a medical device

License

MIT

About

Local ASR (Mistral Voxtral) -> Translation (TranslateGemma) -> TTS (Kokoro 82M) pipeline, all running locally on a MacBook.
