FastAPI server hosting typhoon-ai/typhoon-asr-realtime
(NeMo FastConformer-Transducer, 114M) for CPU-only inference.
Python 3.10+ recommended.

Install the Python dependencies:

    pip install -r requirements.txt

NeMo also requires libsndfile and ffmpeg on the system, e.g.:

    sudo apt-get install -y libsndfile1 ffmpeg

Start the server:

    CUDA_VISIBLE_DEVICES="" python app.py
    # or pick a port:
    PORT=8001 CUDA_VISIBLE_DEVICES="" python app.py

The model downloads from the Hugging Face Hub on first run (cached under
~/.cache/huggingface) and loads at startup. CUDA_VISIBLE_DEVICES="" forces CPU.
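
Because the model is downloaded and loaded at startup, the server can take a while before it answers requests, especially on first run. The following is a minimal sketch of a readiness check that polls GET /health until it reports `"loaded": true`. The helper names `is_ready` and `wait_until_ready`, the default base URL, and the timeout values are assumptions for illustration and are not part of app.py.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Union


def is_ready(body: Union[bytes, str]) -> bool:
    """Return True if a /health response body reports a loaded model."""
    try:
        data = json.loads(body)
    except ValueError:
        # Not valid JSON (or not valid UTF-8): treat as "not ready".
        return False
    return (
        isinstance(data, dict)
        and data.get("status") == "ok"
        and data.get("loaded") is True
    )


def wait_until_ready(
    base_url: str = "http://localhost:8000",  # assumed default host/port
    timeout_s: float = 600.0,                 # assumed; first run includes the download
    interval_s: float = 2.0,
) -> bool:
    """Poll GET /health until the model is loaded or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if is_ready(resp.read()):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting connections yet
        time.sleep(interval_s)
    return False
```

Calling `wait_until_ready()` from a deployment script before sending the first /transcribe request avoids connection errors during model loading.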
- GET /health → {"status":"ok","loaded":true}
- POST /transcribe: multipart form, field file (wav/mp3/m4a; auto-resampled to 16 kHz mono)
  - query ?with_timestamps=true to also return char/word timestamps
Example requests:

    curl -X POST http://localhost:8000/transcribe -F "file=@audio.wav"
    # {"text":"..."}
    curl -X POST "http://localhost:8000/transcribe?with_timestamps=true" -F "file=@audio.wav"

Interactive docs at http://localhost:8000/docs.
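
The same upload can be done from Python. The following is a rough sketch using only the standard library, mirroring the curl calls above: it sends the audio as a multipart form with the field name `file` and optionally sets `with_timestamps=true`. The function names `build_multipart` and `transcribe` are hypothetical, and the shape of the timestamp fields in the response is not documented here, so the sketch simply returns the parsed JSON.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, payload: bytes, boundary: str) -> bytes:
    """Encode a single file as a multipart/form-data request body."""
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail


def transcribe(
    path: str,
    base_url: str = "http://localhost:8000",  # assumed default host/port
    with_timestamps: bool = False,
) -> dict:
    """POST an audio file to /transcribe and return the decoded JSON response."""
    audio = Path(path)
    boundary = uuid.uuid4().hex
    body = build_multipart("file", audio.name, audio.read_bytes(), boundary)

    url = f"{base_url}/transcribe"
    if with_timestamps:
        url += "?with_timestamps=true"

    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read())
```

With the server running, `transcribe("audio.wav")["text"]` would give the transcript, matching the `{"text":"..."}` response shown above.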
- Thai-language model; non-Thai audio produces garbage output (expected).
- torch.set_num_threads() is set to all CPU cores for intra-op parallelism.
- For production, run with uvicorn directly, e.g.

      CUDA_VISIBLE_DEVICES="" uvicorn app:app --host 0.0.0.0 --port 8000

  (each worker loads its own copy of the model into RAM, so add --workers N only if you have the memory).
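
If several workers are used, each process setting its thread count to all cores may oversubscribe the CPU. One possible adjustment, sketched below, is to divide the cores among the workers. This is an assumption about how one might tune the setup, not what app.py currently does; the helper `threads_per_worker` is hypothetical.

```python
import os
from typing import Optional


def threads_per_worker(workers: int, cores: Optional[int] = None) -> int:
    """Split the available CPU cores evenly across worker processes."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    total = cores if cores is not None else (os.cpu_count() or 1)
    # Each worker always gets at least one thread.
    return max(1, total // workers)


# Hypothetical use at startup, in place of using all cores per process:
#   import torch
#   torch.set_num_threads(threads_per_worker(workers=2))
```

On an 8-core machine with 2 workers this gives 4 intra-op threads per process; with a single worker it reduces to the all-cores behaviour described above.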