warits-dev/typhoon-asr-server

Typhoon ASR Realtime — CPU API

FastAPI server hosting typhoon-ai/typhoon-asr-realtime (NeMo FastConformer-Transducer, 114M) for CPU-only inference.

Install

Python 3.10+ recommended.

pip install -r requirements.txt

NeMo also requires libsndfile and ffmpeg on the system, e.g.:

sudo apt-get install -y libsndfile1 ffmpeg

Run

CUDA_VISIBLE_DEVICES="" python app.py
# or pick a port:
PORT=8001 CUDA_VISIBLE_DEVICES="" python app.py

The model is downloaded from the Hugging Face Hub on first run (cached under ~/.cache/huggingface) and is loaded at startup. Setting CUDA_VISIBLE_DEVICES="" hides any GPUs from PyTorch, which forces CPU inference.
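Because the model loads at startup, a client may need to wait before sending audio. The sketch below polls GET /health (described under Endpoints) until it reports loaded: true. The helper names is_ready and wait_until_ready, the timeout values, and the assumption that the server either refuses connections or reports loaded: false while loading are illustrative and not taken from app.py.

```python
import json
import time
import urllib.request


def is_ready(payload: bytes) -> bool:
    """Return True if a /health response body reports a loaded model."""
    try:
        data = json.loads(payload)
    except ValueError:  # not valid JSON (or not valid UTF-8)
        return False
    return (
        isinstance(data, dict)
        and data.get("status") == "ok"
        and data.get("loaded") is True
    )


def wait_until_ready(base_url: str = "http://localhost:8000",
                     timeout_s: float = 300.0,
                     interval_s: float = 2.0) -> bool:
    """Poll /health until the model is loaded or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if is_ready(resp.read()):
                    return True
        except OSError:
            # Covers URLError, connection refused and socket timeouts:
            # the server is probably still starting up.
            pass
        time.sleep(interval_s)
    return False
```

Call wait_until_ready() once after launching the server; a False return means the timeout elapsed without a ready response.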

Endpoints

  • GET /health returns {"status":"ok","loaded":true}
  • POST /transcribe — multipart form, field file (wav/mp3/m4a; auto-resampled to 16kHz mono)
    • query ?with_timestamps=true to also return char/word timestamps

curl -X POST http://localhost:8000/transcribe -F "file=@audio.wav"
# {"text":"..."}

curl -X POST "http://localhost:8000/transcribe?with_timestamps=true" -F "file=@audio.wav"
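The same requests can be made from Python. The following is a minimal standard-library sketch, not part of this repository: build_multipart and transcribe are hypothetical helper names, and the only response field assumed is the "text" key shown above.

```python
import json
import os
import urllib.request
import uuid
from urllib.parse import urlencode


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "application/octet-stream"):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        (f'Content-Disposition: form-data; name="{field}"; '
         f'filename="{filename}"\r\n').encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return body, f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, base_url: str = "http://localhost:8000",
               with_timestamps: bool = False) -> dict:
    """POST an audio file to /transcribe and return the parsed JSON."""
    url = f"{base_url}/transcribe"
    if with_timestamps:
        url += "?" + urlencode({"with_timestamps": "true"})
    with open(path, "rb") as f:
        # The form field must be named "file", as in the curl examples.
        body, ctype = build_multipart("file", os.path.basename(path), f.read())
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": ctype}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Example (requires a running server):
#   result = transcribe("audio.wav")
#   print(result["text"])
```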

Interactive docs at http://localhost:8000/docs.

Notes

  • Thai-language model; non-Thai audio produces garbage output (expected).
  • torch.set_num_threads() is called with the number of CPU cores, so intra-op parallelism uses every core.
  • For production, run with uvicorn directly, e.g. CUDA_VISIBLE_DEVICES="" uvicorn app:app --host 0.0.0.0 --port 8000 (each worker loads its own copy of the model into RAM, so add --workers N only if you have the memory).
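When running several workers, each process defaulting to all cores can oversubscribe the CPU. One possible adjustment, sketched here as an assumption rather than something app.py currently does, is to divide the cores among workers before calling torch.set_num_threads(). The helper name threads_per_worker is hypothetical.

```python
import os
from typing import Optional


def threads_per_worker(workers: int, cores: Optional[int] = None) -> int:
    """Split CPU cores evenly across worker processes (at least 1 each)."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    if cores is None:
        # os.cpu_count() can return None on some platforms; fall back to 1.
        cores = os.cpu_count() or 1
    return max(1, cores // workers)


# Hypothetical use inside each worker, with torch installed:
#   import torch
#   torch.set_num_threads(threads_per_worker(workers=2))
```

With 8 cores and 2 workers this gives 4 threads per worker. Whether it improves throughput depends on the workload, so measure before adopting it.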