RenderCaption is a one-click-install desktop app that generates subtitles, captions, and transcriptions from any audio or video file, entirely offline on your own machine. It supports 25+ languages, including Hindi, Hinglish, Punjabi, English, French, and Japanese. It is powered by local AI models via parakeet-cpp (GGML), with hardware acceleration across NVIDIA, AMD, and Intel GPUs.
TL;DR: Transcribe 1 hour of audio in ~10 seconds on an RTX 5070 Ti. Export to SRT subtitles instantly. No cloud, no API keys, free forever.
- Download the latest installer from the Releases Page.
- Run the `.exe` or `.msi` installer.
- Open RenderCaption → download a model from the Model Manager → drop your audio file → click Transcribe.
That's it. No Python, no Docker, no API keys.
All models are downloaded on-demand through the in-app Model Manager. They are quantized .gguf files hosted on Hugging Face.
| Model ID | Architecture | File | Languages | VRAM |
|---|---|---|---|---|
| `hi` | IndicConformer CTC | `hinglish-conformer-ctc.f32.gguf` | Hindi, English (Hinglish) | ~1.1 GB |
| `pa` | IndicConformer CTC | `indicconformer-punjabi.f32.gguf` | Punjabi | ~1.1 GB |
| `hi_large` | IndicConformer CTC | `indicconformer-hindi.f32.gguf` | Hindi | ~1.1 GB |
| `pt` | FastConformer Hybrid | `portuguese-fastconformer-hybrid-large.f32.gguf` | Portuguese | ~1.1 GB |

| Model ID | Architecture | File | Languages | VRAM |
|---|---|---|---|---|
| `tdt-1.1b-q8` | Parakeet TDT 1.1B (Q8) | `tdt-1.1b-q8_0.gguf` | English Only | ~1.8 GB |
| `tdt-1.1b-q4` | Parakeet TDT 1.1B (Q4) | `tdt-1.1b-q4_k_m.gguf` | English Only | ~1.3 GB |
| `rnnt-1.1b-q8` | Parakeet RNNT 1.1B (Q8) | `rnnt-1.1b-q8_0.gguf` | 25+ Languages (EN, FR, JA…) | ~1.8 GB |
| `rnnt-1.1b-q4` | Parakeet RNNT 1.1B (Q4) | `rnnt-1.1b-q4_k_m.gguf` | 25+ Languages (EN, FR, JA…) | ~1.3 GB |
| `eu-fast` | Parakeet TDT 0.6B (Q4) | `parakeet-tdt-0.6b-v3-q4_k.gguf` | 25+ Languages (EN, FR, JA…) | ~800 MB |
You can also drop any compatible .gguf model into the models/ folder and RenderCaption will auto-detect it as a Custom / Local GGUF model.
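As a rough illustration of what auto-detection could look like, the sketch below lists every `.gguf` file in a folder using only the Rust standard library. The function name `find_gguf_models` is hypothetical and is not taken from the RenderCaption source; the app's real detection logic may differ, for example by also inspecting model metadata.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns every `.gguf` file directly inside `models_dir`, sorted by path.
/// Hypothetical helper: the name and signature are illustrative only.
fn find_gguf_models(models_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(models_dir)? {
        let path = entry?.path();
        // Compare the extension case-insensitively so `MODEL.GGUF` also matches.
        let is_gguf = path
            .extension()
            .and_then(|ext| ext.to_str())
            .map(|ext| ext.eq_ignore_ascii_case("gguf"))
            .unwrap_or(false);
        if is_gguf && path.is_file() {
            found.push(path);
        }
    }
    found.sort();
    Ok(found)
}
```

Calling `find_gguf_models(Path::new("models"))` would return the model files in that folder and ignore everything else, such as stray `.txt` notes.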
Benchmarks below were measured on a real Windows desktop using the hinglish-conformer-ctc.f32.gguf model. RenderCaption splits audio into 30-second chunks and processes them concurrently via a multi-threaded Rust backend.
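The chunk-and-parallelize idea can be sketched as follows. This is a simplified illustration, not the app's code: it uses scoped OS threads from the standard library instead of the Tokio runtime the backend is built on, and `chunk_ranges`, `process_chunks`, and the `work` callback are hypothetical names standing in for the real transcription call.

```rust
use std::thread;

/// Splits `total_secs` of audio into consecutive `(start, end)` ranges of at
/// most `chunk_secs` each. The final range may be shorter.
fn chunk_ranges(total_secs: u32, chunk_secs: u32) -> Vec<(u32, u32)> {
    assert!(chunk_secs > 0, "chunk length must be positive");
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < total_secs {
        let end = (start + chunk_secs).min(total_secs);
        ranges.push((start, end));
        start = end;
    }
    ranges
}

/// Runs `work` on every chunk using up to `workers` OS threads and returns the
/// results in chunk order. `work` stands in for the real transcription call.
fn process_chunks<T, F>(ranges: &[(u32, u32)], workers: usize, work: F) -> Vec<T>
where
    T: Send,
    F: Fn(u32, u32) -> T + Sync,
{
    let workers = workers.max(1);
    // Give each worker one contiguous batch so output order is preserved.
    let per_worker = ((ranges.len() + workers - 1) / workers).max(1);
    let work = &work;
    thread::scope(|scope| {
        let handles: Vec<_> = ranges
            .chunks(per_worker)
            .map(|batch| {
                scope.spawn(move || batch.iter().map(|&(s, e)| work(s, e)).collect::<Vec<T>>())
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

With 30-second chunks, a 60-minute file yields `chunk_ranges(3600, 30).len() == 120` independent work items, which is what makes the workload easy to spread across threads.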
Tested with Vulkan via parakeet-cli-vulkan. Vulkan works across all GPU vendors, so no CUDA is required.
| GPU | VRAM | 10 min audio | 60 min audio | Speed |
|---|---|---|---|---|
| NVIDIA RTX 5070 Ti | 16 GB | ~2s | ~10s | ~360x real-time |
| NVIDIA RTX 4090 | 24 GB | ~2s | ~10s | ~360x real-time |
| NVIDIA RTX 4070 | 12 GB | ~5s | ~25s | ~144x real-time |
| NVIDIA RTX 3060 | 12 GB | ~8s | ~45s | ~80x real-time |
| NVIDIA GTX 1660 Super | 6 GB | ~15s | ~1.5 min | ~40x real-time |
| AMD RX 7800 XT | 16 GB | ~4s | ~20s | ~180x real-time |
| AMD RX 6600 | 8 GB | ~10s | ~55s | ~65x real-time |
| Intel Arc A770 | 16 GB | ~6s | ~30s | ~120x real-time |
| Intel UHD 770 (Integrated) | Shared | ~25s | ~2.5 min | ~24x real-time |
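For readers comparing rows, the Speed column is consistent with dividing audio duration by processing time for the 60-minute run. The helper below is a small sketch of that arithmetic; `realtime_factor` is an illustrative name, not part of RenderCaption.

```rust
/// Real-time factor: seconds of audio transcribed per second of wall-clock time.
fn realtime_factor(audio_secs: f64, processing_secs: f64) -> f64 {
    assert!(processing_secs > 0.0, "processing time must be positive");
    audio_secs / processing_secs
}
```

For example, 3600 seconds of audio in 10 seconds gives 360x, and 3600 seconds in 25 seconds gives 144x, matching the RTX 5070 Ti and RTX 4070 rows.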
Tested with parakeet-cli (CPU-only binary). Thread count is controlled via the in-app Settings panel.
| CPU | Threads | 10 min audio | 60 min audio | Speed |
|---|---|---|---|---|
| Intel Core i7-12700K | 8 | ~1.5 min | ~8 min | ~7.5x real-time |
| Intel Core i5-10400 | 4 | ~3.5 min | ~20 min | ~3x real-time |
| Intel Core i3-8100 | 2 | ~8 min | ~45 min | ~1.3x real-time |
Tip for low-end hardware: If RenderCaption is freezing your computer during transcription, open Settings and lower the CPU Threads slider to 2. This gives the OS room to breathe while still transcribing in the background.
- Dual-Engine Architecture: Ships with both a Vulkan GPU binary (works on NVIDIA, AMD, Intel) and a pure AVX2 CPU binary. The app auto-selects the best engine, or you can override it manually.
- Multilingual: First-class support for Hindi, Hinglish, Punjabi, Portuguese, and 25+ global languages via RNNT models.
- Modern UI: Glassmorphism design with an interactive playback timeline, word-level confidence heatmaps, and a real-time engine telemetry console.
- Export Formats: One-click export to `.TXT` or `.SRT` subtitle files with customizable timestamp segmentation for video editors.
- Zero Cloud: All processing happens locally. Your audio files never leave your machine.
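To make the SRT export concrete, here is a minimal sketch of how timed segments map onto the SubRip layout: a 1-based index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` range, the text, and a blank line. The `Cue` struct and both function names are assumptions for illustration and are not taken from the app's Export Manager.

```rust
/// One subtitle cue: start and end in milliseconds plus its text.
/// Hypothetical type used only for this example.
struct Cue {
    start_ms: u64,
    end_ms: u64,
    text: String,
}

/// Formats milliseconds as an SRT timestamp, `HH:MM:SS,mmm`.
fn srt_timestamp(ms: u64) -> String {
    let (h, m, s, milli) = (ms / 3_600_000, ms / 60_000 % 60, ms / 1_000 % 60, ms % 1_000);
    format!("{:02}:{:02}:{:02},{:03}", h, m, s, milli)
}

/// Renders cues as SRT: a 1-based index, a time range, the text, a blank line.
fn to_srt(cues: &[Cue]) -> String {
    let mut out = String::new();
    for (i, cue) in cues.iter().enumerate() {
        out.push_str(&format!(
            "{}\n{} --> {}\n{}\n\n",
            i + 1,
            srt_timestamp(cue.start_ms),
            srt_timestamp(cue.end_ms),
            cue.text
        ));
    }
    out
}
```

Note that SRT separates seconds from milliseconds with a comma rather than a period, which is a common source of malformed subtitle files.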
- Node.js v18+
- Rust (latest stable)
- Tauri CLI v2
git clone https://github.com/AnshSinglaDev/rendercaption.git
cd rendercaption
npm install
npm run tauri dev

npm run tauri build

Installers will be generated in src-tauri/target/release/bundle/.
The app looks for .gguf models in a models/ folder next to the executable. You can download models through the UI, or place them manually.
Do NOT commit `.gguf` files to Git. They exceed GitHub's 100 MB limit. The `.gitignore` is pre-configured to block them.
┌─────────────────────────────────────────────┐
│  React 18 + TypeScript + Vite (Frontend)    │
│  ├── Memoized Timeline (React.memo)         │
│  ├── Engine Telemetry Console               │
│  └── Export Manager (TXT / SRT)             │
├─────────────────────────────────────────────┤
│  Tauri v2 IPC Bridge (Rust ↔ TypeScript)    │
│  └── Structured AppError serialization      │
├─────────────────────────────────────────────┤
│  Rust Backend (Tokio Async Runtime)         │
│  ├── FFmpeg Sidecar (media → WAV chunks)    │
│  ├── parakeet-cli (CPU / AVX2)              │
│  └── parakeet-cli-vulkan (GPU / Vulkan)     │
└─────────────────────────────────────────────┘
- Async I/O: All file operations use `tokio::fs` to avoid blocking the IPC bridge.
- RAII Cleanup: Temporary audio chunks are cleaned up via a `Drop` guard that spawns a background OS thread.
- FS Security: Asset protocol scope is restricted to `$HOME/**` in `tauri.conf.json`.
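The RAII cleanup pattern from the list above can be sketched with the standard library alone. `ChunkDirGuard` and `remove_dir_in_background` are hypothetical names, and the real guard may track individual files rather than a whole directory; the point is only that `drop` hands the slow deletion to another thread instead of doing it inline.

```rust
use std::fs;
use std::path::PathBuf;
use std::thread::{self, JoinHandle};

/// Deletes `dir` and everything in it on a separate OS thread.
fn remove_dir_in_background(dir: PathBuf) -> JoinHandle<()> {
    thread::spawn(move || {
        // Best effort: the directory may already be gone.
        let _ = fs::remove_dir_all(&dir);
    })
}

/// Owns a temporary chunk directory and removes it when dropped.
/// Hypothetical type: the name and fields are illustrative only.
struct ChunkDirGuard {
    dir: PathBuf,
}

impl Drop for ChunkDirGuard {
    fn drop(&mut self) {
        // Keep `drop` fast by handing the deletion to a detached thread.
        let dir = std::mem::take(&mut self.dir);
        let _detached = remove_dir_in_background(dir);
    }
}
```

One trade-off of this design is that a detached thread may not finish if the process exits immediately afterwards, so leftover chunks from a previous run may still need to be swept at startup.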
MIT License. See LICENSE for details.