RenderCaption 🎙️

(Speech to Text) Blazingly Fast Local Subtitles & Captions: One-Click Install, 25+ Languages

Tauri Rust React Vulkan License: MIT

RenderCaption is a one-click install desktop app that generates subtitles, captions, and transcriptions from any audio or video file, entirely offline on your own machine. It supports 25+ languages including Hindi, Hinglish, Punjabi, English, French, Japanese, and more. Powered by local AI models via parakeet-cpp (GGML), with hardware acceleration across NVIDIA, AMD, and Intel GPUs.

TL;DR: Transcribe 1 hour of audio in ~10 seconds on an RTX 5070 Ti. Export to SRT subtitles instantly. No cloud, no API keys, free forever.

🎬 See RenderCaption in Action

capit-demo.mp4
Drop a file → pick a model → get subtitles. That's it.

⚡ Quick Start

  1. Download the latest installer from the Releases Page.
  2. Run the .exe or .msi installer.
  3. Open RenderCaption → download a model from the Model Manager → drop your audio file → click Transcribe.

That's it. No Python, no Docker, no API keys.


🤖 Supported Models

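All models are downloaded on demand through the in-app Model Manager. They are .gguf files hosted on Hugging Face; going by the file names, the Parakeet models are quantized (Q8 or Q4) while the Conformer models ship as f32.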

Indic / South Asian Models

| Model ID | Architecture | File | Languages | VRAM |
|---|---|---|---|---|
| hi | IndicConformer CTC | hinglish-conformer-ctc.f32.gguf | Hindi, English (Hinglish) | ~1.1 GB |
| pa | IndicConformer CTC | indicconformer-punjabi.f32.gguf | Punjabi | ~1.1 GB |
| hi_large | IndicConformer CTC | indicconformer-hindi.f32.gguf | Hindi | ~1.1 GB |
| pt | FastConformer Hybrid | portuguese-fastconformer-hybrid-large.f32.gguf | Portuguese | ~1.1 GB |

Global / Multilingual Models

| Model ID | Architecture | File | Languages | VRAM |
|---|---|---|---|---|
| tdt-1.1b-q8 | Parakeet TDT 1.1B (Q8) | tdt-1.1b-q8_0.gguf | English only | ~1.8 GB |
| tdt-1.1b-q4 | Parakeet TDT 1.1B (Q4) | tdt-1.1b-q4_k_m.gguf | English only | ~1.3 GB |
| rnnt-1.1b-q8 | Parakeet RNNT 1.1B (Q8) | rnnt-1.1b-q8_0.gguf | 25+ languages (EN, FR, JA…) | ~1.8 GB |
| rnnt-1.1b-q4 | Parakeet RNNT 1.1B (Q4) | rnnt-1.1b-q4_k_m.gguf | 25+ languages (EN, FR, JA…) | ~1.3 GB |
| eu-fast | Parakeet TDT 0.6B (Q4) | parakeet-tdt-0.6b-v3-q4_k.gguf | 25+ languages (EN, FR, JA…) | ~800 MB |

You can also drop any compatible .gguf model into the models/ folder and RenderCaption will auto-detect it as a Custom / Local GGUF model.


📊 Performance Benchmarks

Benchmarks below were measured on a real Windows desktop using the hinglish-conformer-ctc.f32.gguf model. RenderCaption splits audio into 30-second chunks and processes them concurrently via a multi-threaded Rust backend.
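
The chunk-and-parallelize idea can be sketched with the Rust standard library alone. This is a minimal illustration, not RenderCaption's actual backend: the names `Chunk`, `plan_chunks`, and `process_concurrently` are hypothetical, and the `work` closure stands in for whatever really transcribes a chunk.

```rust
use std::thread;

/// Time range of one chunk, in seconds (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Chunk {
    pub index: usize,
    pub start_s: f64,
    pub end_s: f64,
}

/// Split `duration_s` seconds of audio into consecutive chunks of at most `chunk_s` seconds.
pub fn plan_chunks(duration_s: f64, chunk_s: f64) -> Vec<Chunk> {
    assert!(chunk_s > 0.0, "chunk length must be positive");
    let mut chunks = Vec::new();
    let mut start = 0.0;
    while start < duration_s {
        // The last chunk is shortened so it never runs past the end of the audio.
        let end = (start + chunk_s).min(duration_s);
        chunks.push(Chunk { index: chunks.len(), start_s: start, end_s: end });
        start = end;
    }
    chunks
}

/// Run `work` on every chunk using up to `workers` OS threads.
/// Results come back in chunk order, so they can be stitched into one transcript.
pub fn process_concurrently<T, F>(chunks: &[Chunk], workers: usize, work: F) -> Vec<T>
where
    T: Send,
    F: Fn(&Chunk) -> T + Sync,
{
    let workers = workers.max(1);
    // Ceiling division so every chunk lands in exactly one batch.
    let batch = ((chunks.len() + workers - 1) / workers).max(1);
    let work = &work;
    thread::scope(|scope| {
        let handles: Vec<_> = chunks
            .chunks(batch)
            .map(|part| scope.spawn(move || part.iter().map(work).collect::<Vec<T>>()))
            .collect();
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

With 30-second chunks, one hour of audio becomes 120 independent work items, which is what lets the work spread across threads. Calling `process_concurrently(&plan_chunks(3600.0, 30.0), 8, transcribe)` with a hypothetical `transcribe` function would run them on eight threads.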

GPU Benchmarks (Vulkan)

Tested with Vulkan via parakeet-cli-vulkan. Vulkan works across all GPU vendors, so no CUDA is required.

| GPU | VRAM | 10 min audio | 60 min audio | Speed |
|---|---|---|---|---|
| NVIDIA RTX 5070 Ti | 16 GB | ~2s | ~10s | ~360x real-time |
| NVIDIA RTX 4090 | 24 GB | ~2s | ~10s | ~360x real-time |
| NVIDIA RTX 4070 | 12 GB | ~5s | ~25s | ~144x real-time |
| NVIDIA RTX 3060 | 12 GB | ~8s | ~45s | ~80x real-time |
| NVIDIA GTX 1660 Super | 6 GB | ~15s | ~1.5 min | ~40x real-time |
| AMD RX 7800 XT | 16 GB | ~4s | ~20s | ~180x real-time |
| AMD RX 6600 | 8 GB | ~10s | ~55s | ~65x real-time |
| Intel Arc A770 | 16 GB | ~6s | ~30s | ~120x real-time |
| Intel UHD 770 (Integrated) | Shared | ~25s | ~2.5 min | ~24x real-time |

CPU Benchmarks (AVX2)

Tested with parakeet-cli (CPU-only binary). The thread count is controlled via the in-app Settings panel.

| CPU | Threads | 10 min audio | 60 min audio | Speed |
|---|---|---|---|---|
| Intel Core i7-12700K | 8 | ~1.5 min | ~8 min | ~7.5x real-time |
| Intel Core i5-10400 | 4 | ~3.5 min | ~20 min | ~3x real-time |
| Intel Core i3-8100 | 2 | ~8 min | ~45 min | ~1.3x real-time |
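
The Speed figures in both benchmark tables match the 60-minute column when read as audio duration divided by processing time. A small sketch of that arithmetic follows; the function names are illustrative and not part of RenderCaption.

```rust
/// Real-time factor: seconds of audio transcribed per second of wall-clock time.
pub fn realtime_factor(audio_s: f64, processing_s: f64) -> f64 {
    assert!(processing_s > 0.0, "processing time must be positive");
    audio_s / processing_s
}

/// Estimated processing time, in seconds, for `audio_s` seconds of audio at a given factor.
pub fn estimated_processing_s(audio_s: f64, factor: f64) -> f64 {
    assert!(factor > 0.0, "speed factor must be positive");
    audio_s / factor
}
```

For example, 60 minutes (3600 s) transcribed in 10 s gives a factor of 360, and at 24x real-time the same hour takes about 150 s, or 2.5 minutes. The 10-minute timings are rounded more coarsely, so they imply slightly different factors.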

💡 Tip for low-end hardware: If RenderCaption is freezing your computer during transcription, open Settings and lower the CPU Threads slider to 2. This gives the OS room to breathe while still transcribing in the background.


✨ Key Features

  • 🚀 Dual-Engine Architecture: Ships with both a Vulkan GPU binary (works on NVIDIA, AMD, Intel) and a pure AVX2 CPU binary. The app auto-selects the best engine, or you can override it manually.
  • 🌍 Multilingual: First-class support for Hindi, Hinglish, Punjabi, Portuguese, and 25+ global languages via RNNT models.
  • 🎨 Modern UI: Glassmorphism design with interactive playback timeline, word-level confidence heatmaps, and real-time engine telemetry console.
  • 💾 Export Formats: One-click export to .TXT or .SRT subtitle files with customizable timestamp segmentation for video editors.
  • 🔒 Zero Cloud: All processing happens locally. Your audio files never leave your machine.
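
For readers unfamiliar with the .SRT format mentioned above, here is a minimal sketch of how subtitle cues map to SRT text. It follows the standard SubRip layout; the `Cue` type and function names are assumptions made for this example, not RenderCaption's export code.

```rust
/// One subtitle cue: start and end in milliseconds plus the caption text.
/// (Hypothetical type for illustration.)
pub struct Cue {
    pub start_ms: u64,
    pub end_ms: u64,
    pub text: String,
}

/// Format milliseconds as an SRT timestamp: HH:MM:SS,mmm (comma before the milliseconds).
pub fn srt_timestamp(ms: u64) -> String {
    let hours = ms / 3_600_000;
    let minutes = ms / 60_000 % 60;
    let seconds = ms / 1_000 % 60;
    let millis = ms % 1_000;
    format!("{hours:02}:{minutes:02}:{seconds:02},{millis:03}")
}

/// Render cues as SRT: a 1-based index, a time range line, the text, then a blank line.
pub fn to_srt(cues: &[Cue]) -> String {
    let mut out = String::new();
    for (i, cue) in cues.iter().enumerate() {
        out.push_str(&format!(
            "{}\n{} --> {}\n{}\n\n",
            i + 1,
            srt_timestamp(cue.start_ms),
            srt_timestamp(cue.end_ms),
            cue.text
        ));
    }
    out
}
```

Timestamp segmentation then comes down to how the cues are cut before rendering: shorter cues give more, briefer subtitle entries.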

💻 Building from Source

Prerequisites

  • Node.js v18+
  • Rust (latest stable)
  • Tauri CLI v2

Development

git clone https://github.com/AnshSinglaDev/rendercaption.git
cd rendercaption
npm install
npm run tauri dev

Production Build

npm run tauri build

Installers will be generated in src-tauri/target/release/bundle/.

Model Directory

The app looks for .gguf models in a models/ folder next to the executable. You can download models through the UI, or place them manually.

⚠️ Do NOT commit .gguf files to Git. They exceed GitHub's 100 MB limit. The .gitignore is pre-configured to block them.
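
As a rough sketch of what looking for models next to the executable could involve, the snippet below lists the `.gguf` files in a directory using only the standard library. Both function names are hypothetical, and the real app may detect models differently.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Return the sorted paths of all `.gguf` files directly inside `dir`.
pub fn find_gguf_models(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut models = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // Match the extension case-insensitively so `MODEL.GGUF` is also picked up.
        let is_gguf = path
            .extension()
            .and_then(|ext| ext.to_str())
            .map_or(false, |ext| ext.eq_ignore_ascii_case("gguf"));
        if is_gguf && path.is_file() {
            models.push(path);
        }
    }
    models.sort();
    Ok(models)
}

/// Hypothetical helper: the `models/` folder next to the running executable.
pub fn default_models_dir() -> io::Result<PathBuf> {
    let exe = std::env::current_exe()?;
    let parent = exe.parent().ok_or_else(|| {
        io::Error::new(io::ErrorKind::NotFound, "executable has no parent directory")
    })?;
    Ok(parent.join("models"))
}
```

Calling `find_gguf_models(&default_models_dir()?)` would then return every model file placed there, whether downloaded through the UI or copied in by hand.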


🧠 Architecture

┌─────────────────────────────────────────────┐
│  React 18 + TypeScript + Vite (Frontend)    │
│  ├── Memoized Timeline (React.memo)         │
│  ├── Engine Telemetry Console               │
│  └── Export Manager (TXT / SRT)             │
├─────────────────────────────────────────────┤
│  Tauri v2 IPC Bridge (Rust ↔ TypeScript)    │
│  └── Structured AppError serialization      │
├─────────────────────────────────────────────┤
│  Rust Backend (Tokio Async Runtime)         │
│  ├── FFmpeg Sidecar (media → WAV chunks)    │
│  ├── parakeet-cli (CPU / AVX2)              │
│  └── parakeet-cli-vulkan (GPU / Vulkan)     │
└─────────────────────────────────────────────┘

  • Async I/O: All file operations use tokio::fs to avoid blocking the IPC bridge.
  • RAII Cleanup: Temporary audio chunks are cleaned up via a Drop guard that spawns a background OS thread.
  • FS Security: Asset protocol scope is restricted to $HOME/** in tauri.conf.json.
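
The RAII cleanup pattern described above can be sketched as follows. This is an illustration of a Drop guard that hands deletion to a background OS thread, assuming a hypothetical `TempChunkDir` type; it is not the project's actual implementation.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;

/// Owns a temporary chunk directory and deletes it when the guard goes out of scope.
/// (Hypothetical type for illustration.)
pub struct TempChunkDir {
    path: PathBuf,
}

impl TempChunkDir {
    /// Create the directory (and any missing parents) and take ownership of it.
    pub fn create(path: impl Into<PathBuf>) -> io::Result<Self> {
        let path = path.into();
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }

    /// Where chunk files should be written.
    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TempChunkDir {
    fn drop(&mut self) {
        // Move the path out of the guard so the cleanup thread owns it.
        let path = std::mem::take(&mut self.path);
        // Deleting on a detached OS thread keeps `drop` from blocking the caller.
        // Errors are ignored because nothing is left to report them to.
        thread::spawn(move || {
            let _ = fs::remove_dir_all(&path);
        });
    }
}
```

Because the guard cleans up in `drop`, the chunks are removed on every exit path, including early returns and errors. One trade-off of a detached thread is that cleanup may not finish if the process exits immediately afterwards.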

📄 License

MIT License. See LICENSE for details.
