A complete pipeline to generate subtitles with Whisper (100+ languages), translate them with NLLB-200 (200+ languages), and burn the chosen subtitles into the input videos.
Explore the docs »
View Demo
·
Report Bug
·
Request Feature
Table of Contents
This project automates the entire workflow of generating burned-in subtitles for videos, supporting translation between any of the 200+ languages supported by NLLB and transcription of any of the 100+ languages supported by Whisper. It was designed for content creators, translators, and anyone who needs to repurpose videos for a global audience.
Why does this project exist?
- Whisper is great for transcription, but its output needs post-processing to produce properly formatted subtitles (.srt).
- NLLB-200 provides high-quality translation among 200+ languages.
- Hardcoded (burned-in) subtitles are part of the video image itself, so they work on any device or player (TV, smartphone, projector, etc.) without needing separate .srt files or player support.
- Human-in-the-loop – Automatically generated .srt files can be edited by the user at any stage, allowing manual correction of transcription or translation errors before burning them into the final video.
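Because .srt is a plain-text format of numbered cues with timestamps, any text editor works for these corrections. The cue below is an illustrative example, not output from this tool:

```srt
1
00:00:01,000 --> 00:00:04,000
Welcome to the channel!

2
00:00:04,500 --> 00:00:08,000
Today we are talking about subtitles.
```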
What makes it special?
- Auto‑installer – Checks for and installs Python 3.8+, FFmpeg, and PyTorch (among other dependencies), and automatically downloads the AI models.
- Fully dynamic configuration – Everything is configurable through interactive menus:
- Translation models – Install and switch between NLLB sizes (600M, 1.3B, 3.3B) on the fly.
- Whisper models – Choose between tiny, base, small, medium, large, or turbo based on your VRAM.
- Input/Output languages – Select from 100+ languages for transcription (Whisper) and 200+ languages for translation (NLLB).
- Subtitling task – Burn either the original transcription OR the translated version.
- Subtitle formatting – Adjust maximum subtitle duration and lines per subtitle.
- Human-in-the-loop workflow – Since .srt files are plain text, users can manually correct any transcription or translation errors using any text editor before burning them into the video. This ensures final subtitle quality even when automatic generation makes mistakes.
- Modular design – Each step (download, transcription, translation, burning, cleanup) runs independently or in sequence.
- Interactive menu – Execute single steps (e.g., 2) or a range (e.g., 2,4 to run steps 2,3,4).
- Smart SRT generation – Respects the maximum subtitle duration and lines-per-subtitle parameters.
- Video preprocessing tools – Cut videos by time, concatenate multiple videos, and automatically avoid file overwrites with unique naming.
- Cross‑platform GPU support – Works with NVIDIA (CUDA) and AMD (DirectML), with automatic fallback to CPU.
- Persistent configuration – All settings are saved in cache/settings.pkl and persist between sessions.
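As a rough sketch of the kind of time-aware splitting described above (the function name and parameters here are illustrative, not the project's actual API), a word-timestamped segment can be chunked into cues like this:

```python
def split_into_cues(words, max_duration=5.0, max_words=14):
    """Group (word, start, end) tuples into subtitle cues.

    A new cue starts whenever adding the next word would exceed
    the maximum cue duration or the word budget. Illustrative only.
    """
    cues, current = [], []
    for word, start, end in words:
        if current and (end - current[0][1] > max_duration
                        or len(current) >= max_words):
            cues.append(current)       # close the current cue
            current = []
        current.append((word, start, end))
    if current:
        cues.append(current)
    return cues

# Hypothetical word timestamps, as produced by a word-level transcriber
words = [("Hello", 0.0, 0.4), ("world", 0.5, 0.9), ("this", 6.0, 6.3),
         ("is", 6.4, 6.5), ("a", 6.6, 6.7), ("test", 6.8, 7.2)]
cues = split_into_cues(words, max_duration=5.0)
# cues[0] holds "Hello world"; cues[1] holds "this is a test"
```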
- Windows 10/11 (the auto‑installer is Windows‑only; other operating systems may work with manual setup)
- Python 3.8+ – Required for compatibility with modern ML libraries (Transformers, CTranslate2, etc.). Your existing Python version will be verified by installer.py.
- Administrator privileges (required for the installer to set up FFmpeg and Chocolatey)
- Internet connection (to download models and dependencies)
- Clone the repository
```shell
git clone https://github.com/your_username/Remi-Subtitle-Forge.git
cd Remi-Subtitle-Forge
```
- Run the installer
```shell
python installer.py
```
The installer will:
- Request administrator privileges
- Install FFmpeg via winget
- Install all Python dependencies (whisper‑timestamped, torch, ctranslate2, yt‑dlp, etc.)
- Detect your GPU (NVIDIA, AMD, or CPU) and install the correct PyTorch backend
- Download the NLLB‑200 model (converted to CTranslate2); the Whisper model (small) is downloaded automatically just before first use
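The backend selection step can be pictured as a simple probe: try CUDA if an NVIDIA driver is present, fall back to DirectML, then to CPU. This is a sketch of the general approach, not the installer's actual code:

```python
import importlib.util
import shutil

def pick_backend():
    """Heuristically pick a PyTorch backend. Illustrative sketch:
    the real installer may use different checks."""
    if shutil.which("nvidia-smi"):                   # NVIDIA driver ships nvidia-smi
        return "cuda"
    if importlib.util.find_spec("torch_directml"):   # AMD via the torch-directml package
        return "directml"
    return "cpu"                                     # safe fallback

backend = pick_backend()
```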
If you are not using Windows, or if the automatic installer fails, you can install everything manually.
All dependencies and detailed instructions are documented in:
This file includes:
- Python version requirements (3.8+)
- List of all required Python packages with installation commands
- System dependencies (FFmpeg, etc.) for Linux and macOS
- GPU backend setup (CUDA for NVIDIA, DirectML for AMD, or CPU fallback)
- Model download instructions (Whisper and NLLB-200)
After installation, run the main menu:
```shell
python main.py
```
You will see an interactive terminal menu:
```
--------------------------------------------------------------------------
Terminal Menu - Subtitle generator
--------------------------------------------------------------------------
1. Download video from YouTube.
2. Create video transcription (.srt file) in [current input language]
3. Translate transcription (.srt file) to [current output language]
4. Burn translated subtitles into the original video
5. Delete specified subtitles (.srt files) and input video
6. More options...
0. Close the program.
--------------------------------------------------------------------------
```

(It is possible to run a range of steps. For example, 2,4 will run steps 2, 3, and 4.)

All generated files are stored in dedicated folders:
- input_videos/ – input videos
- transcripted_subtitles/ – generated transcription .srt files
- translated_subtitles/ – translated .srt files
- output_videos/ – final videos with burned‑in subtitles
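The range syntax above (e.g. 2,4 running steps 2 through 4) can be parsed in a few lines. This is a sketch of the idea, not the project's actual parser:

```python
def parse_steps(choice):
    """Turn menu input like '3' or '2,4' into a list of step numbers."""
    if "," in choice:
        start, end = (int(p) for p in choice.split(","))
        return list(range(start, end + 1))  # inclusive range of steps
    return [int(choice)]

parse_steps("2,4")  # steps 2, 3, and 4
```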
```
---------------------------------------------------
More Options - Menu
---------------------------------------------------
1. Manage languages (input and output)
2. Manage AI translation models
3. Change Whisper parameters
4. Change subtitling task
5. Update computer information in cache
6. Other tools (basic video editing)
0. Go back to main menu
---------------------------------------------------
```

Options Description:
| Command | What it does |
|---|---|
| 1 | Manage languages – Change the input language (for Whisper transcription) and the output language (for NLLB translation). |
| 2 | Manage AI translation models – Install, switch between, or delete NLLB translation models (600M, 1.3B, 3.3B). |
| 3 | Change Whisper parameters – Adjust the Whisper model size (tiny to turbo), maximum subtitle duration (seconds), and lines per subtitle. |
| 4 | Change subtitling task – Choose which subtitles get burned into the final video: transcribed (original language) or translated (target language). |
| 5 | Update computer information in cache – Manually select which GPU the AI should use (if multiple GPUs are available). |
| 6 | Other tools – Cut videos by time or concatenate multiple videos before processing. |
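Under the hood, burning subtitles is typically done with FFmpeg's subtitles filter (libass). A minimal sketch of building such a command follows; the file paths are hypothetical examples based on the folder names above, not the project's actual invocation:

```python
import subprocess  # noqa: F401 — only needed if you actually run the command

def build_burn_command(video, srt, output):
    """Build (but do not run) an FFmpeg command that hardcodes an
    .srt file into the video stream via the subtitles filter."""
    return [
        "ffmpeg", "-i", video,
        "-vf", f"subtitles={srt}",   # burn cues into the picture
        "-c:a", "copy",              # keep the audio stream untouched
        output,
    ]

# Hypothetical paths matching the folder layout described above
cmd = build_burn_command("input_videos/clip.mp4",
                         "translated_subtitles/clip.srt",
                         "output_videos/clip.mp4")
# To actually run it: subprocess.run(cmd, check=True)
```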
- Core pipeline (download, transcribe, translate, burn, cleanup)
- Auto‑installer for dependencies and models
- Interactive menu with range execution
- Smart SRT splitting (time‑aware)
- Multi‑language support (target other languages)
- Add a simple GUI (optional)
- Publish to PyPI
See the open issues for a full list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
If you find this tool useful, consider supporting its development:
Distributed under the Unlicense. See LICENSE.txt for more information.
Daniel Rojas - hdrojas.sanin@gmail.com - LinkedIn
Project Link: https://github.com/TheRealHe/Remi-Subtitle-Forge