Turn an audio or video file into translated text with a simple browser upload. The app takes your media file, transcribes the speech with Whisper, sends the transcript to LibreTranslate, and shows the translated result on the page. Basically: upload file, pick language, let the machines do their dramatic little dance.
This project is a small full-stack translation app built with a Node.js/Express backend and a plain HTML, CSS, and JavaScript frontend.
It supports:
- Uploading audio and video files from the browser
- Transcribing spoken audio into text using Whisper
- Translating the transcript into another language using LibreTranslate
- Fetching available translation languages from a local LibreTranslate server
- Displaying the translated text directly in the web interface
- Handling long transcription jobs with a timeout so the backend does not sit there forever contemplating life
- Frontend: HTML, CSS, JavaScript
- Backend: Node.js, Express.js
- File Uploads: Multer
- API Requests: Axios
- Transcription: Whisper
- Translation: LibreTranslate
- Other Backend Utilities: CORS, File System, Child Process
Real_Time_Translation_app/
├── .github/
│ └── FUNDING.yml
├── backend/
│ └── server.js
├── frontend/
│ └── index.html
├── .gitignore
├── Readme.md
├── package.json
├── package-lock.json
└── whisper
- backend/server.js - Express server that handles uploads, runs Whisper, calls LibreTranslate, and returns the translated text.
- frontend/index.html - Browser interface for uploading files, searching/selecting a language, and viewing results.
- package.json - Node.js dependencies for the backend.
- .gitignore - Ignores generated/local folders such as node_modules, uploads, and TL-backend.
- The user opens the web app in a browser.
- The frontend loads available languages from the backend.
- The user uploads an audio or video file.
- The backend saves the file in the uploads folder.
- Whisper transcribes the uploaded media into a .txt file.
- The backend reads the transcript.
- The transcript is sent to LibreTranslate.
- The translated text is returned to the frontend.
- The user sees the final translated text on screen.
Clean enough. Slightly magical. Still mostly JavaScript.
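The transcription step in the flow above can be sketched as two small helpers. This is a minimal sketch, not the code in backend/server.js: the function names are hypothetical, the model name is an arbitrary choice, and it assumes the Whisper CLI names the transcript after the input file's base name (older Whisper releases may keep the original extension in the output name).

```javascript
const path = require("node:path");

// Hypothetical helper: build the argument list for the Whisper CLI.
// --model, --output_format and --output_dir are openai-whisper CLI flags;
// the "base" model is only an example choice.
function buildWhisperArgs(mediaPath, outputDir, model = "base") {
  return [
    mediaPath,
    "--model", model,
    "--output_format", "txt",
    "--output_dir", outputDir,
  ];
}

// Assumption: the transcript is written as <input base name>.txt
// inside the output directory.
function transcriptPathFor(mediaPath, outputDir) {
  const baseName = path.parse(mediaPath).name;
  return path.join(outputDir, `${baseName}.txt`);
}
```

The backend would pass the argument list to a child process running `whisper`, then read the file at the path returned by `transcriptPathFor`.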
Before running the project, make sure you have these installed:
- Node.js and npm
- Python 3.11.x recommended
- Whisper installed and available from the command line
- LibreTranslate running locally
- Internet connection for installing dependencies
Clone the repository:
git clone https://github.com/ShreeGopi/Real_Time_Translation_app.git
cd Real_Time_Translation_app
Install Node.js dependencies:
npm install
Install Whisper-related Python dependencies:
pip install torch==2.0.1 numpy==1.24.3 openai-whisper
If the openai-whisper package does not install correctly, install Whisper directly from GitHub:
pip install git+https://github.com/openai/whisper.git
This app expects LibreTranslate to run at:
http://127.0.0.1:5000
Create a local folder for LibreTranslate:
mkdir TL-backend
cd TL-backend
Clone LibreTranslate:
git clone https://github.com/LibreTranslate/LibreTranslate.git
cd LibreTranslate
Install helper tools:
pip install hatch virtualenv
Create and activate a virtual environment:
virtualenv libretranslate-env
libretranslate-env\Scripts\activate
The command above is for Windows. On macOS or Linux, run source libretranslate-env/bin/activate instead.
Install LibreTranslate inside the environment:
hatch run pip install .
Start the LibreTranslate server:
hatch run libretranslate
Keep this terminal running. LibreTranslate is the translation engine, so if this terminal is closed, translation will also go on a coffee break.
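Once LibreTranslate is up, a translation call against it can be sketched as follows. This is a sketch under assumptions rather than the project's actual code: the project lists Axios for API requests, while this example swaps in the `fetch` built into Node.js 18+ to stay dependency-free, and the helper names are hypothetical. The `/translate` endpoint and its `q`, `source`, `target`, and `format` fields come from the LibreTranslate API.

```javascript
// Address the README says LibreTranslate should listen on.
const LIBRETRANSLATE_URL = "http://127.0.0.1:5000";

// Hypothetical helper: describe the POST request for LibreTranslate's /translate.
// "auto" asks LibreTranslate to detect the source language.
function buildTranslateRequest(text, targetLanguage, sourceLanguage = "auto") {
  return {
    url: `${LIBRETRANSLATE_URL}/translate`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        q: text,
        source: sourceLanguage,
        target: targetLanguage,
        format: "text",
      }),
    },
  };
}

// Sends the request and returns the translated string.
async function translate(text, targetLanguage) {
  const { url, options } = buildTranslateRequest(text, targetLanguage);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`LibreTranslate returned status ${response.status}`);
  }
  const data = await response.json();
  return data.translatedText;
}
```

Calling `translate("Hello", "es")` from a script is a quick way to confirm the server is reachable before starting the Express backend.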
Open a new terminal from the project root and start the Express backend:
cd backend
node server.js
Then open this URL in your browser:
http://localhost:3000
You should now see the upload page.
- Choose an audio or video file.
- Search for a target language.
- Select the language from the dropdown.
- Click Upload and Translate.
- Wait while Whisper transcribes the file and LibreTranslate translates it.
- Read the translated text on the page.
Supported files depend on what Whisper can process, but common formats like .mp3, .wav, and .mp4 are good starting points.
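Behind the Upload and Translate button, the browser request could look roughly like the sketch below. The route path `/upload` and both function names are assumptions for illustration; check backend/server.js for the real route. The form field names `file` and `language` and the `translatedText` response field follow the API description in this README.

```javascript
// Hypothetical route path; replace it with the one backend/server.js exposes.
const UPLOAD_URL = "http://localhost:3000/upload";

// Packs the chosen media file and target language code into multipart form data.
function buildUploadForm(file, languageCode) {
  const form = new FormData();
  form.append("file", file);
  form.append("language", languageCode);
  return form;
}

// Posts the form and returns the translated text from the JSON response.
async function uploadAndTranslate(file, languageCode) {
  const response = await fetch(UPLOAD_URL, {
    method: "POST",
    body: buildUploadForm(file, languageCode),
  });
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.translatedText;
}
```

In the page, `file` would typically come from an `<input type="file">` element's `files[0]`, and `languageCode` from the selected dropdown value.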
The upload route accepts an audio/video file, transcribes it, translates the transcript, and returns the translated text.
Expected form data:
- file - audio or video file
- language - target language code, such as fr, es, or de
Example response:
{
"message": "Transcription and translation completed",
"translatedText": "Translated text appears here"
}
The languages route fetches supported languages from the local LibreTranslate server.
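The language list is what drives the search box and dropdown in the frontend. A minimal sketch of that filtering, assuming each entry has the `code` and `name` fields that LibreTranslate's `/languages` endpoint returns (the function name and sample data are hypothetical):

```javascript
// Sample data in the assumed shape: [{ code, name }, ...].
const sampleLanguages = [
  { code: "fr", name: "French" },
  { code: "es", name: "Spanish" },
  { code: "de", name: "German" },
];

// Returns the languages whose name contains the query or whose code equals it.
// An empty query returns the full list unchanged.
function filterLanguages(languages, query) {
  const needle = query.trim().toLowerCase();
  if (needle === "") {
    return languages;
  }
  return languages.filter(
    (language) =>
      language.name.toLowerCase().includes(needle) ||
      language.code.toLowerCase() === needle
  );
}

console.log(filterLanguages(sampleLanguages, "span").map((l) => l.code)); // prints [ 'es' ]
```

The selected entry's `code` is the value to send as the `language` form field.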
The backend includes basic handling for:
- Missing file uploads
- Whisper transcription failures
- Translation API failures
- Long transcription jobs that exceed the 2-minute timeout
If something fails, check both terminals:
- Express backend terminal
- LibreTranslate terminal
The answer is usually hiding there, pretending to be a stack trace.
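The 2-minute limit on transcription can be illustrated with Node's `child_process.execFile`, which accepts a `timeout` option and kills the child process when it is exceeded. This is a sketch of the general technique, not necessarily how backend/server.js does it; `runWithTimeout`, the `timedOut` property, and the constant name are hypothetical.

```javascript
const { execFile } = require("node:child_process");

// Two minutes in milliseconds, matching the timeout described above.
const TRANSCRIPTION_TIMEOUT_MS = 2 * 60 * 1000;

// Runs a command and rejects if it fails or exceeds timeoutMs.
// When execFile terminates the child itself (as it does on timeout),
// it sets error.killed; that is surfaced here as error.timedOut.
function runWithTimeout(command, args, timeoutMs) {
  return new Promise((resolve, reject) => {
    execFile(command, args, { timeout: timeoutMs }, (error, stdout, stderr) => {
      if (error) {
        error.timedOut = error.killed === true;
        reject(error);
        return;
      }
      resolve({ stdout, stderr });
    });
  });
}
```

A route handler could call `runWithTimeout("whisper", whisperArgs, TRANSCRIPTION_TIMEOUT_MS)` and, in the `catch` branch, return one error message when `timedOut` is true and another for ordinary Whisper failures.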
- Whisper must be installed correctly and available as a command-line tool.
- LibreTranslate must be running locally before translation will work.
- Uploaded files and generated transcription files are stored in backend/uploads.
- TL-backend and uploads are ignored by Git because they are local/generated folders.
- Python 3.11.x is recommended for smoother compatibility.
Some useful next steps for this project:
- Add translated audio output
- Support live audio/video streaming translation
- Allow multiple files to be uploaded at once
- Add progress indicators for long transcription jobs
- Improve frontend styling and mobile layout
- Add stronger file validation and upload limits
- Add tests for backend routes
- Add deployment instructions
This project is open source. The current package metadata uses the ISC license.
This project is a good example of connecting frontend file uploads, backend processing, AI-powered transcription, and translation APIs into one working flow. It is small, practical, and very resume-friendly, which is always a nice bonus.