Real-Time Video/Audio Translation App

Turn an audio or video file into translated text with a simple browser upload. The app takes your media file, transcribes the speech with Whisper, sends the transcript to LibreTranslate, and shows the translated result on the page. Basically: upload file, pick language, let the machines do their dramatic little dance.

What This Project Does

This project is a small full-stack translation app built with a Node.js/Express backend and a plain HTML, CSS, and JavaScript frontend.

It supports:

Uploading audio and video files from the browser
Transcribing spoken audio into text using Whisper
Translating the transcript into another language using LibreTranslate
Fetching available translation languages from a local LibreTranslate server
Displaying the translated text directly in the web interface
Handling long transcription jobs with a timeout so the backend does not sit there forever contemplating life

Tech Stack

Frontend: HTML, CSS, JavaScript
Backend: Node.js, Express.js
File Uploads: Multer
API Requests: Axios
Transcription: Whisper
Translation: LibreTranslate
Other Backend Utilities: CORS, File System, Child Process

Project Structure

Real_Time_Translation_app/
├── .github/
│   └── FUNDING.yml
├── backend/
│   └── server.js
├── frontend/
│   └── index.html
├── .gitignore
├── Readme.md
├── package.json
├── package-lock.json
└── whisper

Important Files

backend/server.js - Express server that handles uploads, runs Whisper, calls LibreTranslate, and returns the translated text.
frontend/index.html - Browser interface for uploading files, searching/selecting a language, and viewing results.
package.json - Node.js dependencies for the backend.
.gitignore - Ignores generated/local folders such as node_modules, uploads, and TL-backend.

How It Works

The user opens the web app in a browser.
The frontend loads available languages from the backend.
The user uploads an audio or video file.
The backend saves the file in the uploads folder.
Whisper transcribes the uploaded media into a .txt file.
The backend reads the transcript.
The transcript is sent to LibreTranslate.
The translated text is returned to the frontend.
The user sees the final translated text on screen.

Clean enough. Slightly magical. Still mostly JavaScript.
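
The middle of that flow (Whisper writes a .txt file, the backend reads it, the transcript goes to LibreTranslate) can be sketched roughly as below. This is an illustration, not the code in backend/server.js: the helper names are hypothetical, and it assumes a recent Whisper release that names the transcript after the input file's base name. The --output_format and --output_dir flags and the /translate request fields (q, source, target, format) come from the Whisper CLI and the LibreTranslate API, but whether the backend uses exactly these values is an assumption.

```javascript
const path = require('path');

// Address of the local LibreTranslate server, as given in the setup section.
const LIBRETRANSLATE_URL = 'http://127.0.0.1:5000';

// Build the argument list for the `whisper` command-line tool.
// Hypothetical helper: server.js may pass different options.
function buildWhisperArgs(mediaPath, outputDir) {
  return [mediaPath, '--output_format', 'txt', '--output_dir', outputDir];
}

// Work out where Whisper will write the transcript.
// Assumes the output is named after the input's base name,
// so uploads/clip.mp4 becomes clip.txt inside outputDir.
function transcriptPathFor(mediaPath, outputDir) {
  return path.join(outputDir, `${path.parse(mediaPath).name}.txt`);
}

// Describe the POST /translate request for LibreTranslate.
// 'auto' asks LibreTranslate to detect the source language.
function buildTranslateRequest(transcript, targetLanguage) {
  return {
    url: `${LIBRETRANSLATE_URL}/translate`,
    body: {
      q: transcript.trim(),
      source: 'auto',
      target: targetLanguage,
      format: 'text',
    },
  };
}
```

Keeping these steps as small pure functions makes them easy to check without running Whisper or LibreTranslate; the real work of spawning the process and sending the request would wrap around them.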

Requirements

Before running the project, make sure you have these installed:

Node.js and npm
Python 3.11.x recommended
Whisper installed and available from the command line (Whisper also needs ffmpeg on your PATH to decode audio and video)
LibreTranslate running locally
Internet connection for installing dependencies

Installation

Clone the repository:

git clone https://github.com/ShreeGopi/Real_Time_Translation_app.git
cd Real_Time_Translation_app

Install Node.js dependencies:

npm install

Install Whisper-related Python dependencies:

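pip install torch==2.0.1 numpy==1.24.3 openai-whisper

On PyPI, OpenAI's Whisper is published as openai-whisper; the package named whisper is an unrelated project. If the install above does not work correctly, install Whisper directly from GitHub: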

pip install git+https://github.com/openai/whisper.git

Setting Up LibreTranslate Locally

This app expects LibreTranslate to run at:

http://127.0.0.1:5000

Create a local folder for LibreTranslate:

mkdir TL-backend
cd TL-backend

Clone LibreTranslate:

git clone https://github.com/LibreTranslate/LibreTranslate.git
cd LibreTranslate

Install helper tools:

pip install hatch virtualenv

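Create and activate a virtual environment (the activation command shown is for Windows; on macOS or Linux, run source libretranslate-env/bin/activate instead):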

virtualenv libretranslate-env
libretranslate-env\Scripts\activate

Install LibreTranslate inside the environment:

hatch run pip install .

Start the LibreTranslate server:

hatch run libretranslate

Keep this terminal running. LibreTranslate is the translation engine, so if this terminal is closed, translation will also go on a coffee break.

Running the App

Open a new terminal from the project root and start the Express backend:

cd backend
node server.js

Then open this URL in your browser:

http://localhost:3000

You should now see the upload page.

Using the App

Choose an audio or video file.
Search for a target language.
Select the language from the dropdown.
Click Upload and Translate.
Wait while Whisper transcribes the file and LibreTranslate translates it.
Read the translated text on the page.

Supported files depend on what Whisper can process, but common formats like .mp3, .wav, and .mp4 are good starting points.

API Endpoints

`POST /upload`

Uploads an audio/video file, transcribes it, translates the transcript, and returns the translated text.

Expected form data:

file - audio or video file
language - target language code, such as fr, es, or de

Example response:

{
  "message": "Transcription and translation completed",
  "translatedText": "Translated text appears here"
}
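
As a rough sketch of how a client might call this endpoint, the snippet below builds the multipart form with the two documented fields and reads translatedText from the response. The function names are hypothetical, the backend address is taken from the Running the App section, and it assumes an environment with global fetch, FormData, and Blob (a modern browser or Node.js 18+). It is not the code in frontend/index.html.

```javascript
// Backend address assumed from the "Running the App" section.
const BACKEND_URL = 'http://localhost:3000';

// Build the multipart body with the two documented fields: file and language.
function buildUploadForm(fileBlob, filename, language) {
  const form = new FormData();
  form.append('file', fileBlob, filename);
  form.append('language', language);
  return form;
}

// Send the form to POST /upload and return the translated text.
// Throws if the backend answers with a non-2xx status.
async function uploadAndTranslate(fileBlob, filename, language) {
  const response = await fetch(`${BACKEND_URL}/upload`, {
    method: 'POST',
    body: buildUploadForm(fileBlob, filename, language),
  });
  if (!response.ok) {
    throw new Error(`Upload failed with HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.translatedText;
}
```

In a browser, the first argument would typically be the File object from a file input, for example uploadAndTranslate(input.files[0], input.files[0].name, 'fr'), where input is whatever file input element the page uses.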

`GET /languages`

Fetches supported languages from the local LibreTranslate server.
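
The language search in the frontend could work along these lines. This sketch assumes the endpoint passes through LibreTranslate's own list format, an array of objects with code and name fields; the function name and the sample data are made up for illustration.

```javascript
// Filter a language list by a search term.
// Matches a substring of the name or the exact language code, ignoring case.
// Assumes entries shaped like LibreTranslate's { code, name } objects.
function filterLanguages(languages, searchTerm) {
  const term = searchTerm.trim().toLowerCase();
  if (term === '') return languages;
  return languages.filter(
    ({ code, name }) =>
      name.toLowerCase().includes(term) || code.toLowerCase() === term
  );
}

// Sample data in the assumed shape (not fetched from a real server).
const sampleLanguages = [
  { code: 'fr', name: 'French' },
  { code: 'de', name: 'German' },
  { code: 'es', name: 'Spanish' },
];

// Logs only the German entry.
console.log(filterLanguages(sampleLanguages, 'ger'));
```

Matching the code exactly rather than by substring avoids a two-letter search such as "es" pulling in every language whose code merely contains those letters.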

Error Handling

The backend includes basic handling for:

Missing file uploads
Whisper transcription failures
Translation API failures
Long transcription jobs that exceed the 2-minute timeout
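
One way to get the timeout behaviour is the timeout option of Node's child_process.execFile, which kills the child process when the limit passes and sets killed on the error handed to the callback. The sketch below shows that idea under stated assumptions: the helper names, status codes, and messages are illustrative choices, not what backend/server.js actually returns.

```javascript
const { execFile } = require('child_process');

// Two minutes, matching the timeout described above.
const WHISPER_TIMEOUT_MS = 2 * 60 * 1000;

// Turn an execFile error into an HTTP-style result.
// Status codes and messages here are illustrative, not the project's.
function describeWhisperFailure(error) {
  if (error.killed) {
    // Node killed the child because the timeout expired.
    return { status: 504, message: 'Transcription timed out' };
  }
  if (error.code === 'ENOENT') {
    // The command could not be found on the PATH.
    return { status: 500, message: 'Whisper command not found' };
  }
  return { status: 500, message: 'Whisper transcription failed' };
}

// Run a command with the timeout applied.
// Calls back with a described failure, or with the command's stdout.
function runWithTimeout(command, args, callback) {
  execFile(command, args, { timeout: WHISPER_TIMEOUT_MS }, (error, stdout) => {
    if (error) return callback(describeWhisperFailure(error));
    callback(null, stdout);
  });
}
```

Separating the error description from the process call keeps the timeout case distinguishable from a missing Whisper install, which are the two failures most worth telling apart when reading the backend terminal.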

If something fails, check both terminals:

Express backend terminal
LibreTranslate terminal

The answer is usually hiding there, pretending to be a stack trace.

Notes

Whisper must be installed correctly and available as a command-line tool.
LibreTranslate must be running locally before translation will work.
Uploaded files and generated transcription files are stored in backend/uploads.
TL-backend and uploads are ignored by Git because they are local/generated folders.
Python 3.11.x is recommended for smoother compatibility.

Future Improvements

Some useful next steps for this project:

Add translated audio output
Support live audio/video streaming translation
Allow multiple files to be uploaded at once
Add progress indicators for long transcription jobs
Improve frontend styling and mobile layout
Add stronger file validation and upload limits
Add tests for backend routes
Add deployment instructions

License

This project is open source. The current package metadata uses the ISC license.

Final Thought

This project is a good example of connecting frontend file uploads, backend processing, AI-powered transcription, and translation APIs into one working flow. It is small, practical, and very resume-friendly, which is always a nice bonus.
