The Voiceover Alchemist is an audio dubbing editor for quickly recording voiceovers, removing voices from the original video, and changing voices through RVC post-processing.
- Audio Segments: Record and process audio segments for voiceover.
- Voice Model Integration: Apply different voice models to audio segments.
- Voice Removal: Remove voices from the underlying video.
- Video Export: Combine video and processed audio into a single output file.
- Real-Time Playback: Synchronize audio and video during playback.
- Python 3.10
- Pip 24.0
- FFmpeg
Install the following libraries using pip:
- rvc (currently requires specific Python and pip versions; see the requirements above)
- numpy
- requests
- PySide6
- pyaudio
- scipy
- librosa
- onnxruntime
- soundfile
- python-dotenv
- audio-separator
- protobuf==3.20.3
- torch==2.1.2
- torchvision
Additionally, ensure FFmpeg is installed and available in your system's PATH.
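The prerequisites above can be checked before launching the app. The following is a minimal sketch, not part of the project: the function name `check_environment` is hypothetical, and it assumes that exactly Python 3.10 is required and that the FFmpeg executable is named `ffmpeg`.

```python
import shutil
import sys

# Assumption: the app needs exactly Python 3.10, as listed in the requirements.
REQUIRED_PYTHON = (3, 10)


def check_environment(python_version=None, ffmpeg_name="ffmpeg"):
    """Return a list of human-readable problems; an empty list means all checks passed.

    `python_version` defaults to the running interpreter's (major, minor) pair.
    """
    version = tuple(python_version or sys.version_info[:2])
    problems = []
    if version != REQUIRED_PYTHON:
        problems.append(
            f"Python 3.10 is required, found {version[0]}.{version[1]}"
        )
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(ffmpeg_name) is None:
        problems.append(f"'{ffmpeg_name}' was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("Problem:", problem)
```

An empty result means both checks passed; anything else lists what to fix before running `python app.py`.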
- Clone the repository:
    git clone <repository_url>
    cd multi-voice-dubbing-editor
- Create a virtual environment (optional but recommended):
    python3.10 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install pip 24.0:
    python -m ensurepip
    python -m pip install pip==24.0
- Initialize and install the models:
    python full_init.py
- Ensure your virtual environment is activated.
- Start the application:
    python app.py
- Create or Open a Project: Use the toolbar to create a new project or load an existing one.
- Import Video: Import a video file to extract audio and begin editing.
- Choose Voice Models: Select voice models and process the audio. The RVC models are stored in the assets/models directory.
- Record Audio Segments: Record audio segments in sync with the video. The voice in the video is removed and replaced with your own voice or any RVC-based voice.
- Preview and Export: Preview the combined audio and export the video.
Note: The first time you use some features, additional models may need to be downloaded. Please be patient and check the console logs to follow the progress.
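The import and export steps rely on FFmpeg. As a rough illustration of what they might look like on the command line, the sketch below builds two FFmpeg argument lists: one that extracts a video's audio track to WAV, and one that combines the original picture with a new audio file. The function names are hypothetical, and the codec choices (16-bit PCM for extraction, AAC for export) are assumptions; the application may invoke FFmpeg differently.

```python
def extract_audio_command(video, wav_out):
    """Build an FFmpeg command that writes the video's audio track as a WAV file."""
    return [
        "ffmpeg", "-y",            # overwrite the output without asking
        "-i", str(video),
        "-vn",                     # ignore the video stream
        "-acodec", "pcm_s16le",    # assumption: uncompressed 16-bit PCM
        str(wav_out),
    ]


def export_command(video, audio, output):
    """Build an FFmpeg command that muxes the original picture with new audio."""
    return [
        "ffmpeg", "-y",
        "-i", str(video),
        "-i", str(audio),
        "-map", "0:v:0",           # video from the first input
        "-map", "1:a:0",           # audio from the second input
        "-c:v", "copy",            # keep the picture as-is, no re-encode
        "-c:a", "aac",             # assumption: AAC audio for an MP4 container
        "-shortest",               # stop at the end of the shorter stream
        str(output),
    ]
```

Either list can be passed to `subprocess.run(cmd, check=True)`, for example `export_command("video.mp4", "preview.wav", "dubbed.mp4")`, where `dubbed.mp4` is an illustrative output name.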
Upon creating a project, the following structure is generated:
project_folder/
├── recordings/ # Recorded audio files
├── processed/ # Processed audio files
├── assets/models/ # Voice model files
├── project.json # Project metadata
├── original.wav # Original extracted audio
├── preview.wav # Combined preview audio
└── video.mp4 # Imported video file
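To inspect a project folder against the layout above, a small helper can report which entries are absent. This is a minimal sketch with a hypothetical function name, `missing_entries`; it assumes the names shown in the tree, and some files (such as `preview.wav`) may only appear after the corresponding step has been run.

```python
from pathlib import Path

# Names taken from the project structure listed above.
EXPECTED_DIRS = ("recordings", "processed", "assets/models")
EXPECTED_FILES = ("project.json", "original.wav", "preview.wav", "video.mp4")


def missing_entries(project_folder):
    """Return the expected directories and files that do not exist in the project folder."""
    root = Path(project_folder)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    # "project_folder" is a placeholder path for illustration.
    for name in missing_entries("project_folder"):
        print("Missing:", name)
```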
- Fork and clone the repository.
- Implement your changes.
- Run tests to ensure functionality.
- Submit a pull request for review.
We welcome contributions!
- PySide6 for the GUI framework.
- FFmpeg for video and audio processing.
- RVC for voice conversion.
- audio-separator for vocal removal.
