1 change: 1 addition & 0 deletions .gitignore
@@ -1,6 +1,7 @@
# Andrew env
.DS_Store
.vscode
debug_results.txt

# Andrew functional adds
/tracks/
34 changes: 34 additions & 0 deletions Dockerfile.rocm
@@ -0,0 +1,34 @@
# Use ROCm base image with Python
FROM rocm/dev-ubuntu-22.04:7.2-complete

# Set the working directory in the container
WORKDIR /workdir

# Install necessary packages
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
python3 \
python3-pip \
&& rm -rf /var/lib/apt/lists/*

RUN python3 -m pip install --upgrade pip

# Install PyTorch with ROCm support
RUN --mount=type=cache,target=/root/.cache \
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm7.2

# Install audio-separator with GPU support
RUN --mount=type=cache,target=/root/.cache \
pip3 install "audio-separator[gpu]" onnxruntime-rocm

# Default environment variables for AMD RX 6600 series (gfx1032)
# Override these for other GPUs:
# RX 7900: HSA_OVERRIDE_GFX_VERSION=11.0.0, PYTORCH_ROCM_ARCH=gfx1100
# MI250X: HSA_OVERRIDE_GFX_VERSION=9.0.10, PYTORCH_ROCM_ARCH=gfx90a
ARG HSA_OVERRIDE_GFX_VERSION=10.3.0
ARG PYTORCH_ROCM_ARCH=gfx1030
ENV HSA_OVERRIDE_GFX_VERSION=${HSA_OVERRIDE_GFX_VERSION}
ENV PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH}

# Run audio-separator when the container launches
ENTRYPOINT ["audio-separator"]
88 changes: 85 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
@@ -25,11 +25,13 @@ The simplest (and probably most used) use case for this package is to separate a
- [Installation 🛠️](#installation-%EF%B8%8F)
- [🐳 Docker](#-docker)
- [🎮 Nvidia GPU with CUDA or 🧪 Google Colab](#-nvidia-gpu-with-cuda-or--google-colab)
- [🖥️ AMD GPU with ROCm (Linux)](#-amd-gpu-with-rocm-linux)
- [ Apple Silicon, macOS Sonoma+ with M1 or newer CPU (CoreML acceleration)](#-apple-silicon-macos-sonoma-with-m1-or-newer-cpu-coreml-acceleration)
- [🐢 No hardware acceleration, CPU only](#-no-hardware-acceleration-cpu-only)
- [🎥 FFmpeg dependency](#-ffmpeg-dependency)
- [GPU / CUDA specific installation steps with Pip](#gpu--cuda-specific-installation-steps-with-pip)
- [Multiple CUDA library versions may be needed](#multiple-cuda-library-versions-may-be-needed)
- [ROCm specific troubleshooting](#rocm-specific-troubleshooting)
- [Usage 🚀](#usage-)
- [Command Line Interface (CLI)](#command-line-interface-cli)
- [Listing and Filtering Available Models](#listing-and-filtering-available-models)
@@ -67,6 +69,7 @@ The simplest (and probably most used) use case for this package is to separate a
- Ability to inference using a pre-trained model in PTH or ONNX format.
- CLI support for easy use in scripts and batch processing.
- Python API for integration into other projects.
- **Multi-platform GPU acceleration**: NVIDIA CUDA, AMD ROCm, Apple Silicon MPS/CoreML, DirectML, and CPU fallback.

## Installation 🛠️

@@ -112,6 +115,50 @@ Docker:
beveradb/audio-separator:gpu
```

### 🖥️ AMD GPU with ROCm (Linux)

**Supported ROCm Versions:** 5.7+

💬 If successfully configured, you should see this log message when running `audio-separator --env_info`:
`ONNXruntime has ROCMExecutionProvider available, enabling acceleration`
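As a quick programmatic check, you can inspect the execution providers ONNX Runtime reports. The helper below is an illustrative sketch, not part of audio-separator's API; on a real system you would pass in the result of `onnxruntime.get_available_providers()`:

```python
# Hypothetical helper: given the provider list reported by
# onnxruntime.get_available_providers(), report whether the ROCm
# execution provider is available.
def has_rocm_provider(providers):
    return "ROCMExecutionProvider" in providers

# Example with a hard-coded provider list; replace it with
# onnxruntime.get_available_providers() on your own machine.
print(has_rocm_provider(["ROCMExecutionProvider", "CPUExecutionProvider"]))  # True
```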

Pip (complete installation):
```sh
# First install PyTorch with ROCm support (Change ROCm version as needed.)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm7.2

# Then install audio-separator with ROCm support
pip install "audio-separator[rocm]"
```

**Important:** You must install PyTorch with ROCm support BEFORE installing audio-separator. If you already have PyTorch with CUDA support installed, uninstall it first:
```sh
pip uninstall torch torchvision torchaudio
pip cache purge
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7
pip install "audio-separator[rocm]"
```
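If you are unsure which build of PyTorch is currently installed, ROCm wheels from the PyTorch index carry a `+rocm` local version tag (e.g. `2.4.0+rocm5.7`), while wheels from the CUDA index use `+cu…`-style tags. A minimal sketch, assuming you pass in `torch.__version__` (the function name is illustrative):

```python
# Hypothetical check on a torch version string such as torch.__version__.
# Wheels installed from PyPI without a local tag report "cpu/other" here,
# even though they may bundle CUDA support.
def torch_build_flavor(version_string):
    if "+rocm" in version_string:
        return "rocm"
    if "+cu" in version_string:
        return "cuda"
    return "cpu/other"

print(torch_build_flavor("2.4.0+rocm5.7"))  # rocm
```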

**Required ROCm Packages:**
- PyTorch ROCm: `torch`, `torchvision`, `torchaudio` with ROCm support
- ONNX Runtime: `onnxruntime`, `onnxruntime-rocm`

**Basic ROCm Setup:**
- For AMD Radeon RX 6600 series (gfx1032), point ROCm at the supported gfx1030 kernels by setting:
```sh
export HSA_OVERRIDE_GFX_VERSION=10.3.0
export PYTORCH_ROCM_ARCH=gfx1030
```
- ROCm acceleration uses the CUDAExecutionProvider (ONNX Runtime maps ROCm to CUDA for compatibility)
- The system detects ROCm packages and PyTorch ROCm support automatically
- ROCm libraries must be properly installed on your system for acceleration to work
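The override logic above can be sketched as a small lookup. The values follow common community guidance (gfx1032 has no prebuilt kernels, so it borrows the gfx1030 ones); the names here are illustrative, not part of audio-separator:

```python
# Hypothetical mapping from a GPU's reported gfx architecture to the
# HSA_OVERRIDE_GFX_VERSION value to export; None means no override is needed
# because the architecture ships with prebuilt ROCm kernels.
GFX_OVERRIDES = {
    "gfx1030": None,      # RX 6800/6900: supported target, no override
    "gfx1032": "10.3.0",  # RX 6600: reuse the gfx1030 kernels
    "gfx90a": None,       # MI200 series: officially supported
}

def hsa_override_for(gfx_arch):
    # Unknown architectures also return None; consult the ROCm docs for those.
    return GFX_OVERRIDES.get(gfx_arch)

print(hsa_override_for("gfx1032"))  # 10.3.0
```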

Docker (build from source):
```sh
docker build -f Dockerfile.rocm -t audio-separator:rocm .
docker run -it --device=/dev/kfd --device=/dev/dri --group-add=video -v `pwd`:/workdir audio-separator:rocm input.wav
```

###  Apple Silicon, macOS Sonoma+ with M1 or newer CPU (CoreML acceleration)

💬 If successfully configured, you should see this log message when running `audio-separator --env_info`:
@@ -157,19 +204,26 @@ apt-get update; apt-get install -y ffmpeg
brew update; brew install ffmpeg
```

## GPU / CUDA specific installation steps with Pip (CUDA and ROCm)

In theory, all you should need to do to get `audio-separator` working with a GPU is install it with the appropriate extra (`[gpu]` for CUDA/NVIDIA or `[rocm]` for ROCm/AMD) as above.

However, sometimes getting both PyTorch and ONNX Runtime working with GPU support can be a bit tricky so it may not work that easily.

You may need to reinstall both packages directly, allowing pip to calculate the right versions for your platform, for example:

**For CUDA/NVIDIA (`[gpu]`):**
- `pip uninstall torch onnxruntime`
- `pip cache purge`
- `pip install --force-reinstall torch torchvision torchaudio`
- `pip install --force-reinstall onnxruntime-gpu`

**For ROCm/AMD (`[rocm]`):**
- `pip uninstall torch onnxruntime onnxruntime-rocm`
- `pip cache purge`
- `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm7.2`
- `pip install --force-reinstall onnxruntime-rocm`

I generally recommend installing the latest version of PyTorch for your environment using the command recommended by the wizard here:
<https://pytorch.org/get-started/locally/>

@@ -197,6 +251,34 @@ You can resolve this by running the following command:
python -m pip install ort-nightly-gpu --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ort-cuda-12-nightly/pypi/simple/
```

### ROCm specific troubleshooting

For ROCm (AMD GPU) support, make sure you have:
1. ROCm installed on your system (typically version 5.7+)
2. PyTorch with ROCm support installed (check PyTorch website for ROCm installation)
3. `onnxruntime-rocm` package installed

If you encounter issues with ROCm detection, try reinstalling the packages:
```sh
pip uninstall torch onnxruntime
pip cache purge
# Install PyTorch with ROCm support (check https://pytorch.org for the correct command)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7
pip install onnxruntime-rocm
```

**ROCm Performance Optimization:**
- The ROCm execution provider includes performance optimizations for AMD GPUs:
  - Parallel execution mode for better multi-core utilization
  - Kernel tuning enabled for optimal performance
  - Memory pattern optimization for better cache usage
  - Smart memory allocation strategy

**Common ROCm Issues:**
- ROCm packages installed but no acceleration: make sure `onnxruntime-rocm` is installed and the ROCm libraries are on your library path (e.g. `LD_LIBRARY_PATH`)
- PyTorch reports CUDA but not ROCm: reinstall PyTorch with ROCm support using the PyTorch ROCm index URL
- Docker issues: use the provided `Dockerfile.rocm` and make sure the GPU devices are passed through (`--device=/dev/kfd --device=/dev/dri`)
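The checklist above can be expressed as a simple decision helper. This is a hypothetical sketch, not part of audio-separator; you would gather the three flags from your own environment (pip list, `torch.version.hip`, and your library path):

```python
# Hypothetical troubleshooting helper mirroring the checklist above.
def rocm_diagnosis(ort_rocm_installed, torch_is_rocm_build, rocm_libs_found):
    """Return the first remedial step suggested by the checklist, or None."""
    if not ort_rocm_installed:
        return "install onnxruntime-rocm"
    if not torch_is_rocm_build:
        return "reinstall PyTorch from the ROCm index URL"
    if not rocm_libs_found:
        return "make ROCm libraries visible on your library path"
    return None  # environment looks healthy

print(rocm_diagnosis(True, False, True))  # reinstall PyTorch from the ROCm index URL
```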

> Note: if anyone knows how to make this cleaner so we can support both different platform-specific dependencies for hardware acceleration without a separate installation process for each, please let me know or raise a PR!

## Usage 🚀
63 changes: 47 additions & 16 deletions audio_separator/separator/architectures/demucs_separator.py
@@ -4,15 +4,25 @@
import torch
import numpy as np
from audio_separator.separator.common_separator import CommonSeparator
from audio_separator.separator.uvr_lib_v5.demucs.apply import (
apply_model,
demucs_segments,
)
from audio_separator.separator.uvr_lib_v5.demucs.hdemucs import HDemucs
from audio_separator.separator.uvr_lib_v5.demucs.pretrained import (
get_model as get_demucs_model,
)
from audio_separator.separator.uvr_lib_v5 import spec_utils

DEMUCS_4_SOURCE = ["drums", "bass", "other", "vocals"]

DEMUCS_2_SOURCE_MAPPER = {CommonSeparator.INST_STEM: 0, CommonSeparator.VOCAL_STEM: 1}
DEMUCS_4_SOURCE_MAPPER = {
CommonSeparator.BASS_STEM: 0,
CommonSeparator.DRUM_STEM: 1,
CommonSeparator.OTHER_STEM: 2,
CommonSeparator.VOCAL_STEM: 3,
}
DEMUCS_6_SOURCE_MAPPER = {
CommonSeparator.BASS_STEM: 0,
CommonSeparator.DRUM_STEM: 1,
@@ -64,8 +74,12 @@ def __init__(self, common_config, arch_config):
# Enables "Segments". Deselecting this option is only recommended for those with powerful PCs.
self.segments_enabled = arch_config.get("segments_enabled", True)

self.logger.debug(
f"Demucs arch params: segment_size={self.segment_size}, segments_enabled={self.segments_enabled}"
)
self.logger.debug(
f"Demucs arch params: shifts={self.shifts}, overlap={self.overlap}"
)

self.demucs_source_map = DEMUCS_4_SOURCE_MAPPER

@@ -107,15 +121,23 @@ def separate(self, audio_file_path, custom_output_names=None):

self.logger.debug("Loading model for demixing...")

# Use GPU device for Demucs if available and not explicitly disabled
inference_device = self.torch_device

# Load the ROCm-compatible Demucs model
self.demucs_model_instance = get_demucs_model(
name=os.path.splitext(os.path.basename(self.model_path))[0],
repo=Path(os.path.dirname(self.model_path)),
)
self.demucs_model_instance = demucs_segments(
self.segment_size, self.demucs_model_instance
)
self.demucs_model_instance.to(inference_device)
self.demucs_model_instance.eval()

self.logger.debug("Model loaded and set to evaluation mode.")

source = self.demix_demucs(mix)
source = self.demix_demucs(mix, inference_device)

del self.demucs_model_instance
self.clear_gpu_cache()
@@ -126,13 +148,20 @@

if isinstance(inst_source, np.ndarray):
self.logger.debug("Processing instance source...")
source_reshape = spec_utils.reshape_sources(
inst_source[self.demucs_source_map[CommonSeparator.VOCAL_STEM]],
source[self.demucs_source_map[CommonSeparator.VOCAL_STEM]],
)
inst_source[self.demucs_source_map[CommonSeparator.VOCAL_STEM]] = (
source_reshape
)
source = inst_source

if isinstance(source, np.ndarray):
source_length = len(source)
self.logger.debug(
f"Processing source array, source length is {source_length}"
)
match source_length:
case 2:
self.logger.debug("Setting source map to 2-stem...")
Expand All @@ -148,7 +177,9 @@ def separate(self, audio_file_path, custom_output_names=None):
for stem_name, stem_value in self.demucs_source_map.items():
if self.output_single_stem is not None:
if stem_name.lower() != self.output_single_stem.lower():
self.logger.debug(
f"Skipping writing stem {stem_name} as output_single_stem is set to {self.output_single_stem}..."
)
continue

stem_path = self.get_stem_output_path(stem_name, custom_output_names)
@@ -159,7 +190,7 @@

return output_files

def demix_demucs(self, mix, inference_device):
"""
Demixes the input mix using the demucs model.
"""
@@ -181,7 +212,7 @@
overlap=self.overlap,
static_shifts=1 if self.shifts == 0 else self.shifts,
set_progress_bar=None,
device=inference_device,
progress=True,
)[0]
