1 change: 1 addition & 0 deletions .gitignore
@@ -1,6 +1,7 @@
# Andrew env
.DS_Store
.vscode
debug_results.txt

# Andrew functional adds
/tracks/
34 changes: 34 additions & 0 deletions Dockerfile.rocm
@@ -0,0 +1,34 @@
# Use ROCm base image with Python
FROM rocm/dev-ubuntu-22.04:7.2-complete

# Set the working directory in the container
WORKDIR /workdir

# Install necessary packages
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
python3 \
python3-pip \
&& rm -rf /var/lib/apt/lists/*

RUN python3 -m pip install --upgrade pip

# Install PyTorch with ROCm support
RUN --mount=type=cache,target=/root/.cache \
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm7.2

# Install audio-separator with GPU support
RUN --mount=type=cache,target=/root/.cache \
pip3 install "audio-separator[gpu]" onnxruntime-rocm

# Default environment variables for AMD RX 6600 series (gfx1032)
# Override these for other GPUs:
# RX 7900: HSA_OVERRIDE_GFX_VERSION=11.0.0, PYTORCH_ROCM_ARCH=gfx1100
# MI250X: HSA_OVERRIDE_GFX_VERSION=9.0.10, PYTORCH_ROCM_ARCH=gfx90a
ARG HSA_OVERRIDE_GFX_VERSION=10.3.0
ARG PYTORCH_ROCM_ARCH=gfx1030
ENV HSA_OVERRIDE_GFX_VERSION=${HSA_OVERRIDE_GFX_VERSION}
ENV PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH}

# Run audio-separator when the container launches
ENTRYPOINT ["audio-separator"]
88 changes: 85 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
@@ -25,11 +25,13 @@ The simplest (and probably most used) use case for this package is to separate a
- [Installation 🛠️](#installation-%EF%B8%8F)
- [🐳 Docker](#-docker)
- [🎮 Nvidia GPU with CUDA or 🧪 Google Colab](#-nvidia-gpu-with-cuda-or--google-colab)
- [🖥️ AMD GPU with ROCm (Linux)](#-amd-gpu-with-rocm-linux)
- [ Apple Silicon, macOS Sonoma+ with M1 or newer CPU (CoreML acceleration)](#-apple-silicon-macos-sonoma-with-m1-or-newer-cpu-coreml-acceleration)
- [🐢 No hardware acceleration, CPU only](#-no-hardware-acceleration-cpu-only)
- [🎥 FFmpeg dependency](#-ffmpeg-dependency)
- [GPU / CUDA specific installation steps with Pip](#gpu--cuda-specific-installation-steps-with-pip)
- [Multiple CUDA library versions may be needed](#multiple-cuda-library-versions-may-be-needed)
- [ROCm specific troubleshooting](#rocm-specific-troubleshooting)
- [Usage 🚀](#usage-)
- [Command Line Interface (CLI)](#command-line-interface-cli)
- [Listing and Filtering Available Models](#listing-and-filtering-available-models)
@@ -67,6 +69,7 @@ The simplest (and probably most used) use case for this package is to separate a
- Ability to inference using a pre-trained model in PTH or ONNX format.
- CLI support for easy use in scripts and batch processing.
- Python API for integration into other projects.
- **Multi-platform GPU acceleration**: NVIDIA CUDA, AMD ROCm, Apple Silicon MPS/CoreML, DirectML, and CPU fallback.

## Installation 🛠️

@@ -112,6 +115,50 @@ Docker:
beveradb/audio-separator:gpu
```

### 🖥️ AMD GPU with ROCm (Linux)

**Supported ROCm Versions:** 5.7+

💬 If successfully configured, you should see this log message when running `audio-separator --env_info`:
`ONNXruntime has ROCMExecutionProvider available, enabling acceleration`
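As a quick programmatic check, you can inspect the execution providers ONNX Runtime reports. The helper below is an illustrative sketch, not part of audio-separator's API; on a real system you would pass in the result of `onnxruntime.get_available_providers()`:

```python
# Hypothetical helper: given the provider list reported by
# onnxruntime.get_available_providers(), report whether the ROCm
# execution provider is available.
def has_rocm_provider(providers):
    return "ROCMExecutionProvider" in providers

# Example with a hard-coded provider list; replace it with
# onnxruntime.get_available_providers() on your own machine.
print(has_rocm_provider(["ROCMExecutionProvider", "CPUExecutionProvider"]))  # True
```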

Pip (complete installation):
```sh
# First install PyTorch with ROCm support (Change ROCm version as needed.)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm7.2

# Then install audio-separator with ROCm support
pip install "audio-separator[rocm]"
```

**Important:** You must install PyTorch with ROCm support BEFORE installing audio-separator. If you already have PyTorch with CUDA support installed, uninstall it first:
```sh
pip uninstall torch torchvision torchaudio
pip cache purge
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7
pip install "audio-separator[rocm]"
```
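If you are unsure which build of PyTorch is currently installed, ROCm wheels from the PyTorch index carry a `+rocm` local version tag (e.g. `2.4.0+rocm5.7`), while wheels from the CUDA index use `+cu…`-style tags. A minimal sketch, assuming you pass in `torch.__version__` (the function name is illustrative):

```python
# Hypothetical check on a torch version string such as torch.__version__.
# Wheels installed from PyPI without a local tag report "cpu/other" here,
# even though they may bundle CUDA support.
def torch_build_flavor(version_string):
    if "+rocm" in version_string:
        return "rocm"
    if "+cu" in version_string:
        return "cuda"
    return "cpu/other"

print(torch_build_flavor("2.4.0+rocm5.7"))  # rocm
```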

**Required ROCm Packages:**
- PyTorch ROCm: `torch`, `torchvision`, `torchaudio` with ROCm support
- ONNX Runtime: `onnxruntime`, `onnxruntime-rocm`

**Basic ROCm Setup:**
- For AMD Radeon RX 6600 series (gfx1032), point ROCm at the supported gfx1030 kernels by setting:
```sh
export HSA_OVERRIDE_GFX_VERSION=10.3.0
export PYTORCH_ROCM_ARCH=gfx1030
```
- ROCm acceleration uses the CUDAExecutionProvider (ONNX Runtime maps ROCm to CUDA for compatibility)
- The system detects ROCm packages and PyTorch ROCm support automatically
- ROCm libraries must be properly installed on your system for acceleration to work
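The override logic above can be sketched as a small lookup. The values follow common community guidance (gfx1032 has no prebuilt kernels, so it borrows the gfx1030 ones); the names here are illustrative, not part of audio-separator:

```python
# Hypothetical mapping from a GPU's reported gfx architecture to the
# HSA_OVERRIDE_GFX_VERSION value to export; None means no override is needed
# because the architecture ships with prebuilt ROCm kernels.
GFX_OVERRIDES = {
    "gfx1030": None,      # RX 6800/6900: supported target, no override
    "gfx1032": "10.3.0",  # RX 6600: reuse the gfx1030 kernels
    "gfx90a": None,       # MI200 series: officially supported
}

def hsa_override_for(gfx_arch):
    # Unknown architectures also return None; consult the ROCm docs for those.
    return GFX_OVERRIDES.get(gfx_arch)

print(hsa_override_for("gfx1032"))  # 10.3.0
```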

Docker (build from source):
```sh
docker build -f Dockerfile.rocm -t audio-separator:rocm .
docker run -it --device=/dev/kfd --device=/dev/dri --group-add=video -v `pwd`:/workdir audio-separator:rocm input.wav
```

###  Apple Silicon, macOS Sonoma+ with M1 or newer CPU (CoreML acceleration)

💬 If successfully configured, you should see this log message when running `audio-separator --env_info`:
@@ -157,19 +204,26 @@ apt-get update; apt-get install -y ffmpeg
brew update; brew install ffmpeg
```

## GPU / CUDA specific installation steps with Pip (CUDA and ROCm)

In theory, all you should need to do to get `audio-separator` working with a GPU is install it with the appropriate extra (`[gpu]` for CUDA/NVIDIA or `[rocm]` for ROCm/AMD) as above.

However, sometimes getting both PyTorch and ONNX Runtime working with GPU support can be a bit tricky so it may not work that easily.

You may need to reinstall both packages directly, allowing pip to calculate the right versions for your platform, for example:

**For CUDA/NVIDIA (`[gpu]`):**
- `pip uninstall torch onnxruntime`
- `pip cache purge`
- `pip install --force-reinstall torch torchvision torchaudio`
- `pip install --force-reinstall onnxruntime-gpu`

**For ROCm/AMD (`[rocm]`):**
- `pip uninstall torch onnxruntime onnxruntime-rocm`
- `pip cache purge`
- `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm7.2`
- `pip install --force-reinstall onnxruntime-rocm`

I generally recommend installing the latest version of PyTorch for your environment using the command recommended by the wizard here:
<https://pytorch.org/get-started/locally/>

@@ -197,6 +251,34 @@ You can resolve this by running the following command:
python -m pip install ort-nightly-gpu --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ort-cuda-12-nightly/pypi/simple/
```

### ROCm specific troubleshooting

For ROCm (AMD GPU) support, make sure you have:
1. ROCm installed on your system (typically version 5.7+)
2. PyTorch with ROCm support installed (check PyTorch website for ROCm installation)
3. `onnxruntime-rocm` package installed

If you encounter issues with ROCm detection, try reinstalling the packages:
```sh
pip uninstall torch onnxruntime
pip cache purge
# Install PyTorch with ROCm support (check https://pytorch.org for the correct command)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7
pip install onnxruntime-rocm
```

**ROCm Performance Optimization:**
- The ROCm execution provider includes performance optimizations for AMD GPUs:
  - Parallel execution mode for better multi-core utilization
  - Kernel tuning enabled for optimal performance
  - Memory pattern optimization for better cache usage
  - Smart memory allocation strategy

**Common ROCm Issues:**
- ROCm packages installed but no acceleration: make sure `onnxruntime-rocm` is installed and the ROCm libraries are on your library path (e.g. `LD_LIBRARY_PATH`)
- PyTorch reports CUDA but not ROCm: reinstall PyTorch with ROCm support using the PyTorch ROCm index URL
- Docker issues: use the provided `Dockerfile.rocm` and make sure the GPU devices are passed through (`--device=/dev/kfd --device=/dev/dri`)
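The checklist above can be expressed as a simple decision helper. This is a hypothetical sketch, not part of audio-separator; you would gather the three flags from your own environment (pip list, `torch.version.hip`, and your library path):

```python
# Hypothetical troubleshooting helper mirroring the checklist above.
def rocm_diagnosis(ort_rocm_installed, torch_is_rocm_build, rocm_libs_found):
    """Return the first remedial step suggested by the checklist, or None."""
    if not ort_rocm_installed:
        return "install onnxruntime-rocm"
    if not torch_is_rocm_build:
        return "reinstall PyTorch from the ROCm index URL"
    if not rocm_libs_found:
        return "make ROCm libraries visible on your library path"
    return None  # environment looks healthy

print(rocm_diagnosis(True, False, True))  # reinstall PyTorch from the ROCm index URL
```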

> Note: if anyone knows how to make this cleaner so we can support both different platform-specific dependencies for hardware acceleration without a separate installation process for each, please let me know or raise a PR!

## Usage 🚀
63 changes: 47 additions & 16 deletions audio_separator/separator/architectures/demucs_separator.py
@@ -4,15 +4,25 @@
import torch
import numpy as np
from audio_separator.separator.common_separator import CommonSeparator
from audio_separator.separator.uvr_lib_v5.demucs.apply import (
apply_model,
demucs_segments,
)
from audio_separator.separator.uvr_lib_v5.demucs.hdemucs import HDemucs
from audio_separator.separator.uvr_lib_v5.demucs.pretrained import (
get_model as get_demucs_model,
)
from audio_separator.separator.uvr_lib_v5 import spec_utils

DEMUCS_4_SOURCE = ["drums", "bass", "other", "vocals"]

DEMUCS_2_SOURCE_MAPPER = {CommonSeparator.INST_STEM: 0, CommonSeparator.VOCAL_STEM: 1}
DEMUCS_4_SOURCE_MAPPER = {
CommonSeparator.BASS_STEM: 0,
CommonSeparator.DRUM_STEM: 1,
CommonSeparator.OTHER_STEM: 2,
CommonSeparator.VOCAL_STEM: 3,
}
DEMUCS_6_SOURCE_MAPPER = {
CommonSeparator.BASS_STEM: 0,
CommonSeparator.DRUM_STEM: 1,
@@ -64,8 +74,12 @@ def __init__(self, common_config, arch_config):
# Enables "Segments". Deselecting this option is only recommended for those with powerful PCs.
self.segments_enabled = arch_config.get("segments_enabled", True)

self.logger.debug(
f"Demucs arch params: segment_size={self.segment_size}, segments_enabled={self.segments_enabled}"
)
self.logger.debug(
f"Demucs arch params: shifts={self.shifts}, overlap={self.overlap}"
)

self.demucs_source_map = DEMUCS_4_SOURCE_MAPPER

@@ -107,15 +121,23 @@ def separate(self, audio_file_path, custom_output_names=None):

self.logger.debug("Loading model for demixing...")

# Use GPU device for Demucs if available and not explicitly disabled
inference_device = self.torch_device

# Load the ROCm-compatible Demucs model
self.demucs_model_instance = get_demucs_model(
name=os.path.splitext(os.path.basename(self.model_path))[0],
repo=Path(os.path.dirname(self.model_path)),
)
self.demucs_model_instance = demucs_segments(
self.segment_size, self.demucs_model_instance
)
self.demucs_model_instance.to(inference_device)
self.demucs_model_instance.eval()

self.logger.debug("Model loaded and set to evaluation mode.")

source = self.demix_demucs(mix)
source = self.demix_demucs(mix, inference_device)

del self.demucs_model_instance
self.clear_gpu_cache()
@@ -126,13 +148,20 @@

if isinstance(inst_source, np.ndarray):
self.logger.debug("Processing instance source...")
source_reshape = spec_utils.reshape_sources(
inst_source[self.demucs_source_map[CommonSeparator.VOCAL_STEM]],
source[self.demucs_source_map[CommonSeparator.VOCAL_STEM]],
)
inst_source[self.demucs_source_map[CommonSeparator.VOCAL_STEM]] = (
source_reshape
)
source = inst_source

if isinstance(source, np.ndarray):
source_length = len(source)
self.logger.debug(
f"Processing source array, source length is {source_length}"
)
match source_length:
case 2:
self.logger.debug("Setting source map to 2-stem...")
Expand All @@ -148,7 +177,9 @@ def separate(self, audio_file_path, custom_output_names=None):
for stem_name, stem_value in self.demucs_source_map.items():
if self.output_single_stem is not None:
if stem_name.lower() != self.output_single_stem.lower():
self.logger.debug(
f"Skipping writing stem {stem_name} as output_single_stem is set to {self.output_single_stem}..."
)
continue

stem_path = self.get_stem_output_path(stem_name, custom_output_names)
@@ -159,7 +190,7 @@

return output_files

def demix_demucs(self, mix, inference_device):
"""
Demixes the input mix using the demucs model.
"""
@@ -181,7 +212,7 @@
overlap=self.overlap,
static_shifts=1 if self.shifts == 0 else self.shifts,
set_progress_bar=None,
device=inference_device,
progress=True,
)[0]
