Add MIGraphX execution mode for AMD GPUs #3
Open
maherr wants to merge 3 commits into avencera:master from
Conversation
The `[target.'cfg(target_arch = "x86_64")'.dependencies]` and `[target.'cfg(not(target_arch = "x86_64"))'.dependencies]` tables were declared between `ndarray-npy` and `ort`, so everything after them in the `[dependencies]` section (`ort`, `libloading`, `tracing`, `thiserror`, `crossbeam-channel`, `rayon`) was silently re-scoped into the second target-specific table. On x86_64 those deps became unreachable and `cargo check` failed with "unresolved module or unlinked crate `ort`". Moving the two target tables below the untargeted deps restores the intended scoping without changing which backend is selected on either arch.
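A minimal sketch of the corrected ordering (crate names are the ones listed above; version numbers are illustrative). The point is that a `[target.…]` table header ends the `[dependencies]` section, so the untargeted deps must come first:

```toml
[dependencies]
ndarray-npy = "0.8"        # versions here are illustrative
ort = "2"
libloading = "0.8"
tracing = "0.1"
thiserror = "1"
crossbeam-channel = "0.5"
rayon = "1"

# Target-specific tables go *after* the untargeted deps. Any dependency
# written below a [target.…] header silently belongs to that table,
# which is exactly the re-scoping bug described above.
[target.'cfg(target_arch = "x86_64")'.dependencies]
# x86_64-only backend selection goes here

[target.'cfg(not(target_arch = "x86_64"))'.dependencies]
# non-x86_64 backend selection goes here
```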
Adds an `ExecutionMode::MiGraphX` variant gated behind a new `migraphx` Cargo feature, forwarding to ONNX Runtime's MIGraphX execution provider. Users on AMD GPUs can now select an ORT-accelerated path without touching the existing CUDA or CoreML code paths.

* New `migraphx = ["ort/migraphx"]` feature.
* `ExecutionMode::MiGraphX` variant plus an `is_migraphx()` helper that mirrors `is_cuda()` / `is_coreml()`.
* `validate()` returns the same feature-gated error pattern used by `coreml` and `cuda` when the feature is off.
* `with_execution_mode()` attaches the MIGraphX execution provider with device 0 and `SameAsRequested` arena growth. Users who need compiled graph caching can set the ORT-standard env vars (`ORT_MIGRAPHX_LOAD_COMPILED_MODEL`, `ORT_MIGRAPHX_SAVE_COMPILED_MODEL`, `ORT_MIGRAPHX_SAVE_COMPILE_PATH`); no programmatic cache configuration is added here.
* `required_files(MiGraphX)` reuses the CPU file set since MIGraphX loads stock ONNX models directly (no split-backend assets).
* `segmentation_step_seconds(MiGraphX)` mirrors the CUDA step.
* Added a `migraphx_mode_requires_feature` unit test following the existing `coreml_modes_require_feature` / `cuda_modes_require_feature` pattern.

Verified against VoxConverse (232 files) on an RX 9070 (RDNA 4, gfx1201) using a patched onnxruntime build: 10.65% strict DER at 15.47x realtime. Background and patch set for the RDNA 4 ORT+MIGraphX stack: https://maherr.dev/rdna4-missing-rung/
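The enum/helper/validate pattern above can be sketched as follows. This is illustrative only: the real speakrs types and the ort execution-provider attachment call are not reproduced here, and the error type is simplified to `String`.

```rust
// Sketch of the additive ExecutionMode pattern described above.
// Variant and helper names mirror the PR; everything else is simplified.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutionMode {
    Cpu,
    Cuda,
    CoreMl,
    MiGraphX,
}

impl ExecutionMode {
    /// Mirrors the existing is_cuda() / is_coreml() helpers.
    pub fn is_migraphx(&self) -> bool {
        matches!(self, ExecutionMode::MiGraphX)
    }

    /// Same feature-gated error pattern used by the coreml and cuda modes:
    /// selecting MiGraphX without the `migraphx` feature is a validation error.
    pub fn validate(&self) -> Result<(), String> {
        if self.is_migraphx() && !cfg!(feature = "migraphx") {
            return Err("MiGraphX mode requires the `migraphx` cargo feature".to_string());
        }
        Ok(())
    }
}
```

Because `validate()` checks `cfg!(feature = "migraphx")` at compile time, the `migraphx_mode_requires_feature` test only needs to assert the error when the feature is off, exactly like the existing coreml/cuda tests.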
The `inference_path()` selector previously routed MIGraphX to `Sequential`, so segmentation ran to completion before embedding began. The existing `run_concurrent_inference` machinery (streaming segmentation windows over a bounded crossbeam channel into `ConcurrentEmbeddingRunner::run_masked`) already handles the MIGraphX `EmbeddingPath::Masked` case with no MIGraphX-specific gaps. This one-line change adds `MiGraphX` to the same match arm used for CoreML and CUDA.

Measured on an RX 9070 (gfx1201) with the MIGraphX provider built against ORT 1.24.2:

- 3-min call, speakrs alone: 12.34s -> 9.1s (-26%)
- 20-min VoxConverse file, speakrs alone: 61.87s -> 44.06s (-28.8%)
- The gain scales with audio length: segmentation fully overlaps with embedding, so the CPU-side segmentation prelude is absorbed.
- Inside a parallel Whisper + speakrs wrapper the end-to-end savings are smaller (~9% on 3-min) because GPU contention on the shared device partially offsets the overlap, but they remain positive.
- Segment counts and batching are unchanged (10x32 + 1x11 on the 3-min file, before and after). This is a scheduling change, not a modeling change.
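The routing change above can be sketched as a single match arm. Type and function names here are illustrative stand-ins, not speakrs' exact API; the point is only that `MiGraphX` moves from the sequential fallback into the concurrent arm shared with CoreML and CUDA.

```rust
// Hypothetical sketch of the inference_path() selector after the change.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutionMode {
    Cpu,
    Cuda,
    CoreMl,
    MiGraphX,
}

#[derive(Debug, PartialEq, Eq)]
pub enum InferencePath {
    Sequential,
    Concurrent,
}

pub fn inference_path(mode: ExecutionMode) -> InferencePath {
    match mode {
        // Segmentation windows stream over a bounded channel into the
        // embedding runner instead of waiting for segmentation to finish.
        ExecutionMode::CoreMl | ExecutionMode::Cuda | ExecutionMode::MiGraphX => {
            InferencePath::Concurrent
        }
        ExecutionMode::Cpu => InferencePath::Sequential,
    }
}
```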
maherr added a commit to maherr/maherr.github.io that referenced this pull request on Apr 21, 2026
TEST set: 15.47x -> 20.37x realtime, 10.65%/7.85%/6.90% -> 10.76%/7.96%/7.01% DER across the three conventions. Reflects the streaming-segmentation optimization that shipped 2026-04-18 (concurrent seg+emb dispatch in speakrs; details at ROCm/AMDMIGraphX#4792 and avencera/speakrs#3). DEV numbers are unchanged (the 13.63x chart figure and 6.84% strict remain the pre-streaming baseline). Also updates the per-dollar speed margin from 22% to 60% to reflect the new 20.37x/$550 figure and adjusts the strict-DER margin vs pyannote 3.1 from -0.65pp to -0.54pp accordingly.
Author

Bumping in case this slipped past. The change is gated behind the new `migraphx` feature.
Adds an `ExecutionMode::MiGraphX` variant gated behind a new `migraphx` Cargo feature. The new path forwards to ONNX Runtime's MIGraphX execution provider so users on AMD GPUs can get an ORT-accelerated path without touching CUDA or CoreML code. This is purely additive: existing modes, file layouts, and feature sets are unchanged.
What changed
**Cargo.toml**
- New `migraphx = ["ort/migraphx"]` feature.
- Moves the `[target.'cfg(...)'.dependencies]` tables for the per-arch `ndarray-linalg` backend defaults below the core `[dependencies]` block. In the current ordering, `ort`, `libloading`, `tracing`, `thiserror`, `crossbeam-channel`, and `rayon` all fall inside `[target.'cfg(not(target_arch = "x86_64"))'.dependencies]`, which makes `cargo check` fail on x86_64 with "unresolved module or unlinked crate `ort`". This is a pre-existing issue unrelated to the MIGraphX feature, but the MIGraphX feature can't be verified without it, so it's bundled as a separate commit. Happy to split into a standalone PR if preferred.

**src/inference.rs**
- `ExecutionMode::MiGraphX` variant plus an `is_migraphx()` helper that mirrors `is_cuda()` / `is_coreml()`.
- `validate()` returns the same feature-gated error pattern used by `coreml` and `cuda` when the feature is off.
- `with_execution_mode()` attaches the MIGraphX EP with device 0 and `SameAsRequested` arena growth.
- `migraphx_mode_requires_feature` unit test following the existing pattern.

**src/models.rs / src/pipeline/config.rs**
- `required_files(MiGraphX)` reuses the CPU file set since MIGraphX loads stock ONNX models directly (no split-backend assets needed).
- `segmentation_step_seconds(MiGraphX)` mirrors the CUDA step.

**src/pipeline.rs**
- `ExecutionMode::MiGraphX` added to the concurrent inference path match arm so segmentation and embedding stream in parallel instead of running sequentially. Measured on an RX 9070: scheduling-only win; segment counts and batching unchanged.
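Since the PR adds no programmatic cache configuration, compiled-graph caching is driven entirely by the ORT-standard env vars named in the description. A hedged shell sketch (variable names are the ones quoted above; the cache path is illustrative):

```shell
# First run: compile the MIGraphX program and save it alongside inference.
export ORT_MIGRAPHX_SAVE_COMPILED_MODEL=1
export ORT_MIGRAPHX_SAVE_COMPILE_PATH=/var/cache/migraphx   # illustrative path

# Subsequent runs: load the saved program instead of recompiling.
export ORT_MIGRAPHX_LOAD_COMPILED_MODEL=1
```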
Notes
Verified with `cargo check --all-features`, `cargo check --no-default-features`, and `cargo check --no-default-features --features migraphx` on x86_64 Linux.

Thanks for your time on this.