Skip to content

Add MIGraphX support#2

Open
Looong01 wants to merge 1 commit intomasterfrom
migraphx
Open

Add MIGraphX support#2
Looong01 wants to merge 1 commit intomasterfrom
migraphx

Conversation

@Looong01
Copy link
Owner

This pull request adds support for the MIGraphX backend, enabling KataGo to run neural network inference on AMD GPUs using the MIGraphX library. The changes update build scripts, documentation, and configuration files to integrate MIGraphX as a selectable backend alongside existing ones like CUDA, ROCm, and OpenCL. Users can now build and configure KataGo to use MIGraphX on compatible AMD hardware.

Build system and backend integration:

  • Added MIGraphX as a selectable backend in cpp/CMakeLists.txt, including logic to find and link MIGraphX libraries, set up include paths, and define the USE_MIGRAPHX_BACKEND compile flag. [1] [2] [3]
  • Updated backend prefix handling and initialization logic in cpp/program/setup.cpp to recognize and initialize the MIGraphX backend. [1] [2]

Documentation updates:

  • Updated README.md and Compiling.md to document MIGraphX as a supported backend, describe its requirements, and explain its usage and performance characteristics. [1] [2] [3] [4] [5]

Configuration and user guidance:

  • Added MIGraphX-specific configuration options and documentation to all example config files, detailing device selection and FP16 usage for single and multi-GPU setups. [1] [2] [3] [4]

Runtime and program feedback:

  • Updated runtime version reporting in cpp/main.cpp to indicate when the MIGraphX backend is in use.
  • Modified GTP configuration logic to support MIGraphX device assignment for multi-GPU scenarios.

These changes collectively make KataGo compatible with the MIGraphX backend, expanding support for AMD GPUs and providing users with more flexibility in hardware and backend selection.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c511c338ef

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1124 to +1127
auto results = handle->model->prog.eval(params);

// Process outputs
assert(outputs.size() == (size_t)batchSize);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P0 Badge Populate inference buffers from MIGraphX eval outputs

In getOutput, the code calls prog.eval(params) and then immediately reads from policyResults, policyPassResults, valueResults, and scoreValueResults, but none of those buffers are ever filled from results. Since these vectors are initialized to zeros in createInputBuffers, every MIGraphX inference returns zero/default predictions regardless of the model, which breaks gameplay and analysis output for all runs using this backend.

Useful? React with 👍 / 👎.

Comment on lines +1112 to +1116
migraphx::shape input_shape(
handle->model->useFP16 ? migraphx::shape::half_type : migraphx::shape::float_type,
{(size_t)maxBatchSize, (size_t)numSpatialFeatures, (size_t)nnYLen, (size_t)nnXLen}
);
params["input_spatial"] = migraphx::argument(input_shape, buffers->userInputBuffer.data());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Convert FP16 inputs before binding MIGraphX arguments

When FP16 is enabled, input_shape is declared as half_type but input_spatial is bound to float* memory (userInputBuffer.data()), so MIGraphX will interpret the buffer with the wrong element width/type. This produces corrupted inputs (or runtime failures) specifically when mgxUseFP16 is true; the backend should convert host inputs to migraphx::half before creating FP16 arguments, similar to the test helper paths.

Useful? React with 👍 / 👎.

// For now, treat as ordinary residual block
// Full implementation would use gpoolConv, gpool pooling, and gpoolToBiasMul
(void)blockDesc;
return buildResidualBlock(builder, input, *(const ResidualBlockDesc*)&blockDesc);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove unsafe cast from global-pooling to residual desc

buildGlobalPoolingResidualBlock reinterprets GlobalPoolingResidualBlockDesc as ResidualBlockDesc, but these structs are not layout-compatible (GlobalPoolingResidualBlockDesc has an extra modelVersion field before preBN). This misaligns all subsequent fields and feeds invalid layer descriptors into graph construction whenever a global-pooling block is encountered, leading to incorrect graphs or crashes.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant