mi325x: port kimik2.5 int4 changes from mi355x PR #950 by functionstackx · Pull Request #957 · SemiAnalysisAI/InferenceX

functionstackx · 2026-03-27T07:09:21Z

Closes #954

Ports the mi355x int4 changes from PR #950 to mi325x:

Upgrade vLLM ROCm image from v0.16.0 to v0.18.0
Enable AITER MLA (export VLLM_ROCM_USE_AITER=1)
Add --max-num-seqs 256
Remove --disable-log-requests

- Upgrade vLLM ROCm image from v0.16.0 to v0.18.0 - Enable AITER MLA (export VLLM_ROCM_USE_AITER=1) - Add --max-num-seqs 256 - Remove --disable-log-requests Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>

github-actions · 2026-03-27T07:09:29Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

github-actions · 2026-03-27T07:09:29Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

claude

LGTM — straightforward port of #950's MI355X changes to MI325X; the only finding is a nit already flagged inline.

Extended reasoning...

Overview

This PR modifies three files: updates the vLLM ROCm image tag in .github/configs/amd-master.yaml, adjusts the kimik2.5_int4_mi325x.sh benchmark script (adds VLLM_ROCM_USE_AITER=1, --max-num-seqs 256, removes --disable-log-requests), and appends a changelog entry to perf-changelog.yaml. All changes are a direct mirror of PR #950 (MI355X) applied to the MI325X target.

Security risks

None. This is a benchmark configuration and script change with no auth, crypto, or permission-sensitive code paths.

Level of scrutiny

Low. The diff is mechanical — each change in the MI325X files matches the corresponding MI355X change in PR #950 exactly. The image version, env var, and flag modifications are self-consistent and follow the established pattern in this repo.

Other factors

The only finding is a nit: the pr-link in the new changelog entry uses a XXX placeholder instead of the actual PR number. This is already flagged as an inline comment and is a pre-existing pattern in the file (5+ other entries have the same issue). It has no runtime impact. No prior bot or human reviews exist on this PR.

claude · 2026-03-27T07:19:00Z

perf-changelog.yaml

+    - "Upgrade vLLM ROCm image from v0.16.0 to v0.18.0"
+    - "Enable AITER MLA, export VLLM_ROCM_USE_AITER=1, https://github.com/vllm-project/vllm/issues/35641"
+    - "Triton Fused Moe Tuning https://github.com/vllm-project/vllm/pull/35093"
+    - "Add --max-num-seqs 256, remove --disable-log-requests"


🟡 The new perf-changelog.yaml entry for kimik2.5-int4-mi325x-vllm uses pull/XXX as a placeholder instead of the actual PR number. Please replace it with https://github.com/SemiAnalysisAI/InferenceX/pull/957.

Extended reasoning...

What the bug is: The changelog entry added by this PR at perf-changelog.yaml line 1095 contains an unreplaced placeholder in the pr-link field: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX. The template placeholder XXX was never substituted with the actual PR number before submission.

The specific code path: The diff shows the new entry introduced in this PR:

- config-keys: - kimik2.5-int4-mi325x-vllm description: - "Upgrade vLLM ROCm image from v0.16.0 to v0.18.0" - "Enable AITER MLA, export VLLM_ROCM_USE_AITER=1" - "Add --max-num-seqs 256, remove --disable-log-requests" pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX

The pr-link value is the only field that wasn't filled in.

Why existing checks don't catch it: There is no automated validation in CI that enforces pr-link values are real PR URLs rather than placeholder strings. The XXX pattern already exists in at least 5 other entries in the file (lines ~295, 770, 798, 835, 852), confirming this is a recurring oversight that slips through undetected.

Step-by-step proof: This PR is #957 (visible in the PR metadata). The diff adds a new entry with pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX. After merging, anyone looking up this changelog entry to trace the change back to its PR will follow a broken link. The correct value should be https://github.com/SemiAnalysisAI/InferenceX/pull/957.

Impact: The changelog is used for traceability—linking a configuration change back to the PR that introduced it. A broken link makes it impossible to find the rationale, discussion, and review history for this change. This does not affect runtime functionality but degrades developer experience and audit trails.

How to fix: Change the pr-link value in the new entry from https://github.com/SemiAnalysisAI/InferenceX/pull/XXX to https://github.com/SemiAnalysisAI/InferenceX/pull/957. Additionally, consider adding a CI check that rejects XXX placeholders in pr-link fields to prevent this recurring pattern.

Note on duplication: Bug reports 001 and 003 both describe this same issue and have been correctly merged into a single report. The refutation that bug_003 is a duplicate of bug_001 is accurate—both identify the identical XXX placeholder at the same location—and this merged report represents that single underlying issue.

functionstackx · 2026-03-27T08:05:27Z

@claude resolve the conflicts

Klaud-Cold · 2026-03-27T08:05:55Z

Claude finished @functionstackx's task in 2m 12s —— View job

Resolved Merge Conflicts

Identify merge conflicts — conflict in perf-changelog.yaml
Resolve conflicts — kept both the kimik2.5-int4-mi325x-vllm entry (this PR) and the gptoss-fp4-h100/h200-vllm entry (from chore: update gptoss H100 & H200 vLLM image to v0.18.0 #960)
Push resolved changes

Merge conflict in perf-changelog.yaml resolved and pushed. Both changelog entries are preserved.
|

Keep both the kimik2.5-int4-mi325x-vllm entry from this PR and the gptoss-fp4-h100/h200-vllm entry from PR #960. Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>

mi325x: port kimik2.5 int4 changes from mi355x PR #950

4a4f574

- Upgrade vLLM ROCm image from v0.16.0 to v0.18.0 - Enable AITER MLA (export VLLM_ROCM_USE_AITER=1) - Add --max-num-seqs 256 - Remove --disable-log-requests Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>

functionstackx requested a review from a team March 27, 2026 07:09

functionstackx requested review from billishyahao and chunfangamd as code owners March 27, 2026 07:09

Update perf-changelog.yaml

974e555

functionstackx added the sweep-enabled label Mar 27, 2026

github-project-automation bot added this to InferenceMAX Board Mar 27, 2026

claude bot reviewed Mar 27, 2026

View reviewed changes

functionstackx removed the sweep-enabled label Mar 27, 2026

Resolve merge conflict with origin/main in perf-changelog.yaml

8f51204

Keep both the kimik2.5-int4-mi325x-vllm entry from this PR and the gptoss-fp4-h100/h200-vllm entry from PR #960. Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>

functionstackx merged commit 02770ff into main Mar 27, 2026
14 checks passed

github-project-automation bot moved this to Done in InferenceMAX Board Mar 27, 2026

functionstackx deleted the claude/issue-954-20260327-0702 branch March 27, 2026 08:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mi325x: port kimik2.5 int4 changes from mi355x PR #950#957

mi325x: port kimik2.5 int4 changes from mi355x PR #950#957
functionstackx merged 3 commits intomainfrom
claude/issue-954-20260327-0702

functionstackx commented Mar 27, 2026

Uh oh!

github-actions bot commented Mar 27, 2026

Uh oh!

github-actions bot commented Mar 27, 2026

Uh oh!

claude bot left a comment

Uh oh!

claude bot Mar 27, 2026

Uh oh!

functionstackx commented Mar 27, 2026

Uh oh!

Klaud-Cold commented Mar 27, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

functionstackx commented Mar 27, 2026

Uh oh!

github-actions bot commented Mar 27, 2026

Uh oh!

github-actions bot commented Mar 27, 2026

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

claude bot Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

functionstackx commented Mar 27, 2026

Uh oh!

Klaud-Cold commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Resolved Merge Conflicts

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Klaud-Cold commented Mar 27, 2026 •

edited

Loading