Skip to content

mi325x: port kimik2.5 int4 changes from mi355x PR #950#957

Merged
functionstackx merged 3 commits intomainfrom
claude/issue-954-20260327-0702
Mar 27, 2026
Merged

mi325x: port kimik2.5 int4 changes from mi355x PR #950#957
functionstackx merged 3 commits intomainfrom
claude/issue-954-20260327-0702

Conversation

@functionstackx
Copy link
Copy Markdown
Contributor

Closes #954

Ports the mi355x int4 changes from PR #950 to mi325x:

  • Upgrade vLLM ROCm image from v0.16.0 to v0.18.0
  • Enable AITER MLA (export VLLM_ROCM_USE_AITER=1)
  • Add --max-num-seqs 256
  • Remove --disable-log-requests

Generated with Claude Code

- Upgrade vLLM ROCm image from v0.16.0 to v0.18.0
- Enable AITER MLA (export VLLM_ROCM_USE_AITER=1)
- Add --max-num-seqs 256
- Remove --disable-log-requests

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
@functionstackx functionstackx requested a review from a team March 27, 2026 07:09
@github-actions
Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

1 similar comment
@github-actions
Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — straightforward port of #950's MI355X changes to MI325X; the only finding is a nit already flagged inline.

Extended reasoning...

Overview

This PR modifies three files: updates the vLLM ROCm image tag in .github/configs/amd-master.yaml, adjusts the kimik2.5_int4_mi325x.sh benchmark script (adds VLLM_ROCM_USE_AITER=1, --max-num-seqs 256, removes --disable-log-requests), and appends a changelog entry to perf-changelog.yaml. All changes are a direct mirror of PR #950 (MI355X) applied to the MI325X target.

Security risks

None. This is a benchmark configuration and script change with no auth, crypto, or permission-sensitive code paths.

Level of scrutiny

Low. The diff is mechanical — each change in the MI325X files matches the corresponding MI355X change in PR #950 exactly. The image version, env var, and flag modifications are self-consistent and follow the established pattern in this repo.

Other factors

The only finding is a nit: the pr-link in the new changelog entry uses a XXX placeholder instead of the actual PR number. This is already flagged as an inline comment and is a pre-existing pattern in the file (5+ other entries have the same issue). It has no runtime impact. No prior bot or human reviews exist on this PR.

- "Upgrade vLLM ROCm image from v0.16.0 to v0.18.0"
- "Enable AITER MLA, export VLLM_ROCM_USE_AITER=1, https://github.com/vllm-project/vllm/issues/35641"
- "Triton Fused Moe Tuning https://github.com/vllm-project/vllm/pull/35093"
- "Add --max-num-seqs 256, remove --disable-log-requests"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The new perf-changelog.yaml entry for kimik2.5-int4-mi325x-vllm uses pull/XXX as a placeholder instead of the actual PR number. Please replace it with https://github.com/SemiAnalysisAI/InferenceX/pull/957.

Extended reasoning...

What the bug is: The changelog entry added by this PR at perf-changelog.yaml line 1095 contains an unreplaced placeholder in the pr-link field: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX. The template placeholder XXX was never substituted with the actual PR number before submission.

The specific code path: The diff shows the new entry introduced in this PR:

- config-keys:
    - kimik2.5-int4-mi325x-vllm
  description:
    - "Upgrade vLLM ROCm image from v0.16.0 to v0.18.0"
    - "Enable AITER MLA, export VLLM_ROCM_USE_AITER=1"
    - "Add --max-num-seqs 256, remove --disable-log-requests"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX

The pr-link value is the only field that wasn't filled in.

Why existing checks don't catch it: There is no automated validation in CI that enforces pr-link values are real PR URLs rather than placeholder strings. The XXX pattern already exists in at least 5 other entries in the file (lines ~295, 770, 798, 835, 852), confirming this is a recurring oversight that slips through undetected.

Step-by-step proof: This PR is #957 (visible in the PR metadata). The diff adds a new entry with pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX. After merging, anyone looking up this changelog entry to trace the change back to its PR will follow a broken link. The correct value should be https://github.com/SemiAnalysisAI/InferenceX/pull/957.

Impact: The changelog is used for traceability—linking a configuration change back to the PR that introduced it. A broken link makes it impossible to find the rationale, discussion, and review history for this change. This does not affect runtime functionality but degrades developer experience and audit trails.

How to fix: Change the pr-link value in the new entry from https://github.com/SemiAnalysisAI/InferenceX/pull/XXX to https://github.com/SemiAnalysisAI/InferenceX/pull/957. Additionally, consider adding a CI check that rejects XXX placeholders in pr-link fields to prevent this recurring pattern.

Note on duplication: Bug reports 001 and 003 both describe this same issue and have been correctly merged into a single report. The refutation that bug_003 is a duplicate of bug_001 is accurate—both identify the identical XXX placeholder at the same location—and this merged report represents that single underlying issue.

@functionstackx
Copy link
Copy Markdown
Contributor Author

@claude resolve the conflicts

@Klaud-Cold
Copy link
Copy Markdown
Collaborator

Klaud-Cold commented Mar 27, 2026

Claude finished @functionstackx's task in 2m 12s —— View job


Resolved Merge Conflicts

Merge conflict in perf-changelog.yaml resolved and pushed. Both changelog entries are preserved.
|

Keep both the kimik2.5-int4-mi325x-vllm entry from this PR and the
gptoss-fp4-h100/h200-vllm entry from PR #960.

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
@functionstackx functionstackx merged commit 02770ff into main Mar 27, 2026
14 checks passed
@functionstackx functionstackx deleted the claude/issue-954-20260327-0702 branch March 27, 2026 08:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

mi325: update kimi int4

2 participants