mi325x: port kimik2.5 int4 changes from mi355x PR #950#957
mi325x: port kimik2.5 int4 changes from mi355x PR #950#957functionstackx merged 3 commits intomainfrom
Conversation
- Upgrade vLLM ROCm image from v0.16.0 to v0.18.0 - Enable AITER MLA (export VLLM_ROCM_USE_AITER=1) - Add --max-num-seqs 256 - Remove --disable-log-requests Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you |
1 similar comment
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you |
There was a problem hiding this comment.
LGTM — straightforward port of #950's MI355X changes to MI325X; the only finding is a nit already flagged inline.
Extended reasoning...
Overview
This PR modifies three files: updates the vLLM ROCm image tag in .github/configs/amd-master.yaml, adjusts the kimik2.5_int4_mi325x.sh benchmark script (adds VLLM_ROCM_USE_AITER=1, --max-num-seqs 256, removes --disable-log-requests), and appends a changelog entry to perf-changelog.yaml. All changes are a direct mirror of PR #950 (MI355X) applied to the MI325X target.
Security risks
None. This is a benchmark configuration and script change with no auth, crypto, or permission-sensitive code paths.
Level of scrutiny
Low. The diff is mechanical — each change in the MI325X files matches the corresponding MI355X change in PR #950 exactly. The image version, env var, and flag modifications are self-consistent and follow the established pattern in this repo.
Other factors
The only finding is a nit: the pr-link in the new changelog entry uses a XXX placeholder instead of the actual PR number. This is already flagged as an inline comment and is a pre-existing pattern in the file (5+ other entries have the same issue). It has no runtime impact. No prior bot or human reviews exist on this PR.
| - "Upgrade vLLM ROCm image from v0.16.0 to v0.18.0" | ||
| - "Enable AITER MLA, export VLLM_ROCM_USE_AITER=1, https://github.com/vllm-project/vllm/issues/35641" | ||
| - "Triton Fused Moe Tuning https://github.com/vllm-project/vllm/pull/35093" | ||
| - "Add --max-num-seqs 256, remove --disable-log-requests" |
There was a problem hiding this comment.
🟡 The new perf-changelog.yaml entry for kimik2.5-int4-mi325x-vllm uses pull/XXX as a placeholder instead of the actual PR number. Please replace it with https://github.com/SemiAnalysisAI/InferenceX/pull/957.
Extended reasoning...
What the bug is: The changelog entry added by this PR at perf-changelog.yaml line 1095 contains an unreplaced placeholder in the pr-link field: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX. The template placeholder XXX was never substituted with the actual PR number before submission.
The specific code path: The diff shows the new entry introduced in this PR:
- config-keys:
- kimik2.5-int4-mi325x-vllm
description:
- "Upgrade vLLM ROCm image from v0.16.0 to v0.18.0"
- "Enable AITER MLA, export VLLM_ROCM_USE_AITER=1"
- "Add --max-num-seqs 256, remove --disable-log-requests"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXThe pr-link value is the only field that wasn't filled in.
Why existing checks don't catch it: There is no automated validation in CI that enforces pr-link values are real PR URLs rather than placeholder strings. The XXX pattern already exists in at least 5 other entries in the file (lines ~295, 770, 798, 835, 852), confirming this is a recurring oversight that slips through undetected.
Step-by-step proof: This PR is #957 (visible in the PR metadata). The diff adds a new entry with pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX. After merging, anyone looking up this changelog entry to trace the change back to its PR will follow a broken link. The correct value should be https://github.com/SemiAnalysisAI/InferenceX/pull/957.
Impact: The changelog is used for traceability—linking a configuration change back to the PR that introduced it. A broken link makes it impossible to find the rationale, discussion, and review history for this change. This does not affect runtime functionality but degrades developer experience and audit trails.
How to fix: Change the pr-link value in the new entry from https://github.com/SemiAnalysisAI/InferenceX/pull/XXX to https://github.com/SemiAnalysisAI/InferenceX/pull/957. Additionally, consider adding a CI check that rejects XXX placeholders in pr-link fields to prevent this recurring pattern.
Note on duplication: Bug reports 001 and 003 both describe this same issue and have been correctly merged into a single report. The refutation that bug_003 is a duplicate of bug_001 is accurate—both identify the identical XXX placeholder at the same location—and this merged report represents that single underlying issue.
|
@claude resolve the conflicts |
|
Claude finished @functionstackx's task in 2m 12s —— View job Resolved Merge Conflicts
Merge conflict in |
Keep both the kimik2.5-int4-mi325x-vllm entry from this PR and the gptoss-fp4-h100/h200-vllm entry from PR #960. Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
Closes #954
Ports the mi355x int4 changes from PR #950 to mi325x:
Generated with Claude Code