Merged
2 changes: 1 addition & 1 deletion .github/configs/amd-master.yaml
@@ -316,7 +316,7 @@ kimik2.5-int4-mi355x-vllm:
- { tp: 8, conc-start: 4, conc-end: 64 }

kimik2.5-int4-mi325x-vllm:
-  image: vllm/vllm-openai-rocm:v0.16.0
+  image: vllm/vllm-openai-rocm:v0.18.0
  model: moonshotai/Kimi-K2.5
  model-prefix: kimik2.5
  runner: mi325x
3 changes: 2 additions & 1 deletion benchmarks/single_node/kimik2.5_int4_mi325x.sh
@@ -33,13 +33,14 @@ PORT=${PORT:-8888}
start_gpu_monitor

set -x
+export VLLM_ROCM_USE_AITER=1
vllm serve $MODEL --port $PORT \
  --tensor-parallel-size=$TP \
  --gpu-memory-utilization 0.95 \
  --max-model-len $MAX_MODEL_LEN \
  --block-size=64 \
  --disable-log-requests \
  --trust-remote-code \
  --max-num-seqs 256 \
  --mm-encoder-tp-mode data > $SERVER_LOG 2>&1 &

SERVER_PID=$!
9 changes: 9 additions & 0 deletions perf-changelog.yaml
@@ -1093,6 +1093,15 @@
    - "Add --max-num-seqs 256, remove --disable-log-requests"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/950

+- config-keys:
+    - kimik2.5-int4-mi325x-vllm
+  description:
+    - "Upgrade vLLM ROCm image from v0.16.0 to v0.18.0"
+    - "Enable AITER MLA, export VLLM_ROCM_USE_AITER=1, https://github.com/vllm-project/vllm/issues/35641"
+    - "Triton Fused Moe Tuning https://github.com/vllm-project/vllm/pull/35093"
+    - "Add --max-num-seqs 256, remove --disable-log-requests"

🟡 The new perf-changelog.yaml entry for kimik2.5-int4-mi325x-vllm uses pull/XXX as a placeholder instead of the actual PR number. Please replace it with https://github.com/SemiAnalysisAI/InferenceX/pull/957.

Extended reasoning...

What the bug is: The changelog entry added by this PR at perf-changelog.yaml line 1095 contains an unreplaced placeholder in the pr-link field: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX. The template placeholder XXX was never substituted with the actual PR number before submission.

The specific code path: The diff shows the new entry introduced in this PR:

- config-keys:
    - kimik2.5-int4-mi325x-vllm
  description:
    - "Upgrade vLLM ROCm image from v0.16.0 to v0.18.0"
    - "Enable AITER MLA, export VLLM_ROCM_USE_AITER=1"
    - "Add --max-num-seqs 256, remove --disable-log-requests"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX

The pr-link value is the only field that wasn't filled in.

Why existing checks don't catch it: There is no automated validation in CI that enforces pr-link values are real PR URLs rather than placeholder strings. The XXX pattern already exists in at least 5 other entries in the file (lines ~295, 770, 798, 835, 852), confirming this is a recurring oversight that slips through undetected.

Step-by-step proof: This PR is #957 (visible in the PR metadata). The diff adds a new entry with pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX. After merging, anyone looking up this changelog entry to trace the change back to its PR will follow a broken link. The correct value should be https://github.com/SemiAnalysisAI/InferenceX/pull/957.

Impact: The changelog is used for traceability—linking a configuration change back to the PR that introduced it. A broken link makes it impossible to find the rationale, discussion, and review history for this change. This does not affect runtime functionality but degrades developer experience and audit trails.

How to fix: Change the pr-link value in the new entry from https://github.com/SemiAnalysisAI/InferenceX/pull/XXX to https://github.com/SemiAnalysisAI/InferenceX/pull/957. Additionally, consider adding a CI check that rejects XXX placeholders in pr-link fields to prevent this recurring pattern.
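The suggested CI guard could be sketched as follows. This is a hypothetical script, not an existing check in this repo; the file path and the placeholder regex are assumptions:

```python
# Hypothetical CI guard (an assumption, not an existing check in this repo):
# fail the build if any pr-link in perf-changelog.yaml still carries the
# template placeholder "pull/XXX" instead of a real PR number.
import re

PLACEHOLDER = re.compile(r"pr-link:\s*\S*/pull/XXX\b")

def find_placeholder_links(text: str) -> list[int]:
    """Return 1-based line numbers whose pr-link still points at pull/XXX."""
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if PLACEHOLDER.search(line)
    ]

if __name__ == "__main__":
    import sys
    from pathlib import Path

    # Example CI usage: python check_changelog_links.py perf-changelog.yaml
    path = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("perf-changelog.yaml")
    if path.exists():
        bad = find_placeholder_links(path.read_text())
        if bad:
            print(f"{path}: placeholder pr-link on lines {bad}")
            sys.exit(1)
```

A plain-text scan rather than a YAML parse is deliberate here: it also flags placeholders in entries that are commented out or mis-indented, which a schema validator would skip.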

Note on duplication: Bug reports 001 and 003 both describe this same issue and have been correctly merged into a single report. The refutation that bug_003 is a duplicate of bug_001 is accurate—both identify the identical XXX placeholder at the same location—and this merged report represents that single underlying issue.

+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/957

- config-keys:
    - gptoss-fp4-h100-vllm
    - gptoss-fp4-h200-vllm