
[AMD/ROCm] kimik2.5 int4 mi355x update #909

Closed

seungrokj wants to merge 5 commits into main from srok/kimik2.5int4_mi355x

Conversation

@seungrokj
Collaborator

seungrokj commented Mar 13, 2026

waiting for the optimized upstream docker image.

Regards,
Seungrok

Signed-off-by: seungrokj <seungrok.jung@amd.com>
@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

@functionstackx
Contributor

@seungrokj can you add in your perf-changelog that VLLM_ROCM_USE_AITER=1 works now due to PR vllm-project/vllm#35641?

seungrokj and others added 4 commits March 16, 2026 12:11
Signed-off-by: seungrokj <seungrok.jung@amd.com>
@functionstackx
Contributor

@seungrokj any updates on this PR?

@functionstackx
Contributor

@claude create a new PR that ports this PR over & uses vllm/vllm-openai-rocm:v0.18.0 image

@Klaud-Cold
Collaborator

Klaud-Cold commented Mar 26, 2026

Claude finished @functionstackx's task in 2m 59s


Porting PR #909 with vllm/vllm-openai-rocm:v0.18.0 image

Summary

Created PR #950 which ports all changes from this PR with the resolved upstream docker image:

  • Image: v0.15.1 → v0.18.0 in amd-master.yaml
  • AITER MLA: Added export VLLM_ROCM_USE_AITER=1
  • Benchmark script: Added --max-num-seqs 256, removed --disable-log-requests
  • perf-changelog: Added entry documenting all changes
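Put together, the ported configuration amounts to a launch command along these lines. This is a minimal sketch, not the recipe's exact script: the image tag, VLLM_ROCM_USE_AITER=1, and --max-num-seqs 256 come from this PR, while the device mounts are the standard ROCm docker flags and the model ID is an illustrative assumption.

```shell
# Pull the resolved upstream ROCm image referenced in amd-master.yaml (PR #950)
docker pull vllm/vllm-openai-rocm:v0.18.0

# Serve with AITER MLA enabled (VLLM_ROCM_USE_AITER=1 works as of
# vllm-project/vllm#35641). The model ID below is illustrative only;
# substitute the actual Kimi K2.5 int4 checkpoint used by the recipe.
docker run -it --rm \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --ipc=host \
  -e VLLM_ROCM_USE_AITER=1 \
  vllm/vllm-openai-rocm:v0.18.0 \
  --model moonshotai/Kimi-K2-Instruct \
  --tensor-parallel-size 8 \
  --max-num-seqs 256
```

With TP8 on an 8-GPU MI355x node this matches the configuration the later comments describe as aligned with internal measurements.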


---

Klaud-Cold pushed a commit that referenced this pull request Mar 26, 2026
Port changes from PR #909 with the resolved upstream docker image:
- Upgrade image from v0.15.1 to v0.18.0
- Enable AITER MLA (VLLM_ROCM_USE_AITER=1)
- Add --max-num-seqs 256, remove --disable-log-requests
- Update perf-changelog

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
Co-authored-by: seungrokj <seungrokj@users.noreply.github.com>
@seungrokj
Collaborator Author

hi @functionstackx I tested a few things to figure out the non-deterministic behavior of the kimik2 fp4 case, and it's now resolved. So I will now work on the "int4" model, or I'll update on top of Klaud-Cold's work on #950.

@functionstackx
Contributor

@seungrokj feel free to continue working on this! Appreciate your help!

@functionstackx
Contributor

I started a Klaud-Cold PR because I thought you were busy with other tasks and didn't want to delay adding Kimi K2.5 int4 MI355 AITER, which is already an easy win because it has great perf improvements.

@seungrokj
Collaborator Author

> I started a Klaud-Cold PR because I thought you were busy with other tasks and didn't want to delay adding Kimi K2.5 int4 MI355 AITER, which is already an easy win because it has great perf improvements.

Sure, I will work on this today!

@functionstackx
Contributor

hi @seungrokj

since #909 already passed validation, the improvements are an easy win, and we ideally want to show these improvements on the frontend ASAP, I am going to merge #909. For any additional changes, can you build on top of #909?

functionstackx added a commit that referenced this pull request Mar 27, 2026
…which has the AITER MLA patch for num_heads=8 (#950)

* [AMD/ROCm] kimik2.5 int4 mi355x: upgrade to vllm-openai-rocm:v0.18.0

Port changes from PR #909 with the resolved upstream docker image:
- Upgrade image from v0.15.1 to v0.18.0
- Enable AITER MLA (VLLM_ROCM_USE_AITER=1)
- Add --max-num-seqs 256, remove --disable-log-requests
- Update perf-changelog

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
Co-authored-by: seungrokj <seungrokj@users.noreply.github.com>

* Update perf-changelog PR link to #950

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
Co-authored-by: seungrokj <seungrokj@users.noreply.github.com>
@seungrokj
Collaborator Author

@functionstackx yes, if we are just using TP8, then #909 should be solid. It also aligned with internal measurements on the vllm/vllm-openai-rocm:v0.18.0 image.

@functionstackx
Contributor

@seungrokj do you see anything better with TP4? If TP4 is on the Pareto frontier, feel free to add it.

@seungrokj
Collaborator Author

@functionstackx based on the fp4 case, int4 (same memory footprint) could have better throughput per GPU. I am testing this internally, and if it looks good I will raise a subsequent PR.

@functionstackx
Contributor

Thanks! Looking forward to your follow-up PR on whether TP4 is better or not.
