[Feat] Support GLM-4.7 MTP in vLLM-ATOM plugin by kliuae · Pull Request #722 · ROCm/ATOM

kliuae · 2026-05-08T10:21:52Z

Motivation

This PR builds on the MTP framework from #557 and adds MTP support for the GLM-4.7 model in the vLLM-ATOM plugin.
It currently includes the changes from #557; the diff will become smaller once #557 is upstreamed.

Technical Details

Register Glm4MoeMTPModel
Add glm4_moe_mtp modeling
Fix RoPE being applied twice in MHA when ATOM_ENABLE_QK_NORM_ROPE_CACHE_QUANT_FUSION=0
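
To illustrate why a double RoPE application matters, here is a minimal sketch in plain Python. It is not ATOM's code: `apply_rope`, `attention_inputs`, and the `rope_already_applied` flag are hypothetical names, and the interleaved pairing convention is an assumption made for brevity. The PR does not say where the second application happened; the sketch only shows that rotating a query twice at position p is equivalent to encoding it at position 2p, which corrupts the relative-position signal.

```python
import math

def apply_rope(vec, position, base=10000.0):
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by position-dependent angles."""
    dim = len(vec)
    out = list(vec)
    for i in range(0, dim, 2):
        theta = position / (base ** (i / dim))
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s
        out[i + 1] = x * s + y * c
    return out

def attention_inputs(q, position, rope_already_applied):
    """Hypothetical guard: rotate q only if no earlier stage has done so."""
    return q if rope_already_applied else apply_rope(q, position)

q = [1.0, 0.0, 0.5, -0.5]
once = apply_rope(q, position=3)
twice = apply_rope(once, position=3)          # the buggy double application
as_position_six = apply_rope(q, position=6)   # what the double application amounts to

# Maximum element-wise gap between the double application and position 6.
max_gap = max(abs(a - b) for a, b in zip(twice, as_position_six))
```

With the guard in place, `attention_inputs(once, 3, rope_already_applied=True)` returns the already-rotated vector unchanged instead of rotating it again.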

Test Plan

Accuracy test with lm_eval

Model: zai-org/GLM-4.7-FP8

Server command:

ATOM_DISABLE_VLLM_PLUGIN=0 \
ATOM_DISABLE_VLLM_PLUGIN_ATTENTION=0 \
VLLM_USE_V1=1 VLLM_ROCM_USE_AITER=1 \
  vllm serve zai-org/GLM-4.7-FP8 \
  -tp 8 \
  --max-num-seqs 1024 \
  --gpu-memory-utilization 0.9 \
  --no-enable-prefix-caching \
  --disable-uvicorn-access-log \
  --trust-remote-code \
  --load-format fastsafetensors \
  --compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE"}' \
  --kv-cache-dtype fp8 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --enable-auto-tool-choice \
  --speculative-config.method mtp \
  --speculative-config.num_speculative_tokens 1
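
Before starting a long evaluation, it can help to send a single request to the server as a smoke test. The sketch below uses only the Python standard library and the OpenAI-compatible `/v1/completions` endpoint that the evaluation also targets. The helper names `build_request` and `complete`, the prompt, and the `max_tokens` and `temperature` values are illustrative assumptions, not part of the PR.

```python
import json
import urllib.request

# Endpoint and model name taken from the commands in this PR.
BASE_URL = "http://localhost:8000/v1/completions"
MODEL = "zai-org/GLM-4.7-FP8"

def build_request(prompt, max_tokens=32):
    """Build a POST request for the completions endpoint (hypothetical helper)."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # greedy decoding keeps the smoke test repeatable
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def complete(prompt, max_tokens=32, timeout=60):
    """Send the request and return the generated text of the first choice."""
    request = build_request(prompt, max_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `print(complete("Q: What is 12 * 7?\nA:"))` should print a short completion; a connection error or an HTTP error status indicates the server is not ready.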

lm_eval command:

lm_eval --model local-completions \
  --model_args model=zai-org/GLM-4.7-FP8,base_url=http://localhost:8000/v1/completions,num_concurrent=64,tokenized_requests=False \
  --tasks gsm8k \
  --num_fewshot 5

Test Result

gsm8k

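|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9439|±  |0.0063|
|     |       |strict-match    |     5|exact_match|↑  |0.9439|±  |0.0063|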
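
As a sanity check on the reported standard error, the sketch below recomputes it from the accuracy alone. It assumes the evaluation ran on the full GSM8K test split of 1319 problems (the PR does not state the sample count) and that the standard error is the sample standard deviation of the 0/1 scores divided by the square root of the sample count. The function name `binary_mean_stderr` is illustrative.

```python
import math

def binary_mean_stderr(accuracy, n):
    """Standard error of the mean of n 0/1 scores, using the n-1 sample variance."""
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))

N_GSM8K_TEST = 1319  # assumption: full GSM8K test split
acc = 0.9439         # exact_match value reported in the table

stderr = binary_mean_stderr(acc, N_GSM8K_TEST)
# Approximate 95% interval under a normal approximation.
low, high = acc - 1.96 * stderr, acc + 1.96 * stderr

print(f"stderr = {stderr:.4f}")  # prints "stderr = 0.0063"
print(f"95% interval = [{low:.3f}, {high:.3f}]")
```

Under these assumptions the recomputed value matches the 0.0063 in the table, and the interval spans roughly 0.93 to 0.96.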

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>

Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>

whx-sjtu and others added 14 commits April 23, 2026 10:49

adapt mtp for glm5 (vllm plugin)

922aa8e

Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>

add patch to support mtp>1

3b82d15

Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>

fix model load failure of draft model

3f7d3d4

Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>

adapt full graph with mtp enabled

4c9c960

Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>

fix MLA MTP acceptance issue

75c46e6

Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>

fall back to vllm-style mtp position

ca42e27

Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>

fix embedding sharing failure for mtp

a7f6918

Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>

fix lint

90aa06b

Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>

fix comment

4a663a4

Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>

remove warning log

9050626

Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>

Merge branch 'whx-sjtu/atom-support-vllm-glm5-mtp'

7e311e1

Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>

add mtp support for glm4

2acfa65

Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>

merge main

669ba3f

Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>

fix rope double apply for mha

f1e2d7e

Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>

kliuae wants to merge 14 commits into ROCm:main from kliuae:kliuae/plugin_enable_glm4_mtp_merge
