
[Feat] Support GLM-4.7 MTP in vLLM-ATOM plugin #722

Open: kliuae wants to merge 14 commits into ROCm:main from kliuae:kliuae/plugin_enable_glm4_mtp_merge

kliuae commented May 8, 2026

Motivation

This PR builds on the MTP (multi-token prediction) framework introduced in #557 and adds MTP support for the GLM-4.7 model in vLLM-ATOM.
It currently includes the changes from #557; the diff will shrink once #557 is merged upstream.

Technical Details

  • Register Glm4MoeMTPModel
  • Add the glm4_moe_mtp model implementation
  • Fix RoPE being applied twice in MHA when ATOM_ENABLE_QK_NORM_ROPE_CACHE_QUANT_FUSION=0
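
To illustrate why applying RoPE twice matters, here is a minimal standalone sketch. It is not the ATOM attention code: `apply_rope` is a hypothetical helper that uses the interleaved-pair convention and a base of 10000 purely for illustration. Because each pair is rotated by an angle proportional to the position, applying the rotation twice is equivalent to encoding the token at twice its real position.

```python
import math

def apply_rope(vec, position, base=10000.0):
    """Rotate consecutive (even, odd) pairs of `vec` by position-dependent angles.

    Hypothetical illustration helper, not the ATOM or vLLM implementation.
    """
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

q = [1.0, 0.0, 0.5, -0.5]
pos = 3

once = apply_rope(q, pos)               # intended behaviour
twice = apply_rope(once, pos)           # the bug: RoPE applied a second time
doubled_pos = apply_rope(q, 2 * pos)    # what the double application is equivalent to

# The doubly rotated query matches a query encoded at position 2 * pos,
# so relative-position information in the attention scores is distorted.
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(twice, doubled_pos)))
```

The rotation preserves the vector norm, so this kind of bug does not show up as exploding activations; it tends to surface only as degraded accuracy, which is why an end-to-end accuracy check is a reasonable test.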

Test Plan

Accuracy test with lm_eval

Model: zai-org/GLM-4.7-FP8

Server command:

ATOM_DISABLE_VLLM_PLUGIN=0 \
ATOM_DISABLE_VLLM_PLUGIN_ATTENTION=0 \
VLLM_USE_V1=1 VLLM_ROCM_USE_AITER=1 \
  vllm serve zai-org/GLM-4.7-FP8 \
  -tp 8 \
  --max-num-seqs 1024 \
  --gpu-memory-utilization 0.9 \
  --no-enable-prefix-caching \
  --disable-uvicorn-access-log \
  --trust-remote-code \
  --load-format fastsafetensors \
  --compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE"}' \
  --kv-cache-dtype fp8 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --enable-auto-tool-choice \
  --speculative-config.method mtp \
  --speculative-config.num_speculative_tokens 1
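
The two dotted `--speculative-config.*` flags above describe a single nested configuration object. As a rough sketch, and assuming the dotted form is equivalent to passing one JSON string to `--speculative-config`, the folding can be pictured as follows. `fold_dotted_flags` is a hypothetical helper written for this illustration, not vLLM's argument parser.

```python
import json

def fold_dotted_flags(pairs):
    """Fold ("a.b", value) pairs into a nested dict.

    Illustrative helper only; vLLM's own CLI parsing is not reproduced here.
    """
    config = {}
    for dotted_key, value in pairs:
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return config

flags = [
    ("speculative-config.method", "mtp"),
    ("speculative-config.num_speculative_tokens", 1),
]

folded = fold_dotted_flags(flags)
speculative_json = json.dumps(folded["speculative-config"])

# Single-flag form carrying the same settings as the two dotted flags.
print(f"--speculative-config '{speculative_json}'")
```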

lm_eval command:

lm_eval --model local-completions \
  --model_args model=zai-org/GLM-4.7-FP8,base_url=http://localhost:8000/v1/completions,num_concurrent=64,tokenized_requests=False \
  --tasks gsm8k \
  --num_fewshot 5

Test Result

gsm8k

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.9439 | ± 0.0063 |
| | | strict-match | 5 | exact_match | 0.9439 | ± 0.0063 |
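
To put the reported standard error in context, the sketch below turns it into an approximate 95% confidence interval and cross-checks it against a simple binomial estimate. It assumes a normal approximation and that the gsm8k task was scored on the 1319-problem GSM8K test split; neither assumption is stated in the results above.

```python
import math

accuracy = 0.9439      # exact_match reported above
stderr = 0.0063        # standard error reported above
n_problems = 1319      # assumption: size of the GSM8K test split

# Approximate 95% confidence interval (normal approximation).
z = 1.96
ci_low = accuracy - z * stderr
ci_high = accuracy + z * stderr

# Cross-check: binomial standard error sqrt(p * (1 - p) / n).
binomial_se = math.sqrt(accuracy * (1 - accuracy) / n_problems)

print(f"95% CI: [{ci_low:.4f}, {ci_high:.4f}]")
print(f"binomial standard error: {binomial_se:.4f}")
```

Under these assumptions the interval is roughly 0.932 to 0.956, and the binomial estimate rounds to the same 0.0063 that lm_eval reports, so an accuracy difference of about one percentage point between runs would be within noise.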


whx-sjtu and others added 14 commits April 23, 2026 10:49
Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>