ToMoE (TMLR)

Official implementation of "ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning."

Paper: https://openreview.net/pdf?id=RFHq46pjb6

Overview

This repo provides:

  • Hypernetwork training for dynamic structural pruning.
  • A conversion script to turn dense LLaMA models into pruned MoE models.
  • Model definitions for the pruned MoE variants.

Dependencies are specified in environment.yml.
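
A Conda environment can be created from that file; the environment name below (tomoe) is an assumption, use whichever name environment.yml actually defines:

# Create and activate the environment described in environment.yml
conda env create -f environment.yml
conda activate tomoe   # replace "tomoe" with the name set in environment.yml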

Key scripts

  • train_tomoe.py: Train the hypernetwork.
  • prune_tomoe.py: Convert a dense model to a pruned MoE model.

Example: Training the hypernetwork

CUDA_VISIBLE_DEVICES=0 nohup torchrun --nproc_per_node=1 --master_port=12343 train_tomoe.py \
    --use_bf16 True \
    --save_interval 100000 \
    --dynamic_experts 8 \
    --dynamic_alpha 3.0 \
    --load_balance_alpha 1.0 \
    --hf_model meta-llama/Llama-2-7b-hf \
    --p 0.5 \
    --total_n_step 20000 \
    --lam 16.0 \
    --kd_loss True \
    --dataset_list "['mix']" \
    --dataset_seed 777 \
    --use_fsdp False \
    --out_dir /path/to/output_dir
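
For multi-GPU training, the script exposes a --use_fsdp flag (set to False above); the command below is an unverified sketch of that variant, changing only the device list, process count, and --use_fsdp value relative to the single-GPU example:

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=12343 train_tomoe.py \
    --use_bf16 True \
    --save_interval 100000 \
    --dynamic_experts 8 \
    --dynamic_alpha 3.0 \
    --load_balance_alpha 1.0 \
    --hf_model meta-llama/Llama-2-7b-hf \
    --p 0.5 \
    --total_n_step 20000 \
    --lam 16.0 \
    --kd_loss True \
    --dataset_list "['mix']" \
    --dataset_seed 777 \
    --use_fsdp True \
    --out_dir /path/to/output_dir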

Example: Pruning

python prune_tomoe.py \
    --hf_model meta-llama/Llama-2-7b-hf \
    --hn_path /path/to/hn-ckpt-final.pt \
    --output_dir /path/to/tomoe_model \
    --dynamic_experts 8 \
    --attn_prune true

Example: Zero-Shot Evaluation

Zero-shot evaluation uses the EleutherAI lm-evaluation-harness; see https://github.com/EleutherAI/lm-evaluation-harness for details.
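
The harness can be installed from PyPI (a minimal setup step; pin a specific version if exact reproducibility is needed):

pip install lm-eval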

accelerate launch --main_process_port 12323 --num_processes 1 \
    -m lm_eval --model hf \
    --model_args pretrained=/path/to/tomoe_model,dtype=bfloat16,trust_remote_code=true \
    --tasks hellaswag,arc_easy,arc_challenge,piqa,winogrande \
    --device cuda:0 \
    --batch_size 32

Repo layout

  • models/: Model definitions (dense + pruned MoE).
  • tomoe/: Hypernetwork and pruning helper utilities.
  • utils/: Training/runtime helpers.
  • data/: Dataset utilities.

Citation

If you find this repo useful, please cite:

@article{
    gao2026tomoe,
    title={ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning},
    author={Shangqian Gao and Ting Hua and Reza Shirkavand and Chi-Heng Lin and Zheng Tang and Zhengao Li and Longge Yuan and Fangyi Li and Zeyu Zhang and Alireza Ganjdanesh and Qian Lou and Jie Xu and Yen-Chang Hsu},
    journal={Transactions on Machine Learning Research},
    issn={2835-8856},
    year={2026},
    url={https://openreview.net/forum?id=RFHq46pjb6},
    note={J2C Certification}
}
