30 Apr 23:01

c4b16d4

v0.8.0 Latest

Liger-Kernel v0.8.0

Highlights ✨

🚀 MoE Training Acceleration via `LigerExperts`

A new fused MoE expert kernel, `LigerExperts` (backed by `LigerFusedMoEFunction`), inspired by SonicMoE (arXiv:2512.14080). It replaces the eager per-expert loop in HuggingFace MoE blocks with a Triton grouped-GEMM + fused SwiGLU + token aggregation path and a memory-efficient backward pass.
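To make the grouped-GEMM idea concrete, here is a minimal pure-Python sketch of the routing bookkeeping such a path relies on. It is not Liger's Triton code: scalar "experts" stand in for the per-expert SwiGLU MLPs, and every function name is hypothetical. The (token, expert) assignments from top-k routing are sorted by expert so each expert owns one contiguous block, and expert outputs are scatter-added back to their tokens with the routing weights. Both versions produce the same outputs; only the data layout differs, and that layout is what lets a GPU cover all experts with one grouped GEMM.

```python
import math

def silu(x: float) -> float:
    return x / (1.0 + math.exp(-x))

def swiglu_expert(x: float, w_gate: float, w_up: float, w_down: float) -> float:
    # Scalar stand-in for an expert MLP: down(silu(gate(x)) * up(x)).
    return w_down * (silu(w_gate * x) * (w_up * x))

def eager_moe(tokens, topk_ids, topk_weights, experts):
    # Baseline: for every expert, scan all tokens for the ones routed to it.
    out = [0.0] * len(tokens)
    for e, params in enumerate(experts):
        for t, ids in enumerate(topk_ids):
            for slot, eid in enumerate(ids):
                if eid == e:
                    out[t] += topk_weights[t][slot] * swiglu_expert(tokens[t], *params)
    return out

def grouped_moe(tokens, topk_ids, topk_weights, experts):
    # 1. Flatten (expert, token, slot) assignments and stable-sort by expert id.
    assignments = [(eid, t, slot) for t, ids in enumerate(topk_ids) for slot, eid in enumerate(ids)]
    assignments.sort(key=lambda a: a[0])
    # 2. Prefix-sum per-expert counts into offsets: expert e owns
    #    assignments[offsets[e]:offsets[e + 1]].
    counts = [0] * len(experts)
    for eid, _, _ in assignments:
        counts[eid] += 1
    offsets = [0]
    for c in counts:
        offsets.append(offsets[-1] + c)
    # 3. Each expert processes one contiguous block (on a GPU, a grouped GEMM
    #    covers all blocks in one launch); outputs are scatter-added back to
    #    their tokens, scaled by the routing weight.
    out = [0.0] * len(tokens)
    for e, params in enumerate(experts):
        for _, t, slot in assignments[offsets[e]:offsets[e + 1]]:
            out[t] += topk_weights[t][slot] * swiglu_expert(tokens[t], *params)
    return out

# Toy routing: 3 tokens, top-2 of 3 experts; each expert is (w_gate, w_up, w_down).
tokens = [0.5, -1.0, 2.0]
topk_ids = [[0, 2], [1, 0], [2, 1]]
topk_weights = [[0.7, 0.3], [0.6, 0.4], [0.9, 0.1]]
experts = [(1.0, 0.5, 2.0), (-0.5, 1.5, 1.0), (0.8, -1.0, 0.5)]

eager = eager_moe(tokens, topk_ids, topk_weights, experts)
grouped = grouped_moe(tokens, topk_ids, topk_weights, experts)
print(eager)
print(grouped)
```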

Qwen3-30B-A3B fine-tuning on 2 × H100 (max seq len 32768, BS=1, GA=8): 8.24× higher tokens/sec, 8.19× faster training step, 4.10× faster eval, ~1% memory savings.
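For scale, a back-of-the-envelope calculation of the upper bound on tokens per optimizer step in this setup, plus what an 8.24× throughput ratio means in wall-clock terms. It assumes plain data parallelism across the two GPUs and every sequence filled to the 32768-token maximum; neither assumption is stated in the benchmark.

```python
NUM_GPUS = 2
MICRO_BATCH_SIZE = 1      # BS=1
GRAD_ACCUM_STEPS = 8      # GA=8
MAX_SEQ_LEN = 32768

# Upper bound: every micro-batch on every GPU is a full-length sequence.
max_tokens_per_step = NUM_GPUS * MICRO_BATCH_SIZE * GRAD_ACCUM_STEPS * MAX_SEQ_LEN

TOKENS_PER_SEC_SPEEDUP = 8.24
# The same token budget at 8.24x throughput takes 1/8.24 of the baseline time.
time_fraction = 1 / TOKENS_PER_SEC_SPEEDUP

print(f"max tokens per optimizer step: {max_tokens_per_step:,}")
print(f"time for the same tokens: {time_fraction:.1%} of baseline")
```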

Auto-patched into Mixtral, Qwen3-MoE, Qwen3-Next, Qwen3.5-MoE, Qwen3-VL-MoE, GLM4V-MoE, and HunYuan-MoE-V1. (#1179, #1192)
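Auto-patching is a form of monkey-patching: a class (or method) in a model's `modeling_*.py` module is rebound to a Liger implementation, so models built afterwards pick up the fused path. The snippet below illustrates only that mechanism, assuming a class-swap style patch, on a stand-in module; `fake_modeling`, `EagerExperts`, `FusedExperts`, `build_model`, and `apply_patch` are hypothetical names, not Liger or transformers APIs.

```python
import types

class EagerExperts:
    """Stand-in for an eager per-expert loop implementation."""
    def forward(self, x):
        return f"eager({x})"

class FusedExperts:
    """Stand-in for a fused replacement exposing the same interface."""
    def forward(self, x):
        return f"fused({x})"

# A throwaway module playing the role of a transformers modeling_*.py file.
fake_modeling = types.ModuleType("fake_modeling")
fake_modeling.Experts = EagerExperts

def build_model():
    # Model code looks the class up on the module at construction time.
    return fake_modeling.Experts()

def apply_patch(module):
    # Monkey-patch: rebind the module attribute before models are built.
    module.Experts = FusedExperts

before = build_model().forward("h")
apply_patch(fake_modeling)
after = build_model().forward("h")
print(before, after)  # eager(h) fused(h)
```

Because the lookup happens at construction time, instances created before the patch keep the old class; patching existing instances (see #772 in v0.6.0) is a separate step.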

🤖 Claude Code Skills for Kernel Authoring (`.claude/skills/`)

Three first-party Claude Code skills now ship with the repo to make extending Liger-Kernel dramatically easier:

liger-kernel-dev (#1170) — builds a production-ready Triton kernel end-to-end from a PyTorch op (given as a file, URL, snippet, or natural-language description): ops, module wrapper, functional API, tests, benchmark, exports.
liger-autopatch (#1167, #1177) — adds Liger support for a new HuggingFace model: reads modeling_*.py, generates lce_forward, monkey-patch entry, and convergence tests.
liger-kernel-perf (#1185) — profiles an existing kernel, generates and benchmarks optimization variants (Ampere / Hopper / Blackwell-aware), applies the winner.

Several PRs in this release were authored using these skills (e.g. #1165, #1166, #1171, #1187).

🧩 New Model Support

Qwen3.5 MoE (#1109) and Qwen3.5 dense (#1123)
Qwen3.5 multimodal — Qwen3_5ForConditionalGeneration (#1150)
Nemotron (#1165), with Liger ReLU² wired in (#1176)
Ministral (#1166)
Gemma 4 dense text, 31B-targeted (#1196)
Falcon H1 SwiGLU multipliers (#1201)
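ReLU² (squared ReLU) is the activation Nemotron uses, now backed by Liger's kernel (#1171, #1176). As a plain-Python reference for the math such a kernel computes (not Liger's Triton implementation; the function names are hypothetical), the forward is max(x, 0)² and the backward is 2·max(x, 0)·grad_out:

```python
def relu_squared_forward(x):
    # relu(x) ** 2, elementwise
    return [max(v, 0.0) ** 2 for v in x]

def relu_squared_backward(x, grad_out):
    # d/dx relu(x)^2 = 2 * relu(x), which is zero for x <= 0
    return [2.0 * max(v, 0.0) * g for v, g in zip(x, grad_out)]

x = [-2.0, 0.0, 1.5, 3.0]
print(relu_squared_forward(x))              # [0.0, 0.0, 2.25, 9.0]
print(relu_squared_backward(x, [1.0] * 4))  # [0.0, 0.0, 3.0, 6.0]
```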

Ascend NPU Backend Support

Liger-Kernel now supports the Ascend NPU backend. Huge thanks to the Ascend team for the sustained effort across this release. See #969 for the full tracking issue and roadmap.

What's Changed

[NPU]Fix OOM in benchmark_tvd on NPU by @lowdy1 in #1103
Add return_predicted_tokens support for cross-entropy kernels by @yukiu00 in #1091
Refactoring checkstyle as independent workflow. by @kolehma8 in #1105
Bug fix for merge queue. by @kolehma8 in #1106
Renaming Nvidia tests to allow enabling them from rules. by @kolehma8 in #1107
[NPU]: fused_add_rms_norm kernel distinguish the chunking strategy by @TianHao324 in #1100
Optimize Group norm forward kernel by @vaibhavjindal in #1112
[NPU]: use grad_and_value for reference implementation on NPU by @noemotiovon in #1056
[NPU]: remove num_stages and num_warps from NPU Triton kernels by @noemotiovon in #1111
Add support for Qwen3.5 MoE by @michaelroyzen in #1109
[NPU]:rms_norm kernel employs two-dimensional tensors by @TianHao324 in #1108
[NPU] Add kl_div support on NPU by @orangeH25 in #1118
Optimize tvd kernel to perform scaling inside the forward kernel by @vaibhavjindal in #1127
[NPU] Add softmax implementation by @lowdy1 in #1087
Add mHC fused kernels + LigerMHC API + benchmarks by @yukiu00 in #1065
Add mHC kernel documentation to README and API reference by @yukiu00 in #1132
Kolehma8/dist swiglu by @kolehma8 in #1129
[GRPO] add grpo loss types by @kashif in #993
[NPU]:Added support for the layer_norm operator in npu by @TianHao324 in #1113
[NPU] Optimize KLDiv backward kernel performance by @orangeH25 in #1130
Add mm_token_type_ids param to VL models lce_forward to fix ValueError by @albertvillanova in #1120
[NPU]:Added support for the dyt operator by @TianHao324 in #1124
[NPU]:Added support for the jsd operator by @TianHao324 in #1134
[NPU] Add Ascend NPU CI status in README by @xuedinge233 in #1131
[NPU]:Fix the reference issue of the dyt operator. by @TianHao324 in #1137
[NPU]:Added support for the poly_norm operator by @TianHao324 in #1114
[NPU]:Adding the single_block strategy to the softmax operator by @TianHao324 in #1138
[NPU]:Resolve the ub overflow issue of the kl_div operator by @TianHao324 in #1141
[NPU] FIX bug in softmax by @lowdy1 in #1139
[Test]: Fix test imports to use package-level ops imports by @noemotiovon in #1133
Add support for Qwen3.5 dense models by @ruilin-gif in #1123
[NPU] Implement a conservative llama4_rope by @lowdy1 in #1143
Add mm_token_type_ids to qwen3_5_moe lce_forward by @albertvillanova in #1140
[NPU] FIX Softmax for loop by @lowdy1 in #1147
Set warn_only for unsupported deterministic algorithms by @Tcc0403 in #1145
[NPU] Add group norm support on NPU by @orangeH25 in #1144
[Test]: Refactor benchmark_geglu with standardized model configs by @noemotiovon in #1116
[NPU]:Add support for the fused_linear_jsd operator. by @TianHao324 in #1151
[Fix, NPU] Set soc_info for the NPU device by @zheliuyu in #1149
[NPU] Add support for the grpo_loss by @UserChen666 in #1146
[NPU]:Add support for the cross_entropy operator. by @TianHao324 in #1148
[Optimize, NPU] Remove tl.where from _rms_norm_forward/backward_kernel_tiled() by @zheliuyu in #1153
Add patches for Qwen3_5ForConditionalGeneration to support multimodal. by @wyt2000 in #1150
[NPU] Add NPU optimized sparsemax by @lowdy1 in #1104
fix: guard save_for_backward on grad_bias not bias in fused linear CE forward by @aryanputta in #1157
[NPU]support fused neighborhood attention for npu by @Hailey-Zh in #1034
Claude skill for Triton kernel development (liger-kernel-dev) by @vaibhavjindal in #1170
[Feature] Add Triton kernel for ReLU Squared activation by @vaibhavjindal in #1171
Claude skill for automatically monkey patching HF transformers models (liger-autopatch) by @vaibhavjindal in #1167
Add Liger Kernel support for nemotron models by @vaibhavjindal in #1165
Add Liger Kernel support for Ministral models by @vaibhavjindal in #1166
[bugfix] Remove the pos_ids parameter from apply_rotary_pos_emb by @zheliuyu in #1162
Add Kimi AttentionResiduals (AttnRes) kernel by @kirsten-1 in #1161
[Benchmark]: Add --sweep-mode and --bt to benchmark CLI. by @noemotiovon in #1163
[NPU]: optimize cross entropy kernel gradient computation by @noemotiovon in #1178
Improve liger-autopatch skill to enable modifying existing monkey-patches by @vaibhavjindal in #1177
[Feature] Use Liger's Relu_Squared kernel for Nemotron models by @vaibhavjindal in #1176
[NPU] Add fused_linear_cross_entropy operator by @lowdy1 in #1164
[Skill] Add liger-kernel-perf skill for Triton kernel performance optimization by @vaibhavjindal in #1185
Improve bwd kernel for fused_add_rms_norm by @vaibhavjindal in #1187
optimize group_norm for ASCEND_NPU by @sunyi0505 in #1154
MoE Triton Kernel by @Mecoli1219 in #1179
[NPU] Improve performance for grpo_loss by @UserChen666 in #1174
[fix]Add logits_to_keep and shift_labels support for Qwen3-VL and Qwen3-VL-MoE by @luca-888 in https://gith...

Contributors

kashif, albertvillanova, and 22 other contributors

12 Feb 22:00

vaibhavjindal

v0.7.0

7644a0f

v0.7.0

🚀 Liger-Kernel Now Fully Supports Transformers v5

We’ve added full support for Transformers v5!
🔗 #994

Liger now supports all 🤗 Transformers versions ≥ 4.52.0, including the latest v5 release.

Broader compatibility. Seamless upgrades. No version headaches.
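If a training script should fail fast on an unsupported install, a stdlib-only check against the 4.52.0 floor might look like the sketch below. The helper names are hypothetical, and it deliberately treats pre-releases such as 4.52.0.dev0 as 4.52.0; `packaging.version.Version` handles pre-release and local version tags more rigorously.

```python
import re

MIN_TRANSFORMERS = (4, 52, 0)

def parse_release(version: str) -> tuple:
    """Extract the leading numeric release segment, e.g. '5.0.0rc1' -> (5, 0, 0)."""
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group(1).split("."))
    # Pad/truncate to three components so '5.0' compares like '5.0.0'.
    return (parts + (0, 0, 0))[:3]

def is_supported(version: str) -> bool:
    return parse_release(version) >= MIN_TRANSFORMERS

for v in ["4.51.3", "4.52.0", "5.0.0"]:
    print(v, is_supported(v))
```

In a real script the installed version string can be read with `importlib.metadata.version("transformers")`.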

Thanks to all the contributors!

What's Changed

Add CISPO loss type support for LigerFusedLinearGRPOLoss by @yukiu00 in #1054
Update checkstyle and fix the format issue by @xuedinge233 in #1071
Add SAPO loss type support for LigerFusedLinearGRPOLoss by @yukiu00 in #1073
[NPU]: Adaptive modification of NPU by @TianHao324 in #1055
[NPU] Frequencies fusion for Llama4_rope on NPU by @lowdy1 in #1053
Add CISPO and SAPO loss type support for Triton GRPO loss kernel by @yukiu00 in #1074
Add vLLM importance sampling ratio support for GRPO loss by @yukiu00 in #1088
Relaxing logp relative tolerances for mini-llama4 to fix flaky test. by @kolehma8 in #1089
Moving unit testing to a merge queue. by @kolehma8 in #1069
Fixing unit tests. by @kolehma8 in #1092
Merge queue test. by @kolehma8 in #1095
Changes to the nvidia tests in merge queue. by @kolehma8 in #1097
Support transformers v5 by @Tcc0403 in #994
Remove latest v4 test to reduce cost by @Mecoli1219 in #1098
[Model] Pixtral Support by @AndreSlavescu in #253
[NPU]: NPU-optimized rms_norm kernel by @TianHao324 in #1099
[NPU]: NPU-optimized fused_add_rms_norm kernel by @TianHao324 in #1070
[NPU]: add support for grpo loss by @TianHao324 in #1049
Update pyproject.toml for v0.7.0 release by @vaibhavjindal in #1102

New Contributors

@xuedinge233 made their first contribution in #1071

Full Changelog: v0.6.5...v0.7.0

Contributors

kolehma8, vaibhavjindal, and 7 other contributors

04 Feb 02:01

vaibhavjindal

v0.6.5

81f932a

v0.6.5

What's Changed

fixbug, ensure FP32 accumulation for dW in Llama-mode RMSNorm backward by @niyunsheng in #950
Add Ascend NPU device support. by @Ginray in #955
define shift_labels in gemma by @akoumpa in #961
[feat]: Add support for gpt-oss by @yeshsurya in #949
Update README.md by @PKUWZP in #970
Fix qwen3vl apply_rotary_pos_emb_vision by @Tcc0403 in #967
[refactor] decoupling ops implementations for different vendors by @pillumina in #973
Fix: fix ignore_index not being applied in JSD distillation loss by @roycho96 in #974
ci: skip some rms_norm test cases for npu and bump torch-npu to 2.7.1 by @ji-huazhong in #977
Fix missing property access for multimodal models by @albertvillanova in #966
Bug fix for missing distillation loss arguments. by @kolehma8 in #983
Add AutoLigerKernelForCausalLM.from_config by @Tcc0403 in #962
Fix geglu by @konstantinos-p in #986
Update discord channel link and announcement for meetup by @momochen in #984
[NPU]: Adjust MAX_FUSED_SIZE when using fused_linear_cross_entropy by @zheliuyu in #985
[RMSNorm] Fix JIT recompilation by removing tl.constexpr on rows_per_program & Cleanup Block kernel interface by @niyunsheng in #988
[Feature] Add elementwise_affine argument to LigerRMSNorm by @niyunsheng in #989
[Fix] Handle missing 'elementwise_affine' in RMSNorm extra_repr for patched models by @niyunsheng in #990
[Fix] Replace conditional flow with tl.where in liger_cross_entropy_kernel for Triton 3.2 compatibility by @niyunsheng in #991
feat(NPU): add UB Manager for auto tiling strategy management by @noemotiovon in #987
[NPU]: Add NPU support for the mrope operator by @TianHao324 in #992
Changing RMS layer norm to accept DTensors. by @kolehma8 in #982
[NPU]: Add NPU support for the swiglu by @jiaqiw09 in #995
fix: prevent command injection in benchmark workflow by @arde171 in #997
[Model] Exaone4 Support by @roycho96 in #980
Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in #981
[NPU]: avoid pointer mutation in rms_norm kernel by @TianHao324 in #1000
[Fix]: avoid pointer mutation in group norm kernel by @noemotiovon in #999
[NPU]: support kl div on NPU by @noemotiovon in #1001
fix: replace HybridCache with Cache in gemma2 and gemma3 by @qgallouedec in #1002
[NPU]: refine geglu memory_multiplier based on UB analysis by @noemotiovon in #996
[NPU]: Add NPU support for the tvd operator by @TianHao324 in #998
[Test]: Add test suite for Llama4 RoPE implementation by @noemotiovon in #1004
feat: Align TiledMLP with DeepSpeed/ALST/Axolotl for PEFT compatibility by @akshatvishu in #1005
[NPU]: adjust MAX_FUSED_SIZE for NPU devices in group_norm by @noemotiovon in #1003
avoid pointer mutation in add_rms_norm kernel by @TianHao324 in #1008
pass param down to LigerFusedLinearCrossEntropyLoss by @kaixuanliu in #1010
avoid pointer mutation in layer_norm kernel by @TianHao324 in #1006
XPU: Enable new grf_mode settings by @Egor-Krivov in #1016
gemma3 consider loss_kwargs by @jp1924 in #1007
[Refactor]: optimize poly_norm backward kernel pointer handling by @noemotiovon in #1018
Enable benchmark_tvd.py for xpu devices by @Egor-Krivov in #1024
Add pre-commit config by @Tcc0403 in #1009
[NPU]: update the native KLDivLoss implementation used for comparison (e.g., test_jsd.py) by @kiritorl in #1032
doc: reformat contributing.md for better visualization on github by @Tcc0403 in #1033
[NPU]: Add NPU support for the embedding by @TianHao324 in #1028
[NPU]: use get_soc_spec for UB capacity detection by @noemotiovon in #1038
[NPU]: optimize GEGLU implementation with flatten 1D approach by @noemotiovon in #1031
[NPU]: optimize tvd implementation by @TianHao324 in #1039
Workaround for OOM error on benchmark_jsd by @Egor-Krivov in #1037
Fix benchmark_qwn2vl_mrope.py by enabling new transformers API for Qwen2VLRotaryEmbedding by @Egor-Krivov in #1026
[NPU] FIX CE ub overflow on NPU by @lowdy1 in #1040
Avoid OOM error for benchmark_tvd.py on GPUs with less than 66GB of memory by @Egor-Krivov in #1042
[NPU]: optimize rope and mrope implementation by @TianHao324 in #1041
[NPU] Add Llama4_rope support on NPU by @lowdy1 in #1035
Fix memory requirements for benchmark_jsd.py and benchmark_distill_jsd_loss.py by @Egor-Krivov in #1050
Set transformers upper bound by @Tcc0403 in #1046
Unify NPU vector core count helpers by @lowdy1 in #1052
New version release v0.6.5 by @vaibhavjindal in #1063

New Contributors

@Ginray made their first contribution in #955
@akoumpa made their first contribution in #961
@pillumina made their first contribution in #973
@roycho96 made their first contribution in #974
@ji-huazhong made their first contribution in #977
@albertvillanova made their first contribution in #966
@kolehma8 made their first contribution in #983
@konstantinos-p made their first contribution in #986
@zheliuyu made their first contribution in #985
@noemotiovon made their first contribution in #987
@TianHao324 made their first contribution in #992
@jiaqiw09 made their first contribution in #995
@arde171 made their first contribution in #997
@salmanmkc made their first contribution in #981
@qgallouedec made their first contribution in #1002
@akshatvishu made their first contribution in #1005
@kaixuanliu made their first contribution in #1010
@kiritorl made their first contribution in #1032
@lowdy1 made their first contribution in #1040

Full Changelog: v0.6.4...v0.6.5

Contributors

momochen, Egor-Krivov, and 25 other contributors

21 Nov 22:48

momochen

v0.6.4

0a62700

v0.6.4 release

Highlights

New model architecture:
Qwen3-VL, hunyuanv1, Olmo3

New algorithm:
DAPO loss

Optimizations:
Layernorm backward, Tiled MLP

What's Changed

Option to return hard and soft loss when using distillation by @h-aurelien-lac in #895
Fix CE patch and add layernorm support for InternVL by @MilkClouds in #921
fix(ci): modify Glm4vMoe config for convergence test by @Tcc0403 in #918
Support for Qwen3-VL models by @mayankagarwals in #911
style: fix main branch format by @Tcc0403 in #929
fix: initialize grad_weight and grad_bias on flce no_grad path by @keatonelvins in #931
Fix qwen3 related tests by @vaibhavjindal in #933
[Cross-entropy-loss] return mean token accuracy metric with CE loss by @kashif in #910
Handle aux_loss for different transformer versions by @vaibhavjindal in #934
Add TiledMLP Implementation by @upskyy in #935
[Qwen3]: If qwen3 is used along with peft config, peft adds opcl obj no… by @yeshsurya in #926
Increase time limit for modal tests by @vaibhavjindal in #947
add hunyuanv1 dense and moe model by @Kingsleyandher in #940
Olmo3 model support [ready for review] by @tyler-romero in #946
[GRPO] add support for dapo loss by @kashif in #939
[Perf] Optimize LayerNorm Backward: Replace Atomics with Persistent Reduction by @niyunsheng in #945
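#910 returns a mean token accuracy metric alongside the cross-entropy loss. The usual definition is the fraction of non-ignored label positions where the argmax of the logits equals the label. The plain-Python sketch below shows that definition only, not how Liger computes it, assuming the conventional `ignore_index=-100` and labels already shifted to align with the logits; returning 0.0 when every position is ignored is an arbitrary choice here.

```python
IGNORE_INDEX = -100  # conventional HF ignore label (assumption)

def argmax(row):
    return max(range(len(row)), key=row.__getitem__)

def mean_token_accuracy(logits, labels, ignore_index=IGNORE_INDEX):
    correct = total = 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # padding / prompt positions do not count
        total += 1
        correct += int(argmax(row) == label)
    return correct / total if total else 0.0

logits = [[2.0, 0.1, -1.0], [0.3, 0.2, 0.9], [1.0, 3.0, 0.0], [0.0, 0.0, 5.0]]
labels = [0, 1, 1, -100]
print(mean_token_accuracy(logits, labels))  # 2 of 3 counted positions are correct
```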

New Contributors

@h-aurelien-lac made their first contribution in #895
@mayankagarwals made their first contribution in #911
@keatonelvins made their first contribution in #931
@upskyy made their first contribution in #935
@yeshsurya made their first contribution in #926
@Kingsleyandher made their first contribution in #940
@niyunsheng made their first contribution in #945

Full Changelog: v0.6.3...v0.6.4

Contributors

kashif, yeshsurya, and 10 other contributors

27 Oct 18:30

momochen

v0.6.3

d5648bf

v0.6.3 release

Highlights in this release:

New model architecture supports:
SmolVLM2, GLM4.5V, InternVL3, Falcon-H1, Qwen-Next

New algorithm:
GSPO

What's Changed

[cross-entropy-loss] Added support for DFT flag by @kashif in #860
fix(test): update assertions in GLM4 instance patching tests by @vvvdwbvvv in #859
Fix nan loss error for LigerFusedLinearJSDLoss by @ParagEkbote in #862
[Cross-entropy] get valid predicted probabilities by @kashif in #864
Enhance Docs by @ParagEkbote in #867
Add Classifiers for Liger-Kernel by @ParagEkbote in #869
docs(mta): suppress invalid sequence syntax warning by @Tcc0403 in #870
Add GSPO by @BjarniHaukur in #845
Add GLM4.5V support by @vvvdwbvvv in #863
A Fix for Issue #872 by @yshenaw in #879
Add pytest coverage for liger-kernel by @ParagEkbote in #876
Replace all torch_dtype with dtype by @Tcc0403 in #881
Update Dev Dependencies by @ParagEkbote in #886
Fixed AMD CI issue #793 by @DevManpreet5 in #887
fix(layernorm): remove n_cols upcasting for torch.compile by @Tcc0403 in #884
Fix tests and CI by @vaibhavjindal in #882
Remove daily test cron job by @vaibhavjindal in #890
[UT] [XPU] Modify the test cases of XPU for triton3.5 by @YangKai0616 in #889
Add InternVL3 support by @MilkClouds in #878
fix(flce): add shift_labels as eval mode loss condition by @Tcc0403 in #888
Add support of Falcon-H1 models for liger kernels by @puneeshkhanna in #900
Don't deploy mkdocs to fix benchmarking by @vaibhavjindal in #904
Disable mllama multimodal test in transformers<4.51.0 by @Tcc0403 in #899
Add flce forward for FalconH1ForCausalLM and missing tests by @Tcc0403 in #903
feat(ce,flce): decouple gradients computation for no_grad mode by @Tcc0403 in #894
fix(llama4): Get correct swiglu patch target for llama4 moe layer by @alenawang in #907
Add PolyNorm operator by @0xtoward in #901
Copy and paste benchmarks before and after gh-pages deployment by @vaibhavjindal in #909
Filter out redundant ops/allocations in no_grad mode by @Tcc0403 in #906
Add support for Qwen3Next model with Liger kernels by @vvvdwbvvv in #912
refactor(convergence-test): remove unnecessary print by @Tcc0403 in #913
Enabled the tests glm4v/glm4v_moe for XPU and Fixed the monkey patch error by @YangKai0616 in #914
[Test][XPU] Added gpu cache cleaning for XPU devices by @Egor-Krivov in #917
Add SmolVLM2 support by @MilkClouds in #919

New Contributors

@BjarniHaukur made their first contribution in #845
@yshenaw made their first contribution in #879
@DevManpreet5 made their first contribution in #887
@MilkClouds made their first contribution in #878
@puneeshkhanna made their first contribution in #900
@alenawang made their first contribution in #907
@0xtoward made their first contribution in #901

Full Changelog: v0.6.2...v0.6.3

Contributors

kashif, Egor-Krivov, and 12 other contributors

22 Aug 00:15

vaibhavjindal

v0.6.2

77a4c1a

v0.6.2

What's Changed

Automate Benchmarking - fixing issue. by @Manan17 in #836
Make path variable global by @Manan17 in #840
Adding support for apo losses, sppo_hard and nca_pair by @Manan17 in #841
Add accum_dtype option for FusedLinearCrossEntropy by @Tcc0403 in #830
CI tests fix by @Manan17 in #847
docs(README): fix intel ci link by @Tcc0403 in #842
Llama4 rope implementation by @Manan17 in #843
fix(phi3): update monkey patch for Phi3ForCausalLM by @Tcc0403 in #837
feat(FLCE): expose accum_dtype for hf model monkey patch by @Tcc0403 in #851
Fix ci by @Manan17 in #853
Fix missing low-level api imports by @Kirill-Kravtsov in #856
Add glm4.1v model support by @vvvdwbvvv in #858
Update pyproject.toml version to 0.6.2 by @vaibhavjindal in #861
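Why an accumulation-dtype option such as #830's `accum_dtype` matters: summing many small gradient contributions in a 16-bit type silently drops them once the running total's spacing exceeds the addends. The sketch below emulates IEEE fp16 and fp32 rounding with the standard `struct` module (`'e'` and `'f'` formats) while accumulating 10,000 copies of 1e-4. It illustrates the numerics only and does not call Liger; fp16 stands in for 16-bit types generally (bf16 has even fewer mantissa bits). The fp16 total stalls far below the exact sum of 1.0, while the fp32 total stays close to it.

```python
import struct

def round_to(value: float, fmt: str) -> float:
    """Round a Python float to IEEE half ('e') or single ('f') precision."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

def accumulate(values, fmt: str) -> float:
    total = 0.0
    for v in values:
        # Each partial sum is rounded back to the accumulator's precision.
        total = round_to(total + round_to(v, fmt), fmt)
    return total

contributions = [1e-4] * 10_000  # exact sum is 1.0
fp16_total = accumulate(contributions, "e")
fp32_total = accumulate(contributions, "f")
print(f"fp16 accumulator: {fp16_total:.4f}")
print(f"fp32 accumulator: {fp32_total:.4f}")
```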

New Contributors

@Kirill-Kravtsov made their first contribution in #856

Full Changelog: v0.6.1...v0.6.2

Contributors

Kirill-Kravtsov, vaibhavjindal, and 3 other contributors

28 Jul 18:36

shimizust

v0.6.1

7705dcc

v0.6.1

What's Changed

Fix gemma3 forward with skip_logits by @BitPhinix in #795
Update README.md by @PKUWZP in #808
Fix minor typo by @hugoabonizio in #809
Update README.md by @PKUWZP in #811
Fix embedding benchmarks for backward pass by @Manan17 in #799
Giving an option to update benchmark results for previous commits. by @Manan17 in #791
[Model] Liger support for SmolLM3 by @edbeeching in #798
FusedAddRMSNorm: Fused residual addition and RMS Norm by @vaibhavjindal in #812
Skip smollm3 tests in tests-bwd by @vaibhavjindal in #821
Layernorm enhancement by @Manan17 in #815
Update README.md by @PKUWZP in #823
Update index.md by @PKUWZP in #824
Remove smollm3 import at top of file by @vaibhavjindal in #825
Fix illegal memory access in Triton RMSNorm kernel by casting program_id to int64 by @vvvdwbvvv in #804
fix(benchmark): move chunked loss module init out of measurements by @Tcc0403 in #643
[XPU]Fixed the issue with multiple num_warps parameters being passed in. by @YangKai0616 in #831
Automate benchmarking - for every release by @Manan17 in #828
Revert "Bug Fix: name patching for modules" by @vaibhavjindal in #833
Bug fixes in patching module by @vaibhavjindal in #834
docs(README): fix gpumode discord badge by @Tcc0403 in #835
Update pyproject.toml version to 0.6.1 by @shimizust in #838
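#812's FusedAddRMSNorm fuses the residual addition that precedes an RMSNorm into the same kernel, so the intermediate sum does not need a separate kernel launch and memory round trip. A plain-Python reference for the math being fused, assuming the usual pre-norm layout where the summed hidden state is also returned as the next residual (an assumption about the interface, not Liger's signature; `fused_add_rms_norm` is a hypothetical name):

```python
import math

def fused_add_rms_norm(x, residual, weight, eps=1e-6):
    # Step 1: residual add; this sum is also the next layer's residual stream.
    h = [a + b for a, b in zip(x, residual)]
    # Step 2: RMSNorm over the hidden dimension, then elementwise scale.
    rms = math.sqrt(sum(v * v for v in h) / len(h) + eps)
    y = [w * v / rms for w, v in zip(weight, h)]
    return y, h

y, new_residual = fused_add_rms_norm([1.0, 2.0], [1.0, 0.0], [1.0, 1.0], eps=0.0)
print(new_residual)  # [2.0, 2.0]
print(y)             # [1.0, 1.0]
```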

New Contributors

@BitPhinix made their first contribution in #795
@PKUWZP made their first contribution in #808
@hugoabonizio made their first contribution in #809
@edbeeching made their first contribution in #798

Full Changelog: v0.6.0...v0.6.1

Contributors

hugoabonizio, edbeeching, and 8 other contributors

09 Jul 05:05

shimizust

v0.6.0

66570b1

v0.6.0: New Attention Operators, Cosine Similarity Loss, Llama 4, and VLM Patching Updates

Highlights

This release introduces significant improvements to Liger-Kernel, including new operators, support for Llama 4 models, more robust benchmarking automation, and key fixes to vision-language model (VLM) patching following recent transformers refactoring.

Key Changes

New Features & Improvements

Multi-Token Attention by @AndreSlavescu (#689)
Fused Neighborhood Attention by @AndreSlavescu (#732)
Cosine Similarity Loss for Distillation by @Dexterai (#780)
Support for Llama 4 by @Manan17 (#740)
Option to choose fused LCE/CE loss by @connermanuel (#704)
Add block_rms_norm for QK norm by @mdy666 (#731)
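#780 adds a cosine-similarity loss for distillation. A common form penalizes the angle between student and teacher vectors as 1 − cos(s, t), averaged over positions; the plain-Python sketch below shows that form. Which tensors Liger compares (hidden states or logits) and how it weights the term against other losses are not stated here, so treat these details, and the hypothetical function names, as assumptions.

```python
import math

def cosine_similarity(a, b, eps=1e-8):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)  # eps guards against zero vectors

def cosine_distill_loss(student, teacher):
    # Mean of (1 - cos) over positions: 0 when directions match, 2 when opposite.
    losses = [1.0 - cosine_similarity(s, t) for s, t in zip(student, teacher)]
    return sum(losses) / len(losses)

student = [[1.0, 0.0], [0.0, 2.0]]
teacher = [[2.0, 0.0], [0.0, -1.0]]
print(cosine_distill_loss(student, teacher))  # 1.0
```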

Bug Fixes

Vision-language model patching in recent transformers versions (>=4.52.0):
- Qwen2vl, Qwen2_5_vl by @Tcc0403 (#728, #738)
- Llava by @Tcc0403, @Manan17 (#714, #743, #751)
- Gemma3 by @shimizust, @Tcc0403 (#735, #787, #790)
RMS Norm patching by @vaibhavjindal, @BenasdTW (#741, #765)
Hugging Face forward kwargs fix by @llllvvuu (#708)
Fix import tanh by @jue-jue-zi (#762)
Apply monkey patch to instances by @YangKai0616 (#772)

Documentation & CI Fixes

Deploy MkDocs to GitHub Pages by @ParagEkbote (#724)
Robust doc updates by @ParagEkbote (#726, #727)
.idea ignored by @Tcc0403 (#784)
ReadMe, MTA + softmax docs by @AndreSlavescu (#730)
Relax DyT tol, XPU skip MTA by @Tcc0403 (#778)
Paligemma test fixes by @vvvdwbvvv (#785)
Style & test fixes by @Tcc0403, @vaibhavjindal (#736, #794)
Add torchvision for multimodal test by @Tcc0403 (#755)

Benchmarking & Automation

Automated benchmarking and visualization UI in GitHub pages by @Manan17 (#744, #747, #749, #752, #753, #756, #759, #760, #770, #779)

New Contributors

@connermanuel made their first contribution in #704
@llllvvuu made their first contribution in #708
@jue-jue-zi made their first contribution in #762
@YangKai0616 made their first contribution in #772
@Dexterai made their first contribution in #780
@vvvdwbvvv made their first contribution in #785

Full Changelog: v0.5.10...v0.6.0

Contributors

llllvvuu, jue-jue-zi, and 12 other contributors

22 May 17:52

shimizust

v0.5.10

44a8f2f

v0.5.10: Qwen3 MOE support, Sparsemax kernel, bug fixes

What's Changed

fix zip bug by @KareemMusleh in #702
[dpo] set default average_log_prob to False by @cyr0930 in #693
Rank build status lower by @momochen in #707
Add support for Qwen3 MoE models by @chiwanpark in #706
Fix qwen3_moe flaky convergence test by @vaibhavjindal in #710
Fix empty Medusa head tensors by @chiwanpark in #698
Sparsemax by @AndreSlavescu in #687
fix: remove docstring imports in transformer patches by @NanoCode012 in #712
Increase tests timeout to 45 mins by @vaibhavjindal in #718
fix modal tests by @shivam15s in #719
Visualizer Update by @AndreSlavescu in #717
Sparsemax Documentation by @AndreSlavescu in #716
element-wise-DyT faster than the original LigerDyT by @mdy666 in #673
GRPO loss kernel written fully in Triton, reducing memory by 46 GB by @mdy666 in #672
Make FLCE compatible with FSDP and PEFT by @astefanutti in #674
Fix incorrect module patching when using LoRA with modules_to_save by @BenasdTW in #632
[XPU] Changed how XPU discovery works during setup.py by @Egor-Krivov in #720
Fix to publish docs on pushes to main branch by @shimizust in #722
Release 0.5.10 by @shimizust in #725
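Sparsemax (#687) maps a score vector to a probability distribution that can contain exact zeros. The standard closed-form algorithm sorts the scores in descending order, finds the largest support size k with 1 + k·z₍ₖ₎ > Σⱼ≤ₖ z₍ⱼ₎, sets the threshold τ = (Σⱼ≤ₖ z₍ⱼ₎ − 1)/k, and clips. The pure-Python sketch below follows that textbook algorithm for a non-empty input; it is a reference, not Liger's Triton kernel.

```python
def sparsemax(z):
    zs = sorted(z, reverse=True)
    cumsum = 0.0
    k, support_sum = 0, 0.0
    for i, v in enumerate(zs, start=1):
        cumsum += v
        # The support condition holds for a prefix of the sorted scores,
        # so the last i that satisfies it is the support size k.
        if 1.0 + i * v > cumsum:
            k, support_sum = i, cumsum
    tau = (support_sum - 1.0) / k
    return [max(v - tau, 0.0) for v in z]

p = sparsemax([1.0, 0.8, -1.0])
print(p)  # roughly [0.6, 0.4, 0.0]; the lowest score gets exactly zero mass
```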

New Contributors

@KareemMusleh made their first contribution in #702
@cyr0930 made their first contribution in #693
@NanoCode012 made their first contribution in #712
@mdy666 made their first contribution in #673
@astefanutti made their first contribution in #674
@Egor-Krivov made their first contribution in #720

Full Changelog: v0.5.9...v0.5.10

Contributors

astefanutti, momochen, and 11 other contributors

04 May 19:47

shivam15s

v0.5.9

f19068f

v0.5.9: Adds XPU Setup, GLM-4 & Qwen3 Model Support, Key Bugfixes

What's Changed

update setup.py for installation on xpu by @faaany in #668
update XPU CI yaml file to use docker container by @faaany in #669
Add average_log_prob as an init param for LigerFusedLinearDPOLoss by @vaibhavjindal in #676
add shift label change by @shivam15s in #683
remove tests that can pass on XPU by @faaany in #686
Update mkdocs.yml by @shivam15s in #691
Fix LigerCrossEntropy reduction='none' by @Tcc0403 in #680
Support GLM-4 models by @intervitens in #685
Import glm4_lce_forward locally in function by @vaibhavjindal in #695
Qwen3 model support by @vaibhavjindal in #692
Use logits_to_keep logic for training runs by @vaibhavjindal in #696
increase gemma3 multimodal convergence test loss atol by @shivam15s in #697
Update pyproject.toml by @shivam15s in #700

New Contributors

@intervitens made their first contribution in #685

Full Changelog: v0.5.8...v0.5.9

Contributors

faaany, vaibhavjindal, and 3 other contributors

Releases: linkedin/Liger-Kernel
