fix: implement Mxfp4Dequantize.reverse_op for save_pretrained support #44983
Open
Hyungkeun-Park-Nota wants to merge 1 commit into huggingface:main from
Conversation
Force-pushed from 1742755 to 9c59dda
Member
cc @SunMarc for quantization maybe?
Member
hmmm, this shouldn't trigger a reverse op when we dequantize the model. I think the right behavior here would be to just save the model in its dequantized form.
Force-pushed from 9c59dda to b676da0
Author
@SunMarc Thanks for the review! Updated the PR based on your feedback:
Contributor
[For maintainers] Suggested jobs to run (before merge): run-slow: mxfp4
When a GPT-OSS model is loaded with `Mxfp4Config(dequantize=True)`, `save_pretrained()` fails with `NotImplementedError` because `Mxfp4Dequantize.reverse_op` is not implemented. Since dequantized models are regular bf16 models, the correct behavior is to save them as-is rather than re-quantize to MXFP4:

- Add `Mxfp4IdentityOp` as `Mxfp4Dequantize.reverse_op` to pass bf16 weights through unchanged during save
- Remove `quantization_config` from the model config after dequantize so the saved model loads as a regular bf16 model without triggering the MXFP4 loading path
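As a rough illustration of the two changes, here is a minimal sketch. It is not the exact transformers code: `ConversionOps` is a stand-in base class, and the `convert` signature and the quantizer method body are assumptions for illustration only.

```python
# Sketch only: `ConversionOps` stands in for whatever base class the
# weight-conversion ops use in transformers; the `convert` signature is
# assumed, not copied from the library.

class ConversionOps:  # stand-in base class for this sketch
    reverse_op = None

    def convert(self, value, **kwargs):
        raise NotImplementedError


class Mxfp4IdentityOp(ConversionOps):
    """Pass-through reverse op for Mxfp4Dequantize.

    A dequantized model holds plain bf16 weights, so saving it must not
    re-quantize them back into MXFP4 _blocks/_scales tensors.
    """

    def convert(self, value, **kwargs):
        return value  # hand the bf16 tensor back unchanged


class Mxfp4Dequantize(ConversionOps):
    # Before this PR, save_pretrained() hit NotImplementedError when it
    # looked up the reverse op; an identity op makes saving a no-op.
    reverse_op = Mxfp4IdentityOp


# quantizer_mxfp4.py side of the change (method sketch): after
# dequantization the model is a regular bf16 model, so drop the
# quantization config to keep `quant_method: mxfp4` out of the saved
# config.json and avoid the MXFP4 loading path on reload.
def _process_model_after_weight_loading(self, model, **kwargs):
    if self.quantization_config.dequantize and hasattr(
        model.config, "quantization_config"
    ):
        del model.config.quantization_config
```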
Force-pushed from cd2c8fe to 13f9355
What does this PR do?
Fixes `save_pretrained()` failure for GPT-OSS models loaded with `Mxfp4Config(dequantize=True)`.

When Triton/kernels are unavailable, transformers automatically falls back to `dequantize=True`, converting MXFP4 weights to bf16. However, `save_pretrained()` then fails because `Mxfp4Dequantize.reverse_op` raises `NotImplementedError`. Since dequantized models are regular bf16 models, the correct behavior is to save them as-is rather than re-quantize to MXFP4.
Changes
- `src/transformers/integrations/mxfp4.py`: add `Mxfp4IdentityOp` as `Mxfp4Dequantize.reverse_op`, passing bf16 weights through unchanged during save
- `src/transformers/quantizers/quantizer_mxfp4.py`: remove `quantization_config` from the model config in `_process_model_after_weight_loading` when `dequantize=True`, so the saved `config.json` does not contain `quant_method: mxfp4`. Without this, reloading the saved bf16 model would attempt the MXFP4 loading path and fail because `_blocks`/`_scales` keys don't exist.

Reproduction
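A minimal sketch of the failing call, using the `openai/gpt-oss-20b` checkpoint as an example; the output path is illustrative:

```python
from transformers import AutoModelForCausalLM, Mxfp4Config

# Explicitly request dequantization; this is also the path transformers
# falls back to automatically when Triton/kernels are unavailable.
model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    quantization_config=Mxfp4Config(dequantize=True),
)

# Before this fix: NotImplementedError from Mxfp4Dequantize.reverse_op.
# After: the model is saved as a regular bf16 checkpoint, and the saved
# config.json no longer contains `quant_method: mxfp4`.
model.save_pretrained("./gpt-oss-20b-bf16")
```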