fix: implement Mxfp4Dequantize.reverse_op for save_pretrained support#44983

Open
Hyungkeun-Park-Nota wants to merge 1 commit into huggingface:main from Hyungkeun-Park-Nota:fix/mxfp4-dequantize-reverse-op

Conversation

@Hyungkeun-Park-Nota commented Mar 25, 2026

What does this PR do?

Fixes save_pretrained() failure for GPT-OSS models loaded with Mxfp4Config(dequantize=True).

When Triton/kernels are unavailable, transformers automatically falls back to dequantize=True, converting MXFP4 weights to bf16. However, save_pretrained() then fails because Mxfp4Dequantize.reverse_op raises NotImplementedError.

Since dequantized models are regular bf16 models, the correct behavior is to save them as-is rather than re-quantize to MXFP4.

Changes

src/transformers/integrations/mxfp4.py:

  • Add Mxfp4IdentityOp as Mxfp4Dequantize.reverse_op — passes through bf16 weights unchanged during save

src/transformers/quantizers/quantizer_mxfp4.py:

  • Remove quantization_config from model config in _process_model_after_weight_loading when dequantize=True, so the saved config.json does not contain quant_method: mxfp4. Without this, reloading the saved bf16 model would attempt the MXFP4 loading path and fail because _blocks/_scales keys don't exist.
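The two changes can be sketched roughly as follows. This is a minimal, self-contained sketch with simplified class names modeled loosely on the conversion-op pattern in `core_model_loading.py`; it is not the actual transformers code.

```python
# Sketch of the conversion-op pattern (simplified; hypothetical base class
# mirroring transformers' ConversionOps, not the real implementation).

class ConversionOps:
    """Base: convert() transforms a weight on load; reverse_op undoes it on save."""
    reverse_op = None

    def convert(self, value):
        raise NotImplementedError


class Mxfp4IdentityOp(ConversionOps):
    """Pass bf16 weights through unchanged when saving a dequantized model."""
    def convert(self, value):
        return value  # nothing to re-quantize: the model is plain bf16


class Mxfp4Dequantize(ConversionOps):
    """Dequantizes MXFP4 blocks/scales to bf16 on load (body elided)."""
    reverse_op = Mxfp4IdentityOp  # previously unset, so save raised NotImplementedError


# During save, reverse ops are built from the load-time ops in reverse order:
load_ops = [Mxfp4Dequantize()]
reverse_ops = [op.reverse_op() for op in load_ops[::-1]]

weight = [1.0, 2.0, 3.0]  # stand-in for a bf16 tensor
saved = reverse_ops[0].convert(weight)
assert saved is weight  # identity: the bf16 weight is saved as-is
```

With the identity op in place, the save path no longer needs a working re-quantizer; the second change (dropping `quantization_config`) then ensures the saved checkpoint reloads as a plain bf16 model.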

Reproduction

from transformers import AutoModelForCausalLM, Mxfp4Config

# Triton unavailable → auto dequantize fallback
model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    quantization_config=Mxfp4Config(dequantize=True),
)
model.save_pretrained("/tmp/test")  # NotImplementedError before this fix
transformers/core_model_loading.py:629 in <listcomp>
    kwargs["operations"] = [op.reverse_op for op in self.operations[::-1]]
transformers/core_model_loading.py:101 in reverse_op
    raise NotImplementedError

@Hyungkeun-Park-Nota force-pushed the fix/mxfp4-dequantize-reverse-op branch from 1742755 to 9c59dda on March 25, 2026 01:24
@Rocketknight1 (Member) commented:

cc @SunMarc for quantization maybe?

@SunMarc (Member) commented Mar 25, 2026

model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    quantization_config=Mxfp4Config(dequantize=True),
)
model.save_pretrained("/tmp/test")  # NotImplementedError before this fix

Hmm, this shouldn't trigger reverse ops when we've dequantized the model. I think the right behavior here would be to just save the model in its dequantized form.
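The behavior SunMarc describes can be sketched with a hypothetical helper (not actual transformers code): when the model was dequantized on load, saving should apply no conversion ops at all and write the bf16 tensors as-is.

```python
# Hypothetical helper sketching "skip reverse ops for dequantized models".

def build_save_operations(load_ops, is_dequantized):
    if is_dequantized:
        return []  # plain bf16 checkpoint: nothing to re-quantize
    # otherwise, mirror the load-time ops in reverse order
    return [op.reverse_op for op in load_ops[::-1]]


class DequantOp:
    reverse_op = "requantize"  # stand-in for a reverse conversion op


assert build_save_operations([DequantOp()], is_dequantized=True) == []
assert build_save_operations([DequantOp()], is_dequantized=False) == ["requantize"]
```

This alternative sidesteps the identity op entirely; the PR's final approach achieves the same effect by making the reverse op a no-op.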

@Hyungkeun-Park-Nota force-pushed the fix/mxfp4-dequantize-reverse-op branch from 9c59dda to b676da0 on March 26, 2026 01:34
@Hyungkeun-Park-Nota (Author) commented Mar 26, 2026

@SunMarc Thanks for the review! Updated the PR based on your feedback:

  1. Removed the re-quantization logic — replaced Mxfp4ReverseDequantize with Mxfp4IdentityOp, which simply passes bf16 weights through as-is during save
  2. Removed quantization_config after dequantize — in _process_model_after_weight_loading, when dequantize=True, we delete model.config.quantization_config so the saved model loads as a regular bf16 model without triggering the MXFP4 loading path
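The second point can be sketched as follows. The class and function names here are simplified stand-ins for the quantizer hook described above, not the real transformers internals.

```python
# Sketch of dropping quantization_config after dequantization
# (hypothetical simplified objects; not actual transformers code).

class FakeConfig:
    def __init__(self):
        # As loaded, the config still advertises MXFP4 quantization.
        self.quantization_config = {"quant_method": "mxfp4", "dequantize": True}


class FakeModel:
    def __init__(self):
        self.config = FakeConfig()


def process_model_after_weight_loading(model, dequantize):
    # When we fell back to dequantize=True, the weights are plain bf16,
    # so the saved config.json must not contain quant_method: mxfp4 --
    # otherwise reloading would take the MXFP4 path and fail because
    # the _blocks/_scales keys no longer exist.
    if dequantize and hasattr(model.config, "quantization_config"):
        del model.config.quantization_config
    return model


model = process_model_after_weight_loading(FakeModel(), dequantize=True)
assert not hasattr(model.config, "quantization_config")
```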

@github-actions (Contributor) commented:

[For maintainers] Suggested jobs to run (before merge)

run-slow: mxfp4

When a GPT-OSS model is loaded with Mxfp4Config(dequantize=True),
save_pretrained() fails with NotImplementedError because
Mxfp4Dequantize.reverse_op is not implemented.

Since dequantized models are regular bf16 models, the correct behavior
is to save them as-is rather than re-quantize to MXFP4:

- Add Mxfp4IdentityOp as Mxfp4Dequantize.reverse_op to pass through
  bf16 weights unchanged during save
- Remove quantization_config from model config after dequantize so the
  saved model loads as a regular bf16 model without triggering MXFP4
  loading path
@Hyungkeun-Park-Nota force-pushed the fix/mxfp4-dequantize-reverse-op branch from cd2c8fe to 13f9355 on March 26, 2026 02:11